Question on how to integrate Apache IoTDB into Calcite

2022-01-21 Thread Julian Feinauer
Hi all,

in the last weeks I have been working on integrating the Apache IoTDB project with 
Calcite. This covers two possible scenarios. One is to use Apache IoTDB as an 
adapter in Apache Calcite (like MongoDB, Cassandra, et al.); the other is to use 
Calcite's query optimizer to introduce indexing into the IoTDB server (the IoTDB 
server builds a RelNode tree and passes it to the planner; after planning, the 
resulting RelNode is processed further by the IoTDB server, executed, and the 
results returned).

I looked a lot at the other adapters and how they are implemented, and have some 
questions:

One rather general question is about the Queryable<> interface. I tried to look 
up all the docs (also in Linq) but still do not fully understand it. From my 
understanding it is like an Enumerable<>, but it has a "native" way to do 
things like ordering or filtering. So if I have a Queryable<> which implements 
a custom filter, an automated "push down" can be done by the framework without a 
rule or code generation.
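For context, the table I am experimenting with looks roughly like this (a sketch, not our actual code; the IoTDB names and the two-column row type are illustrative):

```java
// Sketch of a QueryableTable. Without translation rules, Calcite evaluates
// filters/sorts on top of this via generated EnumerableCalc/EnumerableSort
// nodes rather than pushing them into the Queryable itself.
public class IoTDBTable extends AbstractQueryableTable {
  public IoTDBTable() {
    super(Object[].class);
  }

  @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
    return typeFactory.builder()
        .add("time", SqlTypeName.TIMESTAMP)
        .add("value", SqlTypeName.DOUBLE)
        .build();
  }

  @Override public <T> Queryable<T> asQueryable(QueryProvider queryProvider,
      SchemaPlus schema, String tableName) {
    return new AbstractTableQueryable<T>(queryProvider, schema, this, tableName) {
      @Override public Enumerator<T> enumerator() {
        return Linq4j.emptyEnumerator();  // placeholder data source
      }
    };
  }
}
```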

One important requirement for us in IoTDB is to push the query down to the 
TableScan (this is done implicitly in the current server, but becomes explicit 
in the RelNode tree that we generate).
So what is the best way to "merge" a LogicalFilter and an IoTDBTableScan into a 
"filtered" scan?
Is the right way to return a QueryableTable as the TableScan, so that the planner 
takes care of it by generating the call to '.filter(…)'?
The same applies to ordering.
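To make the question concrete: without the Queryable route, I would expect to write a rule roughly like this (a sketch modeled on the Cassandra/MongoDB adapters; `IoTDBTableScan` and `IoTDBFilteredTableScan` are hypothetical operators of ours):

```java
// Sketch of a push-down rule: collapse LogicalFilter + IoTDBTableScan into a
// single scan carrying the predicate. IoTDBFilteredTableScan is hypothetical.
public class IoTDBFilterIntoScanRule extends RelOptRule {
  public static final IoTDBFilterIntoScanRule INSTANCE =
      new IoTDBFilterIntoScanRule();

  private IoTDBFilterIntoScanRule() {
    super(operand(LogicalFilter.class, operand(IoTDBTableScan.class, none())));
  }

  @Override public void onMatch(RelOptRuleCall call) {
    final LogicalFilter filter = call.rel(0);
    final IoTDBTableScan scan = call.rel(1);
    // Replace Filter + Scan by a single scan that carries the predicate,
    // so the IoTDB server can evaluate it natively.
    call.transformTo(
        new IoTDBFilteredTableScan(scan.getCluster(), scan.getTraitSet(),
            scan.getTable(), filter.getCondition()));
  }
}
```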

Another question that is important for us is the usage of "materialized views" 
or other "indexes".
As we basically always handle time series, in most cases the only suitable index 
is a "materialized view" on parts of the time series, which we can use to 
replace parts of the relational tree to avoid IO and computation for parts that 
are already precomputed.

Is there already existing support for that in Calcite, or would we just write 
custom rules for our cases?
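To make it concrete, I imagine registering such a precomputed slice roughly like this (a sketch based on my reading of RelOptMaterialization; the table name and RelNode variables are illustrative):

```java
// Sketch: register a precomputed part of a time series as a materialization.
// `tableRel` scans the table holding the precomputed rows; `queryRel` is the
// query the materialization represents. Names are illustrative.
RelOptMaterialization materialization =
    new RelOptMaterialization(tableRel, queryRel, null,
        ImmutableList.of("root", "precomputed_slice"));
planner.addMaterialization(materialization);
```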

My last question is about the Bindable TraitDef. So far I have only used the 
Enumerable convention, which results in code generation (which has an impact on 
query latency). Am I right in assuming that the Bindable convention is somehow 
similar to the Enumerable convention, with the only difference that it does not 
do code generation but interpretation?
And to potentially use both (depending on whatever switch we set), do we just 
have to provide converter rules for both?
What would you use in a server setup? Always Enumerable?
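For reference, this is the interpretation path I mean, as far as I understand it (an untested sketch; `dataContext` is assumed to be prepared elsewhere):

```java
// Untested sketch: execute a plan through Calcite's interpreter instead of
// generated code. `dataContext` is assumed to be set up elsewhere.
RelNode bestExp = planner.findBestExp();
try (Interpreter interpreter = new Interpreter(dataContext, bestExp)) {
  for (Object[] row : interpreter) {
    System.out.println(Arrays.toString(row));
  }
}
```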

Thanks already for any responses or hints!
Julian F


Re: Sort getting removed during optimization

2022-01-11 Thread Julian Feinauer
Hey Vladimir,

when this issue appeared it was

RelTraitSet desired = cluster.traitSet()
.replace(BindableConvention.INSTANCE);

RelNode expectedRoot = planner.changeTraits(root, desired);
planner.setRoot(expectedRoot);

And then


RelNode exp = planner.findBestExp();

So the root node had no sorting "requirement".
But from my understanding of the SortRemoveRule, it removes the Sort and at 
the same time adds the respective collation trait to the input node.
In my case this was a LogicalProject.
I have no idea how the Project itself then ensures that the collation trait is 
fulfilled.

By the way, is there a way to show the traits of a RelNode tree? This could 
help to analyze this kind of situation.
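What I have in mind is something along these lines (an untested sketch):

```java
// Untested sketch: walk a RelNode tree and print each node's trait set
// (convention, collation, ...) to see what the rules actually produced.
new RelVisitor() {
  @Override public void visit(RelNode node, int ordinal, RelNode parent) {
    System.out.println(node + " traits: " + node.getTraitSet());
    super.visit(node, ordinal, parent);
  }
}.go(relNode);
```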

Thanks!
Julian



From: Vladimir Ozerov 
Date: Tuesday, 11. January 2022 at 14:09
To: dev@calcite.apache.org (dev@calcite.apache.org) 
Subject: Re: Sort getting removed during optimization
Hi Julian,

When invoking the optimizer, you may provide the desired trait set of the
top-level node. It might happen, that the specific collation is not
requested from the optimizer, and hence the plan with a top-level Sort
operator is not chosen. Could you please show how you invoke the planner?
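For example, a sketch of how the collation could be requested explicitly (assuming a descending sort on field 0, matching the LogicalSort in your plan):

```java
// Sketch: include the desired collation in the top-level trait set, so the
// planner must deliver a sorted result (field 0 descending is assumed).
RelTraitSet desired = cluster.traitSet()
    .replace(BindableConvention.INSTANCE)
    .replace(RelCollations.of(
        new RelFieldCollation(0, RelFieldCollation.Direction.DESCENDING)));
RelNode expectedRoot = planner.changeTraits(root, desired);
planner.setRoot(expectedRoot);
```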

Regards,
Vladimir.

On Tue, 11 Jan 2022 at 12:44, Julian Feinauer wrote:

> Hey Stamatis,
>
> yes, thats why I looked it up at first… the results are wrong : )
> So both tables for themselves are sorted but the Full Join is finally two
> blocks. The Left Join (sorted like the left rel) and then the remaining
> entries from the right side (also ordered). But overall not ordered.
>
> Best
> Julian
>
> From: Stamatis Zampetakis 
> Date: Tuesday, 11. January 2022 at 09:43
> To: dev@calcite.apache.org 
> Subject: Re: Sort getting removed during optimization
> Hi Julian F,
>
> Quite a naive question but did you get wrong results from the given plan? A
> missing sort is not necessarily problematic.
>
> I hope I am not saying something stupid but I think there are cases where a
> full join algorithm can retain the order of some of its inputs.
>
> Best,
> Stamatis
>
> On Tue, Jan 11, 2022 at 8:30 AM Julian Feinauer <
> j.feina...@pragmaticminds.de> wrote:
>
> > Hi Julian, Xiong,
> >
> > thanks for your fast replies!
> >
> > So first, the default Rules were registered:
> >
> > planner = new VolcanoPlanner();
> > RelOptUtil.registerDefaultRules(planner, false, true);
> >
> > And as traits I used:
> >
> > planner.addRelTraitDef(ConventionTraitDef.INSTANCE);
> > planner.addRelTraitDef(RelCollationTraitDef.INSTANCE);
> >
> > I digged a bit deeper and what was triggered was the `SortRemoveRule`.
> > If I disabled the Collation Trait this did no longer happen and all
> worked.
> >
> > I will later try to get a MWE done to reproduce this, if this is a bug.
> > Because the bug would then either be the Full Join producing a wrong
> > Collation or the SortRemoveRule investigating the input Collation wrong,
> or?
> >
> > But nonetheless, thank you very much!
> > Julian
> >
> > From: Julian Hyde 
> > Date: Tuesday, 11. January 2022 at 00:38
> > To: dev@calcite.apache.org 
> > Subject: Re: Sort getting removed during optimization
> > Is it possible that the Sort is being removed because some component
> knows
> > that the input is already sorted?
> >
> > In particular, if a relation has at most one row, it is always sorted.
> > Maybe the planner is deducing this via a some row-count metadata or
> > uniqueness constraint.
> >
> >
> > > On Jan 10, 2022, at 3:35 PM, xiong duan  wrote:
> > >
> > > If  I understand correctly, If we remove the  BINDABLE_SORT_RULE, the
> > > result will throw an exception about the  Plan transformation. So it
> > looks
> > > like a wrong rule's result. If you don't customize the rule, It is a
> bug,
> > > and please test this using Calcite's new version.
> >
>


Re: Sort getting removed during optimization

2022-01-11 Thread Julian Feinauer
Hey Stamatis,

yes, that's why I looked it up at first… the results are wrong :)
So both tables by themselves are sorted, but the full join finally consists of two 
blocks: the left join (sorted like the left rel) and then the remaining entries 
from the right side (also ordered). But overall it is not ordered.

Best
Julian

From: Stamatis Zampetakis 
Date: Tuesday, 11. January 2022 at 09:43
To: dev@calcite.apache.org 
Subject: Re: Sort getting removed during optimization
Hi Julian F,

Quite a naive question but did you get wrong results from the given plan? A
missing sort is not necessarily problematic.

I hope I am not saying something stupid but I think there are cases where a
full join algorithm can retain the order of some of its inputs.

Best,
Stamatis

On Tue, Jan 11, 2022 at 8:30 AM Julian Feinauer <
j.feina...@pragmaticminds.de> wrote:

> Hi Julian, Xiong,
>
> thanks for your fast replies!
>
> So first, the default Rules were registered:
>
> planner = new VolcanoPlanner();
> RelOptUtil.registerDefaultRules(planner, false, true);
>
> And as traits I used:
>
> planner.addRelTraitDef(ConventionTraitDef.INSTANCE);
> planner.addRelTraitDef(RelCollationTraitDef.INSTANCE);
>
> I digged a bit deeper and what was triggered was the `SortRemoveRule`.
> If I disabled the Collation Trait this did no longer happen and all worked.
>
> I will later try to get a MWE done to reproduce this, if this is a bug.
> Because the bug would then either be the Full Join producing a wrong
> Collation or the SortRemoveRule investigating the input Collation wrong, or?
>
> But nonetheless, thank you very much!
> Julian
>
> From: Julian Hyde 
> Date: Tuesday, 11. January 2022 at 00:38
> To: dev@calcite.apache.org 
> Subject: Re: Sort getting removed during optimization
> Is it possible that the Sort is being removed because some component knows
> that the input is already sorted?
>
> In particular, if a relation has at most one row, it is always sorted.
> Maybe the planner is deducing this via a some row-count metadata or
> uniqueness constraint.
>
>
> > On Jan 10, 2022, at 3:35 PM, xiong duan  wrote:
> >
> > If  I understand correctly, If we remove the  BINDABLE_SORT_RULE, the
> > result will throw an exception about the  Plan transformation. So it
> looks
> > like a wrong rule's result. If you don't customize the rule, It is a bug,
> > and please test this using Calcite's new version.
>


Re: Sort getting removed during optimization

2022-01-10 Thread Julian Feinauer
Hi Julian, Xiong,

thanks for your fast replies!

So first, the default Rules were registered:

planner = new VolcanoPlanner();
RelOptUtil.registerDefaultRules(planner, false, true);

And as traits I used:

planner.addRelTraitDef(ConventionTraitDef.INSTANCE);
planner.addRelTraitDef(RelCollationTraitDef.INSTANCE);

I dug a bit deeper, and what was triggered was the `SortRemoveRule`.
If I disabled the collation trait, this no longer happened and everything worked.

I will later try to get an MWE done to reproduce this, if it is a bug.
The bug would then be either the full join producing a wrong collation 
or the SortRemoveRule examining the input collation incorrectly, right?

But nonetheless, thank you very much!
Julian

From: Julian Hyde 
Date: Tuesday, 11. January 2022 at 00:38
To: dev@calcite.apache.org 
Subject: Re: Sort getting removed during optimization
Is it possible that the Sort is being removed because some component knows that 
the input is already sorted?

In particular, if a relation has at most one row, it is always sorted. Maybe 
the planner is deducing this via some row-count metadata or a uniqueness 
constraint.


> On Jan 10, 2022, at 3:35 PM, xiong duan  wrote:
>
> If I understand correctly, if we remove the BINDABLE_SORT_RULE, the
> result will be an exception about the plan transformation. So it looks
> like a wrong rule result. If you don't customize the rule, it is a bug;
> please test this using Calcite's latest version.


Sort getting removed during optimization

2022-01-10 Thread Julian Feinauer
Hi all,

I just observed a Plan transformation that I don’t quite understand.
The Logical Plan is:

LogicalSort(sort0=[$0], dir0=[DESC])
  LogicalProject(time=[COALESCE($0, $2)], s1=[$1], s0=[$3])
LogicalJoin(condition=[=($0, $2)], joinType=[full])
  LogicalTableScan(table=[[root, root.test.d0.s1]])
  LogicalTableScan(table=[[root, root.vehicle.d0.s0]])

And the result of the Volcano Planner is

BindableProject(time=[COALESCE($0, $2)], s1=[$1], s0=[$3])
  BindableJoin(condition=[=($0, $2)], joinType=[full])
BindableTableScan(table=[[root, root.test.d0.s1]])
BindableTableScan(table=[[root, root.vehicle.d0.s0]])

I now wonder why the LogicalSort is removed by the planner, as the output of 
the join is NOT sorted (for a full join; for a left join it would be fine, as 
the input table is sorted).

Is there anything I am missing, or is this possibly a bug?

Thanks already!
Julian


Re: [ANNOUNCE] Danny Chan joins Calcite PMC

2019-11-03 Thread Julian Feinauer
Congratulations Danny! Very well deserved!

Julian

On 01.11.19, 20:49, "Muhammad Gelbana" wrote:

Congratulations!

Thanks,
Gelbana


On Fri, Nov 1, 2019 at 9:07 AM Stamatis Zampetakis 
wrote:

> Congratulations Danny!
>
> You are doing an amazing job. The project and the community is becoming
> better every day and your help is much appreciated.
>
> Keep up the momentum!
>
> Best,
> Stamatis
>
> On Thu, Oct 31, 2019 at 4:41 AM Kurt Young  wrote:
>
> > Congratulations Danny!
> >
> > Best,
> > Kurt
> >
> >
> > On Thu, Oct 31, 2019 at 11:18 AM Danny Chan 
> wrote:
> >
> > > Thank you so much colleagues, it’s my honor to work with you!
> > >
> > > I have always felt respected and the harmony of the community, hope to
> > > contribute more and I would give help as best as I can, thanks !
> > >
> > > Best,
> > > Danny Chan
> > > On 31 Oct 2019 at 5:22 AM +0800, Francis Chuang wrote:
> > > > I'm pleased to announce that Danny has accepted an invitation to
> > > > join the Calcite PMC. Danny has been a consistent and helpful
> > > > figure in the Calcite community for which we are very grateful. We
> > > > look forward to the continued contributions and support.
> > > >
> > > > Please join me in congratulating Danny!
> > > >
> > > > - Francis (on behalf of the Calcite PMC)
> > >
> >
>




Re: ApacheCon Europe 2019 talks which are relevant to Apache Calcite

2019-10-23 Thread Julian Feinauer
That would be really nice!
Just ping me, I will be there on all days!

Julian

From: Stamatis Zampetakis 
Sent: Wednesday, October 23, 2019 8:29:11 AM
To: dev@calcite.apache.org 
Subject: Re: ApacheCon Europe 2019 talks which are relevant to Apache Calcite

Most likely, I will be in Berlin on Thursday 24 for the conference!

Let's try to meet!

Stamatis

On Tue, Oct 8, 2019 at 10:36 AM Stamatis Zampetakis 
wrote:

> https://github.com/apache/calcite/pull/1489
>
> On Mon, Oct 7, 2019 at 9:48 PM Julian Hyde  wrote:
>
>> I feel remiss in filling out
>> https://calcite.apache.org/community/#upcoming-talks. I’d be grateful
>> if someone would remove ApacheCon NA and add ApacheCon Europe and log a PR.
>>
>> > On Oct 7, 2019, at 12:15 PM, Chris Baynes  wrote:
>> >
>> > Hi!
>> >
>> > I'll be giving a talk on "Fast federated SQL with Apache Calcite".
>> > Would be great to meet up with any other Calciters attending!
>> >
>> > See you there
>> >
>> > Chris
>> >
>> > On Mon, Oct 7, 2019 at 4:01 PM Julian Feinauer <
>> j.feina...@pragmaticminds.de>
>> > wrote:
>> >
>> >> Hi all,
>> >>
>> >> are there any Calcite related talks in Berlin or any Calciters
>> attending?
>> >> I will be there.
>> >>
>> >> JulianF
>> >>
>> >> On 04.10.19, 19:09, "my...@apache.org" wrote:
>> >>
>> >>Dear Apache Calcite committers,
>> >>
>> >>In a little over 2 weeks time, ApacheCon Europe is taking place in
>> >>Berlin. Join us from October 22 to 24 for an exciting program and
>> >> lovely
>> >>get-together of the Apache Community.
>> >>
>> >>We are also planning a hackathon.  If your project is interested in
>> >>participating, please enter yourselves here:
>> >>https://cwiki.apache.org/confluence/display/COMDEV/Hackathon
>> >>
>> >>The following talks should be especially relevant for you:
>> >>
>> >>  * https://aceu19.apachecon.com/session/fast-federated-sql-apache-calcite
>> >>  * https://aceu19.apachecon.com/session/patterns-and-anti-patterns-running-apache-bigdata-projects-kubernetes
>> >>  * https://aceu19.apachecon.com/session/open-source-big-data-tools-accelerating-physics-research-cern
>> >>  * https://aceu19.apachecon.com/session/ui-dev-big-data-world-using-open-source
>> >>
>> >>Furthermore there will be a whole conference track on community
>> >> topics:
>> >>Learn how to motivate users to contribute patches, how the board of
>> >>directors works, how to navigate the Incubator and much more:
>> >> ApacheCon
>> >>Europe 2019 Community track <
>> >> https://aceu19.apachecon.com/sessions?track=42>
>> >>
>> >>Tickets are available here <
>> https://aceu19.apachecon.com/registration>
>> >> –
>> >>for Apache Committers we offer discounted tickets.  Prices will be
>> >> going
>> >>up on October 7th, so book soon.
>> >>
>> >>Please also help spread the word and make ApacheCon Europe 2019 a
>> >> success!
>> >>
>> >>We’re looking forward to welcoming you at #ACEU19!
>> >>
>> >>Best,
>> >>
>> >>Your ApacheCon team
>> >>
>> >>
>> >>
>>
>>


Re: [DISCUSS] Make Avatica more discoverable

2019-10-21 Thread Julian Feinauer
Hi,

I agree with Julian's view. And I also agree with initially starting with 
Avatica as a separate TLP but not as a separate PMC (there are multiple examples 
of PMCs that govern multiple projects, as Julian pointed out).

JulianF

On 21.10.19, 03:19, "Julian Hyde" wrote:

In Apache, a project (i.e. a PMC) is defined by a community, not by a piece 
of code.

Is the Avatica community sufficiently separate from the Calcite community 
that it wants to govern itself? (In my opinion it is becoming more separate, 
but not there yet. I would like to hear what others think.)

If a member of Calcite's PMC is entirely disinterested in Avatica, then 
he/she can simply ignore votes on Avatica releases. And similarly, PMC members 
interested only in Avatica can ignore votes on Calcite releases. There’s no harm 
unless the email volume becomes unmanageable.

A given PMC can release multiple pieces of software. For example, the 
Lucene PMC releases both Lucene and SOLR. https://lucene.apache.org/solr/ 
 

Julian




> On Oct 20, 2019, at 4:08 PM, Michael Mior  wrote:
> 
> If we want Avatica to be truly independent, then yes, it would have to
> be its own TLP with a separate PMC. (Avatica could not otherwise make
> releases and follow other processes without going through the Calcite
> PMC.) I think there are several other TLPs with less activity than
> Avatica, so I don't think that's a major concern although we could
> take a roll call to see who on the current PMC would be willing to join
> the Avatica PMC.
> --
> Michael Mior
> mm...@apache.org
> 
> On Sun, 20 Oct 2019 at 17:49, Francis Chuang wrote:
>> 
>> I think Michael and Julian's proposals are all good ideas.
>> 
>> Regarding moving Avatica to an independent project, what is the process
> >> for doing so? Does this process turn it into a TLP with its own separate
>> PMC and PMC Chair? My only concern is that work on Avatica is not as
>> active as Calcite and there aren't as many active contributors. At the
>> same time, it's also possible the project could gain a lot more new
>> contributors if it becomes more independent.
>> 
>> Francis
>> 
>> On 18/10/2019 7:09 am, Julian Hyde wrote:
>>> Many people who are interested in Avatica are not interested in 
Calcite. (Yes the Board are interested in both, because they are interested in 
the communities, which overlap. But users of Avatica, not so much. And if 
people perceive that Avatica requires Calcite they might be less likely to 
adopt it.)
>>> 
>>> I think Avatica would be more discoverable if it was hosted at 
https://avatica.apache.org , rather than 
https://calcite.apache.org/avatica/ .
>>> 
>>> I think the brief mention of Avatica on Calcite’s home page is 
adequate. Perhaps there should be a section on Avatica in 
https://calcite.apache.org/community/ .
>>> 
>>> In other words, it’s time to move Avatica a little further along its 
evolution from module to sub-project to independent project.
>>> 
>>> Julian
>>> 
>>> 
 On Oct 17, 2019, at 6:54 AM, Michael Mior  wrote:
 
 Since there's only one sub-project, why don't we convert the
 Sub-Projects section on the homepage into a short description of
 Avatica?
 --
 Michael Mior
 mm...@apache.org
 
 On Thu, 17 Oct 2019 at 06:19, Francis Chuang wrote:
> 
> This was one of the comments on the October Board report for Calcite:
> 
>   df: Great progress!
> 
>   About Avatica - I had to google to find the subproject as I did
>   not see anything obvious on the Calcite site. I would also be
>   good to provide more status on the subproject in your future
>   reports.
> 
> The Avatica sub-project is currently linked on Calcite's homepage, but
> it is not very noticeable or discoverable.
> 
> Any thoughts on how we can improve the visibility of Avatica?
> 
> Francis
>>> 
>>> 





Re: Apache Calcite meetup group

2019-10-21 Thread Julian Feinauer
Hi,

I would love to attend a Calcite meetup, but as we are from Germany I'm unsure 
about the community here.
What about a "virtual meetup" via Hangout or something as an addition?
Of course, if people from Germany are interested, we would be honored to 
host a meetup near Stuttgart.

JulianF

On 18.10.19, 19:53, "Jesus Camacho Rodriguez" wrote:

It seems someone else (Denis Magda) paid the fees in the meantime.

-Jesús

On Fri, Oct 18, 2019 at 1:32 AM Danny Chan  wrote:

> Thanks Jesús for taking over this !
>
> Best,
> Danny Chan
> On 18 Oct 2019 at 2:00 PM +0800, dev@calcite.apache.org wrote:
> >
> > Jesús
>




Re: [DISCUSS] Draft board report for October 2019

2019-10-07 Thread Julian Feinauer
Hi Francis,

no, it looks excellent, +1.

Julian

On 06.10.19, 01:20, "Francis Chuang" wrote:

If there are no objections, I plan to submit the report tomorrow (7 
October 2019) as the deadline is on the 9th.

On 3/10/2019 4:41 pm, Stamatis Zampetakis wrote:
> Looks great, nothing to add! Thanks Francis.
> 
> Best,
> Stamatis
> 
> On Thu, Oct 3, 2019 at 2:56 AM Michael Mior  wrote:
> 
>> Thanks Francis! Looks good to me.
>> --
>> Michael Mior
>> mm...@apache.org
>>
>> On Wed, 2 Oct 2019 at 19:21, Francis Chuang wrote:
>>>
>>> Attached below is a draft of this month's board report. Please let me
>>> know if you have any additions or corrections. Note that the format of
>>> the report has changed slightly compared to the previous ones and the
>>> ASF has revamped the reporting tool.
>>>
>>> ## Description:
>>> Apache Calcite is a highly customizable framework for parsing and
>>> planning queries on data in a wide variety of formats. It allows
>>> database-like access, and in particular a SQL interface and advanced
>>> query optimization, for data not residing in a traditional database.
>>>
>>> ## Issues:
>>> There are no issues requiring board attention.
>>>
>>> ## Membership Data:
>>> Apache Calcite was founded 2015-10-22 (4 years ago)
>>> There are currently 45 committers and 20 PMC members in this project.
>>> The Committer-to-PMC ratio is 9:4.
>>>
>>> Community changes, past quarter:
>>> - No new PMC members. Last addition was Stamatis Zampetakis on
>> 2019-04-13.
>>> - Julian Feinauer was added as committer on 2019-09-10
>>> - Mohamed Mohsen was added as committer on 2019-09-18
>>>
>>> ## Project Activity:
>>> Development and mailing list activity is steady for both Calcite and its
>>> Avatica sub-project.
>>>
>>> Calcite 1.21.0 was released in the middle of September, including more
>>> than 100 resolved issues and maintaining a release cadence of roughly
>>> one release per quarter.
>>>
>>> We are also seeing new faces on the mailing list and opening pull
>>> requests on Github. In terms of pull requests, our committers and
>>> contributors have made a lot of progress to provide feedback to open
>>> pull requests and filed issues in a timely manner. This is evidenced by
>>> the open pull requests on Github receiving comments within a couple of
>>> days after being opened.
>>>
>>> Members of the project also participated in ApacheCon NA last month,
>>> presenting 5 talks about Calcite.
>>>
>>> Finally, the Apache Ignite project has decided to adopt Calcite as its
>>> SQL execution engine, replacing H2. This is an exciting development and
>>> is a testament to the sound foundation and community the Calcite project
>>> has developed.
>>>
>>> ## Community Health:
>>> Activity levels on mailing lists, git and JIRA are normal for both
>>> Calcite and Avatica with a slight decrease in code contributors and
>>> commits (7% and 1% respectively).
>>>
>>> The rates of pull requests being closed and merged on Github has
>>> increased by 16%, as we work to clear our backlog and we are also seeing
>>> a 7% increase in opened pull requests.
>>>
>>> Since the last report, we have added 2 new committers, Julian Feinauer
>>> and Mohamed Mohsen.
>>>
>>> We expect further growth in these numbers as Apache Ignite works to
>>> integrate Calcite into their project, resulting in cross-pollination
>>> between the 2 projects.
>>
> 




Re: ApacheCon Europe 2019 talks which are relevant to Apache Calcite

2019-10-07 Thread Julian Feinauer
Hi all,

are there any Calcite related talks in Berlin or any Calciters attending?
I will be there.

JulianF

On 04.10.19, 19:09, "my...@apache.org" wrote:

Dear Apache Calcite committers,

In a little over 2 weeks time, ApacheCon Europe is taking place in 
Berlin. Join us from October 22 to 24 for an exciting program and lovely 
get-together of the Apache Community.

We are also planning a hackathon.  If your project is interested in 
participating, please enter yourselves here: 
https://cwiki.apache.org/confluence/display/COMDEV/Hackathon

The following talks should be especially relevant for you:

  * https://aceu19.apachecon.com/session/fast-federated-sql-apache-calcite
  * https://aceu19.apachecon.com/session/patterns-and-anti-patterns-running-apache-bigdata-projects-kubernetes
  * https://aceu19.apachecon.com/session/open-source-big-data-tools-accelerating-physics-research-cern
  * https://aceu19.apachecon.com/session/ui-dev-big-data-world-using-open-source

Furthermore there will be a whole conference track on community topics: 
Learn how to motivate users to contribute patches, how the board of 
directors works, how to navigate the Incubator and much more: ApacheCon 
Europe 2019 Community track 

Tickets are available here  – 
for Apache Committers we offer discounted tickets.  Prices will be going 
up on October 7th, so book soon.

Please also help spread the word and make ApacheCon Europe 2019 a success!

We’re looking forward to welcoming you at #ACEU19!

Best,

Your ApacheCon team





Re: Ignite community is building Calcite-based prototype

2019-10-01 Thread Julian Feinauer
Hi Igor,

I agree that it should be rather similar to what Drill did, as distributed 
computing is also a big concern for Ignite, I guess, right?
Julian

On 01.10.19, 15:06, "Seliverstov Igor" wrote:

Guys,

The better link: https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine

Almost everything you see at the link is the same as what the Drill guys already 
did; the difference is in the details, but the idea is the same.

Of course we’ll face many issues during development, and I'll appreciate it if 
some of you assist us.

Regards,
Igor

> On 1 Oct 2019, at 12:32, Julian Feinauer wrote:
> 
> Hi Denis,
> 
> Nice to hear from you and the ignite team... that sounds like an 
excellent idea. I liked the idea of Ignite since I heard about it (I think when 
it became TLP back then). So I would be happy to help you if you have specific 
questions... I‘m currently working on a related topic, namely integrate calcite 
as SQL Layer into Apache IoTDB .
> 
> Best
> Julian
> 
> Get Outlook for iOS<https://aka.ms/o0ukef>
> 
> From: Denis Magda 
> Sent: Tuesday, October 1, 2019 2:37:20 AM
> To: dev@calcite.apache.org ; dev 

> Subject: Ignite community is building Calcite-based prototype
> 
> Hey ASF-mates,
> 
> Just wanted to send a note for Ignite dev community who has started
> prototyping
> 
<http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-td43724.html>
> with a new Ignite SQL engine and Calcite was selected as the most 
favorable
> option.
> 
> We will truly appreciate if you help us with questions that might hit your
> dev list. Ignite folks have already studied Calcite well enough and 
carried
> on with the integration, but there might be tricky parts that would 
require
> your expertise.
> 
> Btw, if anybody is interested in Ignite (memory-centric database and
> compute platform) or would like to learn more details about the prototype
> or join its development, please check these links or send us a note:
> 
>   - https://ignite.apache.org
>   -
>   
https://cwiki.apache.org/confluence/display/IGNITE/IEP-33%3A+New+SQL+executor+engine+infrastructure
> 
> 
> -
> Denis,
> Ignite PMC Chair





Re: Ignite community is building Calcite-based prototype

2019-10-01 Thread Julian Feinauer
Hi Denis,

Nice to hear from you and the Ignite team... that sounds like an excellent 
idea. I have liked the idea of Ignite since I first heard about it (I think when 
it became a TLP back then). So I would be happy to help you if you have specific 
questions... I'm currently working on a related topic, namely integrating Calcite 
as a SQL layer into Apache IoTDB.

Best
Julian

Get Outlook for iOS

From: Denis Magda 
Sent: Tuesday, October 1, 2019 2:37:20 AM
To: dev@calcite.apache.org ; dev 

Subject: Ignite community is building Calcite-based prototype

Hey ASF-mates,

Just wanted to send a note for Ignite dev community who has started
prototyping

with a new Ignite SQL engine and Calcite was selected as the most favorable
option.

We will truly appreciate if you help us with questions that might hit your
dev list. Ignite folks have already studied Calcite well enough and carried
on with the integration, but there might be tricky parts that would require
your expertise.

Btw, if anybody is interested in Ignite (memory-centric database and
compute platform) or would like to learn more details about the prototype
or join its development, please check these links or send us a note:

   - https://ignite.apache.org
   -
   
https://cwiki.apache.org/confluence/display/IGNITE/IEP-33%3A+New+SQL+executor+engine+infrastructure


-
Denis,
Ignite PMC Chair


Re: [DISCUSS] Small contributions

2019-09-27 Thread Julian Feinauer
Yes, I totally agree that it's a major change by any means. As Julian pointed out 
above, it's only about non-code changes.

Julian

From: Andrei Sereda 
Sent: Friday, September 27, 2019 7:25:56 PM
To: dev@calcite.apache.org 
Subject: Re: [DISCUSS] Small contributions

I presume 3rd-party library upgrades should go through the regular process
(JIRA/PR etc.)?

A dependency upgrade is not considered a "small change", since the impact is
greater than just a "typo fix".


On Thu, Sep 26, 2019 at 1:47 PM Julian Hyde  wrote:

> A few points.
>
> I don’t like the term “hot fix”. A hot fix has an existing meaning[1] - it
> is a patch you apply to your binaries. Let’s not use that term.
>
> Let’s define “small contributions” as contributions that do not modify
> code and therefore will not break anything, do not need a test or
> documentation change, and do not need a CI run.
>
> I am in favor of accepting small contributions. I wasn’t previously.
>
> We can have guidelines about how to label these small contributions (e.g.
> git labels, certain words in the commit message or PR description). But we
> shouldn’t expect or require contributors to follow those guidelines. By
> their nature, these contributors have not had time to read all of our
> policy documents.
>
> Reviewers must know what our policy is, and should massage commit messages
> to conform to policy.
>
> These kinds of changes are, by definition, very small and simple. A
> committer can review, approve, fix up, and push to master, and close the PR
> in one go. Five minutes. If the PR requires a back-and-forth then it is not
> a “simple” change.
>
> We should not require a JIRA case.
>
> We need not apply the usual policy of appending the contributor’s name to the
> commit message. A typical commit message would be “Fix a comment”.
>
> Release manager should remove these kinds of trivial changes from the
> release notes. They add nothing to the release notes.
>
> These kinds of changes do earn “merit” - the basis on which we make people
> committers - but they earn less merit than a bug fix, a new feature, a
> detailed response to a question on the dev list, or a conference talk. I
> don’t want people to believe that they can earn committership by fixing 100
> typos.
>
> There can be problems if a community over-relies on small PRs. In
> particular, there is a project in the Incubator that has only one or two
> regular developers but receives hundreds of contributions a few lines long
> via PRs. The discussion occurs in the PRs, and contributors rarely make
> more than 1 or 2 contributions. The problem for the project is that there
> is no emergent “community”. This is a serious problem for that project, and
> obviously we do not have that problem. Still, there is a side effect to the
> back-and-forth discussion to get a change accepted, namely that the
> individuals get to know each other. We don’t want to lose that.
>
>
> Julian
>
> [1] https://en.wikipedia.org/wiki/Hotfix <
> https://en.wikipedia.org/wiki/Hotfix>
>
>
>
>
> > On Sep 26, 2019, at 5:17 AM, Michael Mior  wrote:
> >
> > I thought about a label, but I think it's probably more productive to
> > just review the change immediately if it really is something trivial.
> > The problem is that labels can only be applied by committers. That's
> > why I suggested asking those who submit PRs to include something in
> > the PR title. If others think a label would help though, I'm not
> > opposed to it.
> > --
> > Michael Mior
> > mm...@apache.org
> >
> > On Thu, Sep 26, 2019 at 07:28, TANG Wen-hui
> >  wrote:
> >>
> >> I agree that we should accept these small changes but not create JIRA
> for them.
> >> In my opinion, maybe we can label the PRs of these small changes, and
> process them at regular intervals so they are not forgotten.
> >>
> >> best,
> >> --
> >> wenhui
> >>
> >>
> >>
> >> winifred.wenhui.t...@gmail.com
> >>
> >> From: Haisheng Yuan
> >> Date: 2019-09-26 10:17
> >> To: Francis Chuang; dev@calcite.apache.org (dev@calcite.apache.org)
> >> Subject: Re: Re: [DISCUSS] Small contributions
> >>> most of the time, the author of the fix would  have moved on and have
> >> forgotten about it, resulting in the improvement falling through the
> cracks.
> >>
> >> Makes sense. I think our current position is worth reconsidering, and I
> >> agree with Francis.
> >>
> >> - Haisheng
> >>
> >> --
> >> From: Francis Chuang
> >> Date: 2019-09-26 09:20:49
> >> To:
> >> Subject: Re: [DISCUSS] Small contributions
> >>
> >> From personal experience, I think we should accept these small changes.
> >> I have had lots of  cases where I am reading code or documentation on
> >> Github and found small errors or typos that are easy to fix, so I'd edit
> >> directly in Github and open a PR. These changes do improve the codebase
> >> and fix errors that could be misleading or confuse future maintainers
> >> and users.
> >>
> >> It might be easy to say 

Re: Integrating Calcite, Omid, Avatica, Avro, and Kafka

2019-09-24 Thread Julian Feinauer
Hi,

I agree with that and would love such a section (with timestamps). This could 
also make it easier for newcomers to learn about calcite.

Julian

From: Julian Hyde 
Sent: Tuesday, September 24, 2019 8:50:21 PM
To: dev 
Subject: Re: Integrating Calcite, Omid, Avatica, Avro, and Kafka

I like the idea of time-ordered articles. It creates less expectation that the
articles are “definitive”.

How about using the “news” section of our site? [1] We use it mainly for 
releases, but it’s good for other news too (see the “Other news” section). 
Jekyll makes it pretty easy to add a news item.

Julian

[1] https://calcite.apache.org/news/ 

> On Sep 24, 2019, at 6:11 AM, Michael Mior  wrote:
>
> I agree that it will get out of date. But I think we would just put
> the date on each article we post and leave it to the reader to decide
> if what is posted is still relevant. We could perhaps include a
> disclaimer. I think it's useful for newcomers to see the activity
> happening around Calcite.
> --
> Michael Mior
> mm...@apache.org
>
> On Mon, Sep 23, 2019 at 18:06, Julian Hyde  wrote:
>>
>> A section of the web site might take quite a bit of curation, because these 
>> links go out of date. Also, we might find ourselves asked to endorse 
>> companies and products, which we shouldn’t be doing.
>>
>> I think a reasonable compromise is to use the @ApacheCalcite twitter account 
>> to forward interesting content. The good stuff often gets picked up by 
>> aggregators such as Data Eng Weekly[1], and by tweeting we can help the 
>> editors of those aggregators find the good content.
>>
>> Julian
>>
>> [1] https://dataengweekly.com/ 
>>
>>
>>
>>> On Sep 23, 2019, at 7:39 AM, Michael Mior  wrote:
>>>
>>> Thanks for sharing!
>>>
>>> On a related note, I wonder what others think about having a section
>>> of the website where we include links to such blog posts and
>>> articles referencing Calcite.
>>> --
>>> Michael Mior
>>> mm...@apache.org
>>>
>>> On Mon, Sep 23, 2019 at 09:50, Robert Yokota  wrote:

 Hi,

 In case anyone is interested, I wrote a post about integrating Calcite,
 Omid, Avatica, Avro, and Kafka here:
 https://yokota.blog/2019/09/23/building-a-relational-database-using-kafka/

 Regards,
 Robert
>>



Re: [ANNOUNCE] New committer: Muhammad Gelbana

2019-09-19 Thread Julian Feinauer
Congrats Muhammad!

Julian

On 19.09.19 at 09:47, "Kevin Risden" wrote:

Congrats and welcome Muhammad!

Kevin Risden


On Wed, Sep 18, 2019 at 12:15 PM Muhammad Gelbana 
wrote:

> Thank you all for the very warm welcome.
>
> Calcite really changed a lot of things for my current employer. I got to
> know Calcite when I was exposed to Drill and now we're planning to
> integrate Calcite into our product. Being an analytics company, Calcite
> provides a great deal of help to us.
>
> Thanks,
> Gelbana
>
>
> On Wed, Sep 18, 2019 at 5:10 PM Andrei Sereda  wrote:
>
> > Congrats, Muhammad!
> >
> > On Tue, Sep 17, 2019 at 11:26 PM Amit Chavan  wrote:
> >
> > > Congratulations, Muhammad !!
> > >
> > > On Tue, Sep 17, 2019 at 8:10 PM XING JIN 
> > wrote:
> > >
> > > > Congrats, Muhammad !
> > > >
> > > > 王炎林 <1989yanlinw...@163.com> wrote on Wed, Sep 18, 2019 at 10:38 AM:
> > > >
> > > > > Congratulations, Muhammad!
> > > > >
> > > > >
> > > > > Best,
> > > > > Yanlin
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > At 2019-09-18 05:58:53, "Francis Chuang"  >
> > > > wrote:
> > > > > >Apache Calcite's Project Management Committee (PMC) has invited
> > > Muhammad
> > > > > >Gelbana to become a committer, and we are pleased to announce 
that
> > he
> > > > > >has accepted.
> > > > > >
> > > > > >Muhammad is an active contributor and has contributed numerous
> > patches
> > > > > >to Calcite. He has also been extremely active on the mailing 
list,
> > > > > >helping out new users and participating in design discussions.
> > > > > >
> > > > > >Muhammad, welcome, thank you for your contributions, and we look
> > > forward
> > > > > >your further interactions with the community! If you wish, please
> > feel
> > > > > >free to tell us more about yourself and what you are working on.
> > > > > >
> > > > > >Francis (on behalf of the Apache Calcite PMC)
> > > > >
> > > >
> > >
> >
>




Re: [ANNOUNCE] New committer: Julian Feinauer

2019-09-19 Thread Julian Feinauer
Thanks all of you so much!

On 19.09.19 at 09:47, "Kevin Risden" wrote:

Congrats and welcome Julian!

Kevin Risden


On Wed, Sep 18, 2019 at 4:12 PM Muhammad Gelbana 
wrote:

> Welcome aboard !
>
> Thanks,
> Gelbana
>
>
> On Wed, Sep 18, 2019 at 5:10 PM Andrei Sereda  wrote:
>
> > Congratulations, Julian !
> >
> > On Tue, Sep 17, 2019 at 11:26 PM Amit Chavan  wrote:
> >
> > > Congrats, Julian !!
> > >
> > > On Tue, Sep 17, 2019 at 8:12 PM XING JIN 
> > wrote:
> > >
> > > > Congrats, Julian !
> > > > You are well deserved ~
> > > >
> > > > Haisheng Yuan  wrote on Wed, Sep 18, 2019 at 10:38 AM:
> > > >
> > > > > Congrats, Julian!
> > > > >
> > > > > - Haisheng
> > > > >
    > > > > > --
> > > > > From: Chunwei Lei
> > > > > Date: 2019-09-18 10:30:31
> > > > > To:
> > > > > Subject: Re: [ANNOUNCE] New committer: Julian Feinauer
> > > > >
> > > > > Congratulations, Julian!
> > > > >
> > > > >
> > > > >
> > > > > Best,
> > > > > Chunwei
> > > > >
> > > > >
> > > > > On Wed, Sep 18, 2019 at 9:24 AM Danny Chan 
    > > > wrote:
    > > > > >
> > > > > > Congratulations, Muhammad ! Welcome to join us ! Thanks for your
> > huge
> > > > > > contribution for the Match Recognize.
> > > > > >
> > > > > > Best,
> > > > > > Danny Chan
> > > > > > On Sep 18, 2019 at 5:55 AM +0800, Francis Chuang <
> francischu...@apache.org
> > > > >, wrote:
> > > > > > > Apache Calcite's Project Management Committee (PMC) has 
invited
> > > > Julian
> > > > > > > Feinauer to become a committer, and we are pleased to announce
> > that
> > > > he
> > > > > > > has accepted.
> > > > > > >
> > > > > > > Julian is an active contributor to the Calcite code base and
> has
> > > been
> > > > > > > active on the mailing list answering questions, participating
> in
> > > > > > > discussions and voting for releases.
> > > > > > >
> > > > > > > Julian, welcome, thank you for your contributions, and we look
> > > > forward
> > > > > > > your further interactions with the community! If you wish,
> please
> > > > feel
> > > > > > > free to tell us more about yourself and what you are working
> on.
> > > > > > >
> > > > > > > Francis (on behalf of the Apache Calcite PMC)
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>




Re: [ANNOUNCE] New committer: Julian Feinauer

2019-09-17 Thread Julian Feinauer
Hi all,

thanks all of you for the warm welcome and the honor to be a committer on this 
awesome project!
To present myself a bit... I'm currently mostly working on the MATCH_RECOGNIZE 
stuff and am planning to do a Calcite integration soon for the IoTDB podling.
Overall I'm highly interested in "traces" or "timeseries" and how we can map 
them well to the relational world.

Best
Julian

On 17.09.19 at 22:26, "Amit Chavan" wrote:

Congrats, Julian !!

On Tue, Sep 17, 2019 at 8:12 PM XING JIN  wrote:

> Congrats, Julian !
> You are well deserved ~
>
> Haisheng Yuan  wrote on Wed, Sep 18, 2019 at 10:38 AM:
>
> > Congrats, Julian!
> >
> > - Haisheng
> >
> > --
> > From: Chunwei Lei
> > Date: 2019-09-18 10:30:31
> > To:
> > Subject: Re: [ANNOUNCE] New committer: Julian Feinauer
> >
> > Congratulations, Julian!
> >
> >
> >
> > Best,
> > Chunwei
> >
> >
> > On Wed, Sep 18, 2019 at 9:24 AM Danny Chan  wrote:
> >
> > > Congratulations, Muhammad ! Welcome to join us ! Thanks for your huge
> > > contribution for the Match Recognize.
> > >
> > > Best,
> > > Danny Chan
> > > On Sep 18, 2019 at 5:55 AM +0800, Francis Chuang  > wrote:
> > > > Apache Calcite's Project Management Committee (PMC) has invited
> Julian
> > > > Feinauer to become a committer, and we are pleased to announce that
> he
> > > > has accepted.
> > > >
> > > > Julian is an active contributor to the Calcite code base and has 
been
> > > > active on the mailing list answering questions, participating in
> > > > discussions and voting for releases.
> > > >
> > > > Julian, welcome, thank you for your contributions, and we look
> forward
> > > > your further interactions with the community! If you wish, please
> feel
> > > > free to tell us more about yourself and what you are working on.
> > > >
> > > > Francis (on behalf of the Apache Calcite PMC)
> > >
> >
> >
>




Re: Query Compilation happening more often then expected

2019-09-17 Thread Julian Feinauer
Hi,

this is a good point, Julian. So an implementation should consider re-planning
(possibly triggered by a quick recheck of the cost function with the given
literal values). But this should not be a general issue with the approach, right?

JulianF

On 16.09.19 at 23:36, "Julian Hyde" wrote:

I found evidence that MSSQL[1] and Sybase ASE[2] do it.

I agree, it's not a free lunch. For instance, if a column has a
non-uniform distribution, some values might be much more selective
than others, and it would be much better to know which value you are
dealing with at planning time, rather than execution time.

Julian

[1] 
https://docs.microsoft.com/en-us/sql/relational-databases/performance/specify-query-parameterization-behavior-by-using-plan-guides?view=sql-server-2017

[2] 
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc00743.1570/html/queryprocessing/BIIIBEJJ.htm

On Mon, Sep 16, 2019 at 3:36 PM Stamatis Zampetakis  
wrote:
>
> Out of curiosity does anybody know if popular DBMS (Postgres, Oracle, SQL
> Server, etc.) support "hoisting"?
>
> Performing it all the time does not seem a very good idea (constant
> reduction, histograms, and other optimization techniques would be
> impossible)
> while leaving its configuration to the end-user may not be a
> straightforward decision.
>
> On Sat, Sep 14, 2019 at 4:29 PM Julian Hyde  
wrote:
>
> > The idea of converting literals into bind variables is called 
“hoisting”.
> > We had the idea a while ago but have not implemented it.
> >
> > https://issues.apache.org/jira/browse/CALCITE-963
> >
> > Until that feature is implemented, you will need to create bind 
variables
> > explicitly, and bind them before executing the query.
> >
> > Julian
> >
> > > On Sep 13, 2019, at 4:39 PM, Scott Reynolds 
> > wrote:
> > >
> > > Hi,
> > >
> > > Spent a bunch of time researching and staring at code today to 
understand
> > > the code compilation path within Calcite. I started down this path
> > because
> > > we noticed whenever we changed the `startDate` or `endDate` for the 
query
> > > it went through the compilation process again. We expected it to reuse the
> > > previous classes and `bind` them with the new RexLiterals. I was *hoping* the
the
> > > RexLiterals were passed into the `bind()` method but that does not 
appear
> > > to be the main goal of `DataContext` objects.
> > >
> > > We also found the trick Kylin did to improve their query compilation 
with
> > > prepared statements:
> > > https://issues.apache.org/jira/browse/KYLIN-3434 but PreparedStatement
> > is
> > > stateful and I don't believe it's a good way to solve this issue.
> > >
> > > I would like to propose a change to Calcite so that Filters are passed
> > into
> > > the `bind()` call alongside or within DataContext. This would allow 
the
> > > `EnumerableRel` implementations to reference the `Filters` as 
arguments.
> > > This -- I believe -- would cause any change to the filters to use
> > > the previously compiled class instead of generating a brand new one.
> > >
> > > I am emailing everyone on this list for two reasons:
> > > 1. Is this a bad idea ?
> > > 2. I don't have a design yet so would love any ideas. Should we stick
> > more
> > > stuff into `DataContext`? Should `EnumerableRel` have another method 
that
> > > is used to gather these RexLiterals?
> >
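The “hoisting” idea discussed in this thread — normalizing literals to bind variables so that queries differing only in literal values can share one compiled plan — can be sketched roughly as follows. This is a toy illustration, not Calcite's actual plan cache: the regex-based literal replacement and the string "plan" are stand-ins for real parsing and code generation.

```java
import java.util.HashMap;
import java.util.Map;

public class PlanCache {
    // Naively "hoist" literals: replace quoted strings and bare integers
    // with '?' so that structurally identical queries map to the same key.
    static String hoistLiterals(String sql) {
        return sql.replaceAll("'[^']*'", "?").replaceAll("\\b\\d+\\b", "?");
    }

    private final Map<String, String> cache = new HashMap<>();
    int compilations = 0;

    // Compile once per hoisted query shape; reuse for new literal values.
    String plan(String sql) {
        return cache.computeIfAbsent(hoistLiterals(sql), key -> {
            compilations++;
            return "compiled:" + key;  // stand-in for a generated class
        });
    }

    public static void main(String[] args) {
        PlanCache pc = new PlanCache();
        pc.plan("select * from t where ts > 100 and ts < 200");
        pc.plan("select * from t where ts > 300 and ts < 400");
        System.out.println(pc.compilations);  // prints 1: one shared plan
    }
}
```

As Julian notes above, this is not a free lunch: for a column with a skewed distribution, knowing the literal at planning time can yield a much better plan than a generic parameterized one.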




[jira] [Created] (CALCITE-3345) Implement time_bucket function

2019-09-14 Thread Julian Feinauer (Jira)
Julian Feinauer created CALCITE-3345:


 Summary: Implement time_bucket function
 Key: CALCITE-3345
 URL: https://issues.apache.org/jira/browse/CALCITE-3345
 Project: Calcite
  Issue Type: New Feature
Reporter: Julian Feinauer


See here for information on the `time_bucket` function: 
https://docs.timescale.com/latest/api#time_bucket

This is a more powerful version of the standard PostgreSQL date_trunc function. 
It allows for arbitrary time intervals instead of the second, minute, hour, 
etc. provided by date_trunc. The return value is the bucket's start time.

This would especially help with time averaging while keeping everything
SQL-compliant, e.g. queries like the following.

Example query from (https://www.timescale.com/):

{code:sql}
SELECT  time_bucket('10  seconds',  time)  AS  ten_second,
machine_id,  avg(temperature)  AS  "avgT",
min(temperature)  AS  "minT",  max(temperature)  AS  "maxT",
last(temperature,  time)  AS  "lastT"
FROM  measurements
WHERE  machine_id  =  'C931baF7'
AND  time  >  now()  -  interval  '150s'
GROUP  BY  ten_second
ORDER  BY  ten_second  DESC;
{code}
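At its core, `time_bucket` is plain interval arithmetic: the bucket start is the timestamp rounded down to a multiple of the bucket width. A minimal sketch of that arithmetic (assuming epoch-aligned buckets and millisecond precision; the real TimescaleDB function also supports an optional origin/offset):

```java
public class TimeBucket {
    // Returns the start of the bucket of width 'widthMillis' that contains
    // 'tsMillis'. Buckets are assumed to be aligned to the Unix epoch.
    static long timeBucket(long widthMillis, long tsMillis) {
        // floorDiv (rather than '/') rounds toward negative infinity, so
        // pre-epoch (negative) timestamps also land in the right bucket.
        return Math.floorDiv(tsMillis, widthMillis) * widthMillis;
    }

    public static void main(String[] args) {
        long tenSeconds = 10_000L;
        System.out.println(timeBucket(tenSeconds, 17_500L));  // prints 10000
        System.out.println(timeBucket(tenSeconds, -2_500L));  // prints -10000
    }
}
```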




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (CALCITE-3341) Implement FINAL modifier functionality

2019-09-12 Thread Julian Feinauer (Jira)
Julian Feinauer created CALCITE-3341:


 Summary: Implement FINAL modifier functionality
 Key: CALCITE-3341
 URL: https://issues.apache.org/jira/browse/CALCITE-3341
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Feinauer


Also set the default modifier to RUNNING so that it is compliant with the Oracle
implementation; see CALCITE-3302.
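For intuition, the difference between the two modifiers (as I understand the Oracle semantics): with ALL ROWS PER MATCH, a RUNNING aggregate is evaluated over the rows of the match up to and including the current row, while a FINAL aggregate sees the complete match. A hand-rolled illustration, not Calcite code:

```java
import java.util.Arrays;

public class RunningVsFinal {
    // RUNNING sum = prefix sums over the rows of the match seen so far.
    static int[] runningSums(int[] rows) {
        int[] out = new int[rows.length];
        int acc = 0;
        for (int i = 0; i < rows.length; i++) {
            acc += rows[i];
            out[i] = acc;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] match = {3, 1, 4};                    // rows of one match
        int finalSum = Arrays.stream(match).sum();  // FINAL sum is 8 for every row
        for (int r : runningSums(match)) {
            System.out.println("RUNNING=" + r + " FINAL=" + finalSum);
        }
    }
}
```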



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: [RESULT] [VOTE] Release apache-calcite-1.21.0 (release candidate 1)

2019-09-10 Thread Julian Feinauer
Thank you Francis, good job!

JulianF

On 10.09.19 at 14:34, "Francis Chuang" wrote:

Thanks for getting this massive release out, Stamatis!

Francis

On 11/09/2019 7:27 am, Stamatis Zampetakis wrote:
> Thanks to everyone who has tested the release candidate and given
> their comments and votes.
> 
> The tally is as follows.
> 
> 4 binding +1s:
> Julian H., Francis, Vladimir, Stamatis
> 
> 6 non-binding +1s:
> Anton, Julian F., Haisheng, Danny, Chunwei, Andrei
> 
> No 0s or -1s.
> 
> Therefore I am delighted to announce that the proposal to release
> Apache Calcite 1.21.0 has passed.
> 
> Thanks everyone. We’ll now roll the release out to the mirrors.
> 
> There was some feedback during voting. I shall open a separate
> thread to discuss.
> 
> Stamatis
> 




Re: Issues in exposing data via TableFunction vs TableMacro

2019-09-10 Thread Julian Feinauer
Hey,

when going through the code I just had another idea.
Currently a TableFunction is executed as an EnumerableTableFunctionScan, which
gets generated from a LogicalTableFunctionScan by the rule
EnumerableTableFunctionScanRule.
What if you just remove that rule and add a custom rule of yours which
translates it to a TableScan of your choice?

Julian







Re: Issues in exposing data via TableFunction vs TableMacro

2019-09-10 Thread Julian Feinauer
Hi Gabriel,

that's an interesting question for me too.
Do you need parameters for those "dynamic tables"?
If not, you could do something similar to what Drill does: implement a Schema
which always returns a Table whenever someone asks for one, backed by a Table
implementation that you provide, where you can hook in later and add the
functionality that you actually need. This can then also be used in
optimization, as you control your custom Table type.
Perhaps it helps to look at the DrillTable class in [1].

On a side note, I am trying to figure out what would be necessary to make
TableFunction also work with TranslatableTable.
Would you mind opening an issue in Jira for that?

Julian

[1] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillTable.java
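The Drill-style pattern described above — a schema that never answers “no such table” but synthesizes a table on demand for any requested name — boils down to something like the following. `Table` here is a plain stand-in class, not Calcite's `org.apache.calcite.schema.Table`:

```java
import java.util.HashMap;
import java.util.Map;

public class DynamicSchema {
    // Stand-in for a table implementation you fully control, where custom
    // behavior (and, later, planner hooks) can be added.
    static final class Table {
        final String name;
        Table(String name) { this.name = name; }
    }

    private final Map<String, Table> tables = new HashMap<>();

    // Any name yields a table, created lazily on first lookup.
    Table getTable(String name) {
        return tables.computeIfAbsent(name, Table::new);
    }

    public static void main(String[] args) {
        DynamicSchema schema = new DynamicSchema();
        System.out.println(schema.getTable("sensor_42").name);  // prints sensor_42
        // Repeated lookups return the same instance:
        System.out.println(schema.getTable("sensor_42") == schema.getTable("sensor_42"));
    }
}
```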
 

On 10.09.19 at 03:25, "Gabriel Reid" wrote:

Hi,

I'm currently using a combination of TableFunctions and TableMacros to
expose various dynamic (relatively unstructured) data sources via Calcite.
The underlying data sources are such that data can only be retrieved by
first specifying what you want (i.e. there is no catalog of all data that
is available).

I'm currently handling this by using a combination of TableFunctions and
TableMacros.

The issue that I'm running into comes when I want to implement custom
planner rules for the underlying functionality. As far as I can see, it's
not possible to register planner rules based on a TableFunctionImpl,
because a TableFunctionImpl only exposes a ScannableTable, so there's no
chance to hook into RelOptNode.register.

On the other hand, implementing a TableMacro does allow to return a
TranslatableTable, which then does allow intercepting the call to
RelOptNode.register to register rules. However, TableMacros require that
all parameters are literals, and I'm providing timestamps, via
TIMESTAMPADD() and CURRENT_TIMESTAMP() calls, which then doesn't work for
TableMacros (all parameters to a table macro need to be literals, otherwise
query validation fails in Calcite).

I'm wondering if I'm missing some built-in functionality which would make
it possible to have a dynamic table function/macro that can also be
manipulated via custom planner rules.

Options (which may or may not exist) that I can think of are:
* something that would/could visit all macro parameters ahead of time and
resolve things like CURRENT_TIMESTAMP() to a literal, before further query
validation occurs
* register rules somewhere outside of RelOptNode.register (e.g. when the
schema is first created)

Are there any currently-working options in Calcite that can help me do what
I'm trying to do? And if there aren't and I would add such a thing to
Calcite, are there any suggestions as to what the most appropriate approach
would be (either one of the two options I listed above, or something else)?

Thanks,

Gabriel
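Gabriel's first option — resolving calls such as CURRENT_TIMESTAMP() to literals before the table macro is invoked — amounts to a small constant-folding pass over the macro's argument expressions. A toy sketch under a made-up encoding (strings prefixed with "CALL:" stand in for function-call arguments; this is not Calcite's SqlNode API):

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class MacroArgFolder {
    // Fold every argument down to a literal, evaluating the known
    // zero-argument call CURRENT_TIMESTAMP at prepare time, so that the
    // macro only ever sees literal arguments.
    static List<Object> foldArgs(List<Object> args, Instant now) {
        List<Object> out = new ArrayList<>();
        for (Object a : args) {
            if ("CALL:CURRENT_TIMESTAMP".equals(a)) {
                out.add(now);  // resolved to a literal timestamp
            } else {
                out.add(a);    // already a literal; pass through
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2019-09-10T00:00:00Z");
        List<Object> folded =
            foldArgs(List.of("metric", "CALL:CURRENT_TIMESTAMP"), now);
        System.out.println(folded.size());  // prints 2; both entries are literals
    }
}
```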




Re: Meetup / Hackathon

2019-09-09 Thread Julian Feinauer
Hi,

I invited everybody who answered to this thread but you can register yourself 
using the link http://s.apache.org/apachecon-slack

Julian

On 09.09.19 at 17:07, "Julian Hyde" wrote:

You should go to the speakers’ dinner; I hope you were invited. 

I’m just boarding in SFO so I’ll miss most of the speakers’ dinner. 

Did you get the invite to the ApacheCon slack? Let’s communicate there. 

Julian

> On Sep 9, 2019, at 16:40, Julian Feinauer  
wrote:
> 
> Hi all,
> I have my talk till 18:20. What about meeting later today somewhere?
> 
> Julian
> 
> 
> From: Danny Chan 
> Sent: Monday, September 9, 2019 1:16:34 AM
> To: dev@calcite.apache.org 
> Subject: Re: Meetup / Hackathon
> 
> Really good chance for our fellows to have a meet up, look forward to 
your spark of the thoughts.
> 
> Best,
> Danny Chan
> On Sep 7, 2019 at 1:58 PM +0800, Julian Feinauer 
, wrote:
>> Hi all,
>> 
>> as I’m currently traveling to Las Vegas I asked myself if there already 
are any considerations to meet with the Calcite Community, e.g. in the 
Hackathon Space or something.
> >> I saw some talks about Calcite so I guess that some PMCs / Contributors 
are in Las Vegas and I would be really happy to gather with some of you!
>> 
>> Julian




Re: Meetup / Hackathon

2019-09-09 Thread Julian Feinauer
Hi all,
I have my talk till 18:20. What about meeting later today somewhere?

Julian


From: Danny Chan 
Sent: Monday, September 9, 2019 1:16:34 AM
To: dev@calcite.apache.org 
Subject: Re: Meetup / Hackathon

Really good chance for our fellows to have a meet up, look forward to your 
spark of the thoughts.

Best,
Danny Chan
On Sep 7, 2019 at 1:58 PM +0800, Julian Feinauer  wrote:
> Hi all,
>
> as I’m currently traveling to Las Vegas I asked myself if there already are 
> any considerations to meet with the Calcite Community, e.g. in the Hackathon 
> Space or something.
> I saw some talks about Calcite so I guess that some PMCs / Contributors are 
> in Las Vegas and I would be really happy to gather with some of you!
>
> Julian


Re: [VOTE] Release apache-calcite-1.21.0 (release candidate 1)

2019-09-09 Thread Julian Feinauer
Hi Stamatis,

thank you for your effort!

+1 (non-binding)

I found some minor issues, described below, and the failing test from the last
RC; but as the latter is addressed for the next release, I think that is
reasonable.

I checked:
- Checksum and Signature correct
- Checked LICENSE and NOTICE
- Checked diff between rc0 / rc1 as expected (no pom.xml.??? files)
- Ran the build as described in "howto.md" (mvnw install) on java version
"1.8.0_181" (HotSpot) on OS X; it fails (see below)
- Running "mvnw install" succeeds if OsAdapterTest is removed
- Checked no unexpected binaries
- Checked License headers with rat

Minor Issues:
- README is not up to date as it says that README.md contains "examples of 
running calcite" which it doesn’t
- Information on how to build the project is only available on the homepage not 
in one of the readmes or doc files (a link to "howto.md" in the README or 
README.md would be good, I think)

Test fails:
- As expected the OsAdapterTest has some issues (see CALCITE-2816)
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR]   OsAdapterTest.testPs:156->lambda$testPs$3:158 » Runtime while parsing 
value [0...
[ERROR]   OsAdapterTest.testPsDistinct:177 » SQL Error while executing SQL 
"select disti...
[INFO]
[ERROR] Tests run: 60, Failures: 0, Errors: 2, Skipped: 24

- In one run the following fails, but it succeeds on a second run. I will log a
Jira if it ever happens again (checked in the IDE and it worked fine there)
[ERROR] Tests run: 8, Failures: 0, Errors: 1, Skipped: 4, Time elapsed: 4.375 s 
<<< FAILURE! - in org.apache.calcite.test.PigRelBuilderStyleTest
[ERROR] 
testImplWithCountWithoutGroupBy(org.apache.calcite.test.PigRelBuilderStyleTest) 
 Time elapsed: 2.712 s  <<< ERROR!
org.apache.pig.impl.logicalLayer.FrontendException: Unable to open iterator for 
alias t
at 
org.apache.calcite.test.PigRelBuilderStyleTest.assertScriptAndResults(PigRelBuilderStyleTest.java:270)
at 
org.apache.calcite.test.PigRelBuilderStyleTest.testImplWithCountWithoutGroupBy(PigRelBuilderStyleTest.java:130)
Caused by: org.apache.pig.PigException: Unable to store alias t
at 
org.apache.calcite.test.PigRelBuilderStyleTest.assertScriptAndResults(PigRelBuilderStyleTest.java:270)
at 
org.apache.calcite.test.PigRelBuilderStyleTest.testImplWithCountWithoutGroupBy(PigRelBuilderStyleTest.java:130)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: Error processing 
rule LoadTypeCastInserter
at 
org.apache.calcite.test.PigRelBuilderStyleTest.assertScriptAndResults(PigRelBuilderStyleTest.java:270)
at 
org.apache.calcite.test.PigRelBuilderStyleTest.testImplWithCountWithoutGroupBy(PigRelBuilderStyleTest.java:130)
Caused by: java.lang.NullPointerException
at 
org.apache.calcite.test.PigRelBuilderStyleTest.assertScriptAndResults(PigRelBuilderStyleTest.java:270)
at 
org.apache.calcite.test.PigRelBuilderStyleTest.testImplWithCountWithoutGroupBy(PigRelBuilderStyleTest.java:130)

Julian

On 09.09.19 at 02:02, "Anton Haidai" wrote:

Hello,

Local Calcite build with tests enabled on Linux: OK
Calcite-based system (Zoomdata) test suite: OK

+1 (non-binding)

On Fri, Sep 6, 2019 at 7:42 PM Stamatis Zampetakis 
wrote:

> Hi all,
>
> I have created a build for Apache Calcite 1.21.0, release candidate 1.
>
> Thanks to everyone who has contributed to this release.
>
> Since RC 0, we have fixed the following issues:
> * [CALCITE-3322] Remove duplicate test case in RelMetadataTest (沈洪)
> * [CALCITE-3321] BigQuery does not have correct casing rules (Lindsey
> Meyer)
> * Remove the useless JdbcConvention out in descriptionPrefix for
> JdbcToEnumerableConverterRule
> * Removed spurious *.xml.xxx files from the release artifacts
>
> You can read the release notes here:
> 
https://github.com/apache/calcite/blob/calcite-1.21.0/site/_docs/history.md
>
> The commit to be voted upon:
>
> 
https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=adc1532de853060d24fd0129257a3fae306fb55c
>
> Its hash is adc1532de853060d24fd0129257a3fae306fb55c.
>
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.21.0-rc1/
>
> The hashes of the artifacts are as follows:
> src.tar.gz.sha256
> f9b37fc08f20e8fa7ec8035172852468359fb855e007943fc087ba310f4e
>
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachecalcite-1067
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/zabetak.asc
>
> Please vote on releasing this package as Apache Calcite 1.21.0.
>
> The vote is open for the next 96 hours (due to the weekend) and passes if 
a
> majority of
> at least three +1 PMC votes are cast.
>
> [ ] +1 Release this 

Re: Meetup / Hackathon

2019-09-08 Thread Julian Feinauer
Hi,

I'm here the full week, but as I'm pretty busy during the day I would prefer
something like the evening or the morning. If that doesn't work out, I can also
try to organize something during the day.

Julian

On 08.09.19 at 22:21, "Haisheng Yuan" wrote:

I will be in Vegas Mon through Thu. Looking forward to meeting with you.

- Haisheng

--
From: Julian Hyde
Date: 2019-09-07 15:38:59
To: dev@calcite.apache.org
Subject: Re: Meetup / Hackathon

I'm in Vegas Mon evening - Thu lunchtime. Would love to hang out and write 
code.

IMHO, conferences are for "conferring" - chatting in corridors, and
occasionally in bars - more than they are about attending talks.

Who else is going?

Julian

On Fri, Sep 6, 2019 at 10:58 PM Julian Feinauer
 wrote:
>
> Hi all,
>
> as I’m currently traveling to Las Vegas I asked myself if there already 
are any considerations to meet with the Calcite Community, e.g. in the 
Hackathon Space or something.
> I saw some talks about Calcite so I guess that some PMCs / Contributors 
are in Las Vegas and I would be really happy to gather with some of you!
>
> Julian




Meetup / Hackathon

2019-09-06 Thread Julian Feinauer
Hi all,

as I’m currently traveling to Las Vegas I asked myself whether there are already 
any plans to meet with the Calcite community, e.g. in the hackathon space or 
something.
I saw some talks about Calcite, so I guess that some PMC members / contributors 
are in Las Vegas, and I would be really happy to get together with some of you!

Julian


Re: [DISCUSS] Release apache-calcite-1.21.0 (release candidate 0)

2019-09-05 Thread Julian Feinauer
Hey,

I'm not sure we need to cancel it ASAP; I would wait for the results of Julian 
(Hyde) and Danny and their binding votes.
As every PMC handles that a bit differently, I'm unsure whether you consider it 
"major enough" or not.

Julian

On 05.09.19, 09:19, "Stamatis Zampetakis" wrote:

Sorry about that, I will cancel the vote and start a new one for rc1 ASAP.

In the meantime, do we want to fix CALCITE-2816 (or skip the relevant tests
for problematic locales)?

On Thu, Sep 5, 2019 at 8:28 AM Julian Feinauer 

wrote:

> Hi,
>
> I took the freedom to fork a DISCUSS thread to keep the VOTE thread a bit
> cleaner.
>
> AFAIR these "pom.xml.next" files come from the Maven release plugin and show how
> the pom would look in the next iteration (from the "prepare" phase).
> Probably they were not cleaned up properly during the "perform" phase.
>
> Julian
>
> On 05.09.19, 08:04, "Danny Chan" wrote:
>
> I ran the diff cmd and also see these outputs; it seems that
> pom.xml.next and pom.xml.tag come from a plugin?
>
> Best,
> Danny Chan
> On Sep 5, 2019, 2:53 AM +0800, Julian Hyde wrote:
> > I’m still reviewing the release, but I have an observation and a
> question. There are a bunch of pom.xml.next and pom.xml.tag files that 
I’ve
> not seen before. What is the purpose of these?
> >
> > It’s OK that we include DEPENDENCIES (which is generated). I’m a bit
> surprised that we do not include .gitignore in the release.
> >
> > Julian
> >
> >
> > $ cd /tmp
> > $ tar xvfz
> 
~/apache/dist/dev/calcite/apache-calcite-1.21.0-rc0/apache-calcite-1.21.0-src.tar.gz
> > $ cd ~/dev/calcite
> > $ git checkout calcite-1.21.0
> > $ diff -r . /tmp/apache-calcite-1.21.0-src/
> > Only in /tmp/apache-calcite-1.21.0-src/babel: pom.xml.next
> > Only in /tmp/apache-calcite-1.21.0-src/babel: pom.xml.tag
> > Only in /tmp/apache-calcite-1.21.0-src/cassandra: pom.xml.next
> > Only in /tmp/apache-calcite-1.21.0-src/cassandra: pom.xml.tag
> > Only in /tmp/apache-calcite-1.21.0-src/core: pom.xml.next
> > Only in /tmp/apache-calcite-1.21.0-src/core: pom.xml.tag
> > Only in /tmp/apache-calcite-1.21.0-src/: DEPENDENCIES
> > Only in /tmp/apache-calcite-1.21.0-src/druid: pom.xml.next
> > Only in /tmp/apache-calcite-1.21.0-src/druid: pom.xml.tag
> > Only in /tmp/apache-calcite-1.21.0-src/elasticsearch: pom.xml.next
> > Only in /tmp/apache-calcite-1.21.0-src/elasticsearch: pom.xml.tag
> > Only in /tmp/apache-calcite-1.21.0-src/example/csv: pom.xml.next
> > Only in /tmp/apache-calcite-1.21.0-src/example/csv: pom.xml.tag
> > Only in /tmp/apache-calcite-1.21.0-src/example/function: 
pom.xml.next
> > Only in /tmp/apache-calcite-1.21.0-src/example/function: pom.xml.tag
> > Only in /tmp/apache-calcite-1.21.0-src/example: pom.xml.next
> > Only in /tmp/apache-calcite-1.21.0-src/example: pom.xml.tag
> > Only in /tmp/apache-calcite-1.21.0-src/file: pom.xml.next
> > Only in /tmp/apache-calcite-1.21.0-src/file: pom.xml.tag
> > Only in /tmp/apache-calcite-1.21.0-src/geode: pom.xml.next
> > Only in /tmp/apache-calcite-1.21.0-src/geode: pom.xml.tag
> > Only in .: .git
> > Only in .: .gitattributes
> > Only in .: .gitignore
> > Only in /tmp/apache-calcite-1.21.0-src/kafka: pom.xml.next
> > Only in /tmp/apache-calcite-1.21.0-src/kafka: pom.xml.tag
> > Only in /tmp/apache-calcite-1.21.0-src/linq4j: pom.xml.next
> > Only in /tmp/apache-calcite-1.21.0-src/linq4j: pom.xml.tag
> > Only in /tmp/apache-calcite-1.21.0-src/mongodb: pom.xml.next
> > Only in /tmp/apache-calcite-1.21.0-src/mongodb: pom.xml.tag
> > Only in /tmp/apache-calcite-1.21.0-src/pig: pom.xml.next
> > Only in /tmp/apache-calcite-1.21.0-src/pig: pom.xml.tag
> > Only in /tmp/apache-calcite-1.21.0-src/piglet: pom.xml.next
> > Only in /tmp/apache-calcite-1.21.0-src/piglet: pom.xml.tag
> > Only in /tmp/apache-calcite-1.21.0-src/plus: pom.xml.next
> > Only in /tmp/apache-calcite-1.21.0-src/plus: pom.xml.tag
> > Only in /tmp/apache-calcite-1.21.0-src/: pom.xml.next
> > Only in /tmp/apache-calcite-1.21.0-src/: pom.xml.tag
> > Only in /tmp/apache-calcit

[DISCUSS] Release apache-calcite-1.21.0 (release candidate 0)

2019-09-05 Thread Julian Feinauer
> >
> > +1 (non-binding)
> >
> > On Wed, Sep 4, 2019 at 2:55 AM Julian Hyde  wrote:
> >
> > > Regarding the 'ps' failures. I've added a suggestion to
    > > >
> > > 
https://issues.apache.org/jira/browse/CALCITE-2816?focusedCommentId=16921772=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16921772
> > > .
> > > Let's continue conversation there. Release threads are not a place for
> > > technical discussion.
> > >
> > > On Tue, Sep 3, 2019 at 3:42 AM Julian Feinauer
> > >  wrote:
> > > >
> > > > Thanks Vladimir,
> > > >
> > > > This would also be my first assumption, that it's due to the German locale,
> > > which also causes parsing problems frequently.
> > > >
> > > > I think we should either fix that or note it somewhere in the how-to to
> > > ensure that people can reproduce the build, so I think I should keep my -1
> > > > (but it's not binding, so not that big of an issue).
> > > >
> > > > Julian
> > > >
> > > > Sent from my mobile phone
> > > >
> > > >
> > > > ---- Original Message ----
> > > > Subject: Re: [VOTE] Release apache-calcite-1.21.0 (release candidate 0)
> > > > From: Vladimir Sitnikov
> > > > To: Apache Calcite dev list
> > > > Cc:
> > > >
> > > > It is a "well known"
> > > > https://issues.apache.org/jira/browse/CALCITE-2816 PsTableFunction
> > > > fails in Russian locale
> > > >
> > > > In other words, "float number parsing is locale-sensitive".
> > > > Vladimir
> > >
> >
> >
> > --
> > Best regards,
> > Anton.
>




AW: [VOTE] Release apache-calcite-1.21.0 (release candidate 0)

2019-09-03 Thread Julian Feinauer
Thanks Vladimir,

This would also be my first assumption, that it's due to the German locale, which 
also causes parsing problems frequently.

I think we should either fix that or note it somewhere in the how-to to ensure 
that people can reproduce the build, so I think I should keep my -1 (but it's 
not binding, so not that big of an issue).

Julian

Sent from my mobile phone


---- Original Message ----
Subject: Re: [VOTE] Release apache-calcite-1.21.0 (release candidate 0)
From: Vladimir Sitnikov
To: Apache Calcite dev list
Cc:

It is a "well known"
https://issues.apache.org/jira/browse/CALCITE-2816 PsTableFunction
fails in Russian locale

In other words, "float number parsing is locale-sensitive".
Vladimir


AW: [VOTE] Release apache-calcite-1.21.0 (release candidate 0)

2019-09-03 Thread Julian Feinauer
Hi Stamatis,

Yes, if it works for you then I'll try to track that down and file a Jira.

Julian

Sent from my mobile phone


---- Original Message ----
Subject: Re: [VOTE] Release apache-calcite-1.21.0 (release candidate 0)
From: Stamatis Zampetakis
To: dev@calcite.apache.org
Cc:

Thanks for raising this Julian.

I tested today on macOS Mojave (10.14.5) and JDK 1.8.0_111 and the build (mvnw
install) finishes without errors.

I guess the problem you raised is related to your local environment.

Can you please log a JIRA with more details on what happens?
Can you also check if this is a regression from 1.20.0?

Best,
Stamatis

On Mon, Sep 2, 2019, 9:32 PM Julian Feinauer 
wrote:

> Hi Stamatis,
>
> thank you for your effort!
>
> Unfortunately, I think I have to vote
>
> -1 (non-binding)
>
> as I encountered an issue building the artefacts with the given
> instruction (see below).
> But as my vote is non-binding it's up to the more experienced devs to
> decide how to handle this case exactly.
> I am not sure on how to interpret the policy [1] in that case.
>
> I also found some minor issues I described below.
>
> I checked:
> - Checksum and Signature correct
> - Checked LICENSE and NOTICE
> - Run build as described in "howto.md" (mvnw install) on "java version
> "1.8.0_181" (Hotspot) on OS X fails (see below)
> - Run build as "mvnw install -DskipTests" succeeds
> - Checked no unexpected binaries
>
> Issues:
> - I am unable to build the package (see above) as the following tests fail:
>
> [INFO] Results:
> [INFO]
> [ERROR] Errors:
> [ERROR]   OsAdapterTest.testPs:156->lambda$testPs$3:158 » Runtime while
> parsing value [0...
> [ERROR]   OsAdapterTest.testPsDistinct:177 » SQL Error while executing SQL
> "select disti...
> [INFO]
> [ERROR] Tests run: 60, Failures: 0, Errors: 2, Skipped: 24
>
> Minor Issues:
> - README is not up to date as it says that README.md contains "examples of
> running calcite" which it doesn’t
> - Information on how to build the project is only available on the
> homepage not in one of the readmes or doc files (a link to "howto.md" in
> the README or README.md would be good, I think)
>
> Best
> Julian
>
> [1] https://apache.org/legal/release-policy.html
>
> On 02.09.19, 17:16, "Stamatis Zampetakis" wrote:
>
> Hi all,
>
> I have created a build for Apache Calcite 1.21.0, release candidate 0.
>
> Thanks to everyone who has contributed to this release.
>
> This release comes two months after 1.20.0 and brings many bug fixes and
> important new features, among others the following.
>
> * Support for implicit casts.
> * Transformation of Pig Latin scripts to calcite plans.
> * Implementation for MATCH_RECOGNIZE and its basic features.
> * Support for correlated ANY/SOME/ALL subqueries.
> * New join algorithm implementations.
>
> You can read the release notes here:
>
> https://github.com/apache/calcite/blob/calcite-1.21.0/site/_docs/history.md
>
> The commit to be voted upon:
>
> https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=4420139600bb68f8568c4b1c960ad4dbcb1a43f1
>
> Its hash is 4420139600bb68f8568c4b1c960ad4dbcb1a43f1.
>
> The artifacts to be voted on are located here:
>
> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.21.0-rc0/
>
> The hashes of the artifacts are as follows:
> src.tar.gz.sha256
> f9b37fc08f20e8fa7ec8035172852468359fb855e007943fc087ba310f4e
>
> A staged Maven repository is available for review at:
>
> https://repository.apache.org/content/repositories/orgapachecalcite-1066
>
> Release artifacts are signed with the following key:
> http://home.apache.org/~zabetak/zabetak.asc
> https://people.apache.org/keys/committer/zabetak.asc (in the next LDAP
> refresh)
>
> Please vote on releasing this package as Apache Calcite 1.21.0.
>
> The vote is open for the next 72 hours and passes if a majority of
> at least three +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Calcite 1.21.0
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
>
> Here is my vote:
>
> +1 (binding)
>
> Stamatis
>
>
>


Re: [VOTE] Release apache-calcite-1.21.0 (release candidate 0)

2019-09-02 Thread Julian Feinauer
Hi Stamatis,

thank you for your effort!

Unfortunately, I think I have to vote

-1 (non-binding)

as I encountered an issue building the artefacts with the given instruction 
(see below).
But as my vote is non-binding it's up to the more experienced devs to decide 
how to handle this case exactly.
I am not sure on how to interpret the policy [1] in that case.

I also found some minor issues I described below.

I checked:
- Checksum and Signature correct
- Checked LICENSE and NOTICE
- Run build as described in "howto.md" (mvnw install) on "java version 
"1.8.0_181" (Hotspot) on OS X fails (see below)
- Run build as "mvnw install -DskipTests" succeeds
- Checked no unexpected binaries

Issues:
- I am unable to build the package (see above) as the following tests fail:

[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR]   OsAdapterTest.testPs:156->lambda$testPs$3:158 » Runtime while parsing 
value [0...
[ERROR]   OsAdapterTest.testPsDistinct:177 » SQL Error while executing SQL 
"select disti...
[INFO]
[ERROR] Tests run: 60, Failures: 0, Errors: 2, Skipped: 24

Minor Issues:
- README is not up to date as it says that README.md contains "examples of 
running calcite" which it doesn’t
- Information on how to build the project is only available on the homepage not 
in one of the readmes or doc files (a link to "howto.md" in the README or 
README.md would be good, I think)

Best
Julian

[1] https://apache.org/legal/release-policy.html

On 02.09.19, 17:16, "Stamatis Zampetakis" wrote:

Hi all,

I have created a build for Apache Calcite 1.21.0, release candidate 0.

Thanks to everyone who has contributed to this release.

This release comes two months after 1.20.0 and brings many bug fixes and
important new features, among others the following.

* Support for implicit casts.
* Transformation of Pig Latin scripts to calcite plans.
* Implementation for MATCH_RECOGNIZE and its basic features.
* Support for correlated ANY/SOME/ALL subqueries.
* New join algorithm implementations.

You can read the release notes here:
https://github.com/apache/calcite/blob/calcite-1.21.0/site/_docs/history.md

The commit to be voted upon:

https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=4420139600bb68f8568c4b1c960ad4dbcb1a43f1

Its hash is 4420139600bb68f8568c4b1c960ad4dbcb1a43f1.

The artifacts to be voted on are located here:
https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.21.0-rc0/

The hashes of the artifacts are as follows:
src.tar.gz.sha256
f9b37fc08f20e8fa7ec8035172852468359fb855e007943fc087ba310f4e

A staged Maven repository is available for review at:
https://repository.apache.org/content/repositories/orgapachecalcite-1066

Release artifacts are signed with the following key:
http://home.apache.org/~zabetak/zabetak.asc
https://people.apache.org/keys/committer/zabetak.asc (in the next LDAP
refresh)

Please vote on releasing this package as Apache Calcite 1.21.0.

The vote is open for the next 72 hours and passes if a majority of
at least three +1 PMC votes are cast.

[ ] +1 Release this package as Apache Calcite 1.21.0
[ ]  0 I don't feel strongly about it, but I'm okay with the release
[ ] -1 Do not release this package because...


Here is my vote:

+1 (binding)

Stamatis




[REVIEW] Request for Review CALCITE-3302 (CLASSIFIER for MR)

2019-08-28 Thread Julian Feinauer
Hi all,

as I try to make progress towards the *full* functionality of MATCH_RECOGNIZE 
in Calcite I added support for the CLASSIFIER function.
The issue can be found in [1] and the PR in [2].

I would kindly ask for a review and some comments as the current implementation 
works BUT is not elegant and not as Calcite-esque as I would like.
I think I should add some kind of Context for MR or so but would really like to 
get some input.
After finishing that I would like to add more and more MR functions, e.g. [3].

Thanks!
Julian

[1] https://issues.apache.org/jira/browse/CALCITE-3302
[2] https://github.com/apache/calcite/pull/1419
[3] https://issues.apache.org/jira/browse/CALCITE-3294



[jira] [Created] (CALCITE-3302) Add support for CLASSIFIER() command in MATCH_RECOGNIZE

2019-08-28 Thread Julian Feinauer (Jira)
Julian Feinauer created CALCITE-3302:


 Summary: Add support for CLASSIFIER() command in MATCH_RECOGNIZE
 Key: CALCITE-3302
 URL: https://issues.apache.org/jira/browse/CALCITE-3302
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Feinauer
Assignee: Julian Feinauer


The CLASSIFIER() command simply returns the defined pattern classifier that was 
matched for the respective row.

A very simple test case which could be added to match.iq is
{code:java}
select *
from "hr"."emps" match_recognize (
order by "empid" desc
measures "commission" as c,
"empid" as empid,
CLASSIFIER() as cl
pattern (s up)
define up as up."commission" < prev(up."commission"));

    C  EMPID  CL
-----  -----  --
 1000    100   S
  500    200  UP

!ok
{code}




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (CALCITE-3294) Implement FINAL Clause for MATCH_RECOGNIZE

2019-08-26 Thread Julian Feinauer (Jira)
Julian Feinauer created CALCITE-3294:


 Summary: Implement FINAL Clause for MATCH_RECOGNIZE
 Key: CALCITE-3294
 URL: https://issues.apache.org/jira/browse/CALCITE-3294
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Feinauer


With CALCITE-1935 the initial support for the MATCH_RECOGNIZE clause was 
introduced. But it is still lacking several features.
One of them is the FINAL clause, which forces the `MEASURE` to act globally on 
all tuples which were matched by the pattern.

See 
https://oracle-base.com/articles/12c/pattern-matching-in-oracle-database-12cr1

An example query would be:

{code}
SELECT *
FROM sales_history MATCH_RECOGNIZE (
 ORDER BY tstamp
 MEASURES  
   LAST(A.tstamp) AS ts_prev,
   FINAL LAST(A.tstamp) AS ts_last
 ALL ROWS PER MATCH
 PATTERN (A+)
 DEFINE
   A AS A.units_sold > 10
   ) MR
ORDER BY MR.product, MR.start_tstamp;
{code}

Here, the query matches each sequence of rows which all have `units_sold > 10`.
The column `ts_prev` shows the running `LAST` value at each row (for this 
pattern, the current row's timestamp).
But `ts_last` shows the SAME value for each row, as the `FINAL` modifier 
changes the behavior to apply the `LAST` operator to the complete matched 
record set (similar to a window aggregation on the matched subset of rows).
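The running-vs-FINAL distinction can be sketched in a few lines of plain Java (a toy model, not Calcite code; all names are made up), assuming the standard Oracle/ISO semantics where, for pattern (A+), every matched row maps to A:

```java
import java.util.List;

// Toy model of MATCH_RECOGNIZE measure semantics for pattern (A+),
// where every row of the match maps to variable A.
public class FinalLastDemo {

    // Running LAST(A.tstamp) at row i: last row mapped to A so far,
    // i.e. row i itself -- so the value differs per row.
    static String runningLast(List<String> tstamps, int i) {
        return tstamps.get(i);
    }

    // FINAL LAST(A.tstamp): LAST over the complete match, so every
    // row of the match reports the same value.
    static String finalLast(List<String> tstamps) {
        return tstamps.get(tstamps.size() - 1);
    }
}
```

For a three-row match the running measure changes row by row, while the FINAL measure reports the last row's value for every row.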



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: PLC4X Request Optimization

2019-08-26 Thread Julian Feinauer
Hi Julian,

thank you very much for your insights.
Your analysis is very detailed and I agree with all your suggestions.
For myself, I started to realize the "big" differences between Calcite's use case 
and the PLC4X case.
I will definitely have a look into the book you recommended (I like optimization 
quite a lot).

So, to sum up this thread: thanks to all of you for your help and your suggestions.
I will now bring a summary of this discussion and all next steps over to the 
PLC4X list, in case someone is still interested in following this topic :)

JulianF

On 23.08.19, 18:12, "Julian Hyde" wrote:

The first step is to realize that you are looking at an optimization 
problem. Congratulations, you’ve done that. 

The next step is to evaluate the possible optimization techniques. Volcano 
is suitable in cases where there is dataflow - where the solution is a tree or 
DAG. It doesn’t seem that you have that problem. So Volcano is not a good 
candidate, I agree. 

It is important that you separate cost model and allowable transformation 
rules from the optimization algorithms. Then you can try different algorithms.

If I were you, I would also consult a good book on the design of optimizing 
compilers - e.g the New Dragon book. They have techniques for dataflow 
optimization, dead code elimination, register assignment, loop unwinding. If 
those problems match your problems then maybe a compilation framework like LLVM 
or Graal would give you what you need. 

> On Aug 23, 2019, at 7:11 AM, Julian Feinauer 
 wrote:
> 
> Hi, 
> 
> a short update on this matter.
> We had a meetup yesterday with the PLC4X community and I prepared a very 
first "demo" implementation of the Optimizer / Framework.
> I tried to organize it after the Calcite / Volcano approach.
> 
> We had several discussions and experimented a bit and some things that 
came to my mind why I'm unsure whether the volcano framework really fits best 
here, or some other approach.
> 
> * Usually we have a pretty big set of "Operators" (in our case field 
requests in comparison to Calcites RelNodes):
> In regular cases they could be 10 but also quite often up to 100 (which 
is rather rare for 'sane' queries, I assume)
> * We have very few rules:
> In fact, we may have two or three rules (protocol specific), but usually 
of the form 'merge two field requests into one' or 'split one field request 
into two'
> * We have no tree structure, but everything is 'flat'
> 
> With the above setup its pretty obvious that we cannot profit from 
Volcanos dynamic programming approach. Furthermore, with the simple approach of 
applying all suitable rules to all possible candidates the state space explodes 
with O(n!) (where n could be large, see above).
> 
> So I think our best bet atm would be to exploit all possible spaces (but 
with excessive pruning) or use some other sort of algorithms like simple 
gradient descent (I think our convergence behavior should be quite nice if the 
cost function is "smooth" enough) or stochastic optimization like cross entropy 
or simulated annealing.
> 
> I just wanted to give some feedback back to the list as many people 
joined the discussion and had interesting ideas. And I still think that there 
is an overlap in whats done but the 'common core' is smaller than I initially 
assumed (e.g., some query optimizers AFAIR use approaches like simulated 
annealing).
> 
> Best
> Julian
> 
> Am 20.08.19, 15:38 schrieb "Julian Feinauer" 
:
> 
>Hi Stamatis,
> 
>thanks for your response.
>I think my brain just needs a bit more time to get really deep into 
those advanced planning topics (its sometimes slow on adopting...).
>But I will look through it.
>We have a meetup this week and will discuss the matter and how to 
setup everything to enable some optimization at first (introducing cost 
estimates and such) and then I will again have a deeper look and perhaps 
prepare a test case or a runnable test.
>Then its probably the easiest to reason about.
> 
>Julian
> 
>Am 20.08.19, 15:19 schrieb "Stamatis Zampetakis" :
> 
>Hi Julian F,
> 
>I admit that I didn't really get your example but talking about 
'batch
>request optimization'
>and 'collapsing "overlapping" but not equal requests' I get the 
impression
>that the problem
>is optimizing sets of queries which may have common 
sub-expressions;
>the problem is usually referred to as multi-query optimization and 
is
>indeed relevant with
>the Spool operator

Re: PLC4X Request Optimization

2019-08-23 Thread Julian Feinauer
Hi, 

a short update on this matter.
We had a meetup yesterday with the PLC4X community and I prepared a very first 
"demo" implementation of the Optimizer / Framework.
I tried to organize it after the Calcite / Volcano approach.

We had several discussions and experimented a bit and some things that came to 
my mind why I'm unsure whether the volcano framework really fits best here, or 
some other approach.

* Usually we have a pretty big set of "operators" (in our case field requests, 
in comparison to Calcite's RelNodes):
in regular cases there could be 10, but also quite often up to 100 (which is 
rather rare for 'sane' queries, I assume)
* We have very few rules:
In fact, we may have two or three rules (protocol specific), but usually of the 
form 'merge two field requests into one' or 'split one field request into two'
* We have no tree structure, but everything is 'flat'

With the above setup it's pretty obvious that we cannot profit from Volcano's 
dynamic programming approach. Furthermore, with the simple approach of applying 
all suitable rules to all possible candidates, the state space explodes with 
O(n!) (where n could be large, see above).

So I think our best bet at the moment would be to explore the whole search space 
(but with aggressive pruning) or to use some other sort of algorithm, like simple 
gradient descent (I think our convergence behavior should be quite nice if the 
cost function is "smooth" enough) or stochastic optimization like cross-entropy 
or simulated annealing.
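The 'merge two field requests into one' rule mentioned above can also be applied greedily instead of through a full rule-driven search; here is a minimal sketch (hypothetical code, not the PLC4X API; all names are made up) that merges overlapping or adjacent byte ranges of a single memory area in one pass:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch (not the PLC4X API; all names are made up): a greedy
// pass that applies a "merge two field requests into one" rule to byte
// ranges of a single memory area, as a cheap alternative to exploring the
// full O(n!) rule-application space.
public class RequestMerger {

    // A read request covering bytes [start, end) of one PLC memory area.
    record Range(int start, int end) {}

    // Sort by start offset, then merge overlapping or adjacent ranges in
    // one pass; the result is a minimal set of physical read requests.
    static List<Range> merge(List<Range> requests) {
        List<Range> sorted = new ArrayList<>(requests);
        sorted.sort(Comparator.comparingInt(Range::start));
        List<Range> out = new ArrayList<>();
        for (Range r : sorted) {
            if (!out.isEmpty() && out.get(out.size() - 1).end() >= r.start()) {
                // Overlapping or adjacent: extend the previous range.
                Range last = out.remove(out.size() - 1);
                out.add(new Range(last.start(), Math.max(last.end(), r.end())));
            } else {
                out.add(r);
            }
        }
        return out;
    }
}
```

With this, eight one-bit BOOLEAN reads of the same byte (as discussed earlier in this thread) all map to the same range and collapse into a single physical request; such a greedy pass costs O(n log n) rather than exploring the full rule-application space.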

I just wanted to give some feedback back to the list, as many people joined the 
discussion and had interesting ideas. And I still think that there is an overlap 
in what's done, but the 'common core' is smaller than I initially assumed 
(e.g., some query optimizers AFAIR use approaches like simulated annealing).

Best
Julian

On 20.08.19, 15:38, "Julian Feinauer" wrote:

Hi Stamatis,

thanks for your response.
I think my brain just needs a bit more time to get really deep into those 
advanced planning topics (it's sometimes slow to adapt...).
But I will look through it.
We have a meetup this week and will discuss the matter and how to set up 
everything to enable some optimization at first (introducing cost estimates and 
such), and then I will again have a deeper look and perhaps prepare a test case 
or a runnable test.
Then it's probably easiest to reason about.

Julian

On 20.08.19, 15:19, "Stamatis Zampetakis" wrote:

Hi Julian F,

I admit that I didn't really get your example, but talking about 'batch 
request optimization' and 'collapsing "overlapping" but not equal requests' 
I get the impression that the problem is optimizing sets of queries which 
may have common sub-expressions; the problem is usually referred to as 
multi-query optimization and is indeed related to the Spool operator 
mentioned by Julian H.

If that's the case then the most relevant work that I can think of is 
[1],
which solves the problem
by slightly modifying the search strategy of the Volcano planner.

Best,
Stamatis

[1] Roy, Prasan, et al. "Efficient and extensible algorithms for multi
query optimization." ACM SIGMOD Record. Vol. 29. No. 2. ACM, 2000. (
https://www.cse.iitb.ac.in/~sudarsha/Pubs-dir/mqo-sigmod00.pdf)


On Tue, Aug 20, 2019 at 12:49 PM Julian Feinauer <
j.feina...@pragmaticminds.de> wrote:

> Hi Julian,
>
> thanks for the reply.
> I have to think about that, I think.
>
> But as I understand the Spool operator, it is there to factor out multiple
> calculations of the same expression.
> In our situation we aim more at collapsing "overlapping" but not equal
> requests.
>
> Consider 8 bits which physically form a byte.
> If I read 8 BOOLEANs I have 8 different requests which each mask one bit and
> return it (padded) as a byte. So 8 requests and 8 bytes of data transfer
> (plus masking on the PLC).
> If I would optimize it to read the byte in one request and do the 
masking
> afterwards I would have one request and only 1 byte transferred (plus 
no
> masking on the PLC which keeps pressure low there).
>
> This could be modelled by introducing respective "RelNodes" and planner
> rules, I think, but I do not fully understand how Spool fits in here.
>
> Julian
>
> On 19.08.19, 20:42, "Julian Hyde" wrote:
>
> One tricky aspect is to optimize a *batch* of requests.
>
> The trick is to tie tog

Re: [DISCUSS] ANTLR4 parse template for Calcite ?

2019-08-22 Thread Julian Feinauer
Hi,

there are some SQL dialect grammars online here (for ANTLR4)

https://github.com/antlr/grammars-v4/tree/master/mysql
https://github.com/antlr/grammars-v4/tree/master/plsql
https://github.com/antlr/grammars-v4/tree/master/sqlite
https://github.com/antlr/grammars-v4/tree/master/tsql

They could be a starting point for your work?

I used these when I last wrote an ANTLR grammar for an SQL-like syntax.

JulianF

On 22.08.19, 19:39, "Julian Hyde" wrote:

If you are going to do all that work to translate to ANTLR, one thing that may 
help is to re-use SqlParserTest.java. (Shouldn’t be hard to translate that into 
JavaScript, or you could use a harness that calls the JavaScript code from 
Java.) Your code may be entirely different, but the tests will ensure that it 
gives the same result.

> On Aug 22, 2019, at 4:04 AM, Michael Franzkowiak  
wrote:
> 
> It is not using ANTLR. Since our goal is specifically to support parsing
> and manipulation of SQL in the frontend, we use
> https://sap.github.io/chevrotain/docs/ . We're quite happy with that. We
> have some pretty big ANTLR grammars for other (non-SQL) use cases and this
> approach definitely feels more lightweight.
> 
> On Thu, Aug 22, 2019 at 12:22 PM Danny Chan  wrote:
> 
>> Great! Do you have an ANTLR .g4 file that can be shared?
>> 
>> Best,
>> Danny Chan
>> On Aug 22, 2019, 5:45 PM +0800, Michael Franzkowiak wrote:
>>> Danny, what is your web / frontend use case exactly?
>>> We've started to create some frontend helpers which you can find at
>>> https://github.com/contiamo/rhombic . It's all in a very early state but
>>> we'll likely spend some more time on it in the next months. Parsing is
>> here
>>> https://github.com/contiamo/rhombic/blob/master/src/SqlParser.ts .
>>> 
>>> On Thu, Aug 22, 2019 at 11:38 AM Muhammad Gelbana 
>>> wrote:
>>> 
 I once needed to fix this issue [1] but the fix was rejected because it
 introduced worse performance than it ideally should. As mentioned in the
 comments, the approach followed in the current parser is the reason for
 that. I mean, if we had designed the grammar differently, we could have
 fixed the linked issue a long time ago, as Julian already attempted to
 fix it.
 
 That said, we might go with *antlr* only to have that "better" approach
 for our parsers. We don't have to dump our current parser of course, as
 *antlr* can be activated optionally.
 
 [1] https://issues.apache.org/jira/browse/CALCITE-35
 
 Thanks,
 Gelbana
 
 
 On Thu, Aug 22, 2019 at 10:05 AM Danny Chan 
>> wrote:
 
> Thanks, Julian.
> 
> I agree this would be a huge amount of work, but I have to do this; I’m just
> wondering if any fellows here have similar requests.
> 
> Best,
> Danny Chan
> On Aug 22, 2019, 2:15 PM +0800, Julian Hyde wrote:
>> ANTLR isn’t significantly better than, or worse than, JavaCC, but it’s
>> different. So translating to ANTLR would be a rewrite, and would be a HUGE
>> amount of work.
>> 
>> 
>> 
>>> On Aug 21, 2019, at 8:01 PM, Danny Chan 
 wrote:
>>> 
>>> Now some of our fellows want to do syntax prompting in the web page,
>>> and they want a parser in the front end; ANTLR4 can generate a JS parser
>>> directly but JavaCC can’t.
>>> 
>>> So I’m wondering: do you have similar requests? And do you think there is
>>> a need to support an ANTLR4 .g4 file in Calcite?
>>> 
>>> 
>>> Best,
>>> Danny Chan
>> 
> 
 
>>> 
>>> 
>> 





Re: PLC4X Request Optimization

2019-08-20 Thread Julian Feinauer
Hi Stamatis,

thanks for your response.
I think my brain just needs a bit more time to get really deep into those 
advanced planning topics (it's sometimes slow to adapt...).
But I will look through it.
We have a meetup this week and will discuss the matter and how to set up 
everything to enable some optimization at first (introducing cost estimates and 
such), and then I will again have a deeper look and perhaps prepare a test case 
or a runnable test.
Then it's probably easiest to reason about.

Julian

On 20.08.19, 15:19, "Stamatis Zampetakis" wrote:

Hi Julian F,

I admit that I didn't really get your example, but talking about 'batch 
request optimization' and 'collapsing "overlapping" but not equal requests' 
I get the impression that the problem is optimizing sets of queries which 
may have common sub-expressions; the problem is usually referred to as 
multi-query optimization and is indeed related to the Spool operator 
mentioned by Julian H.

If that's the case then the most relevant work that I can think of is [1],
which solves the problem
by slightly modifying the search strategy of the Volcano planner.

Best,
Stamatis

[1] Roy, Prasan, et al. "Efficient and extensible algorithms for multi
query optimization." ACM SIGMOD Record. Vol. 29. No. 2. ACM, 2000. (
https://www.cse.iitb.ac.in/~sudarsha/Pubs-dir/mqo-sigmod00.pdf)


On Tue, Aug 20, 2019 at 12:49 PM Julian Feinauer <
j.feina...@pragmaticminds.de> wrote:

> Hi Julian,
>
> thanks for the reply.
> I have to think about that, I think.
>
> But as I understand the Spool operator, it is there to factor out multiple
> calculations of the same expression.
> In our situation we aim more at collapsing "overlapping" but not equal
> requests.
>
> Consider 8 bits which physically form a byte.
> If I read 8 BOOLEANs I have 8 different requests which each mask one bit and
> return it (padded) as a byte. So 8 requests and 8 bytes of data transfer
> (plus masking on the PLC).
> If I would optimize it to read the byte in one request and do the masking
> afterwards I would have one request and only 1 byte transferred (plus no
> masking on the PLC which keeps pressure low there).
>
> This could be modelled by introducing respective "RelNodes" and Planner
> Rules, I think but I do not fully understand how Spool fits in here?
>
> Julian
>
> On 19.08.19, 20:42, "Julian Hyde" wrote:
>
> One tricky aspect is to optimize a *batch* of requests.
>
> The trick is to tie together the batch so that it is costed as one
> request. We don’t have an operator specifically for that, but you could for
> instance use UNION ALL. E.g. given Q1 and Q2, you could generate a plan for
>
>   select count(*) from Q1 union all select count(*) from Q2
>
> If the plan for the batch is a DAG (i.e. sharing work between the
> components of the batch by creating something akin to “temporary tables”)
> then you are in the territory for which we created the Spool operator (see
> discussion in https://issues.apache.org/jira/browse/CALCITE-481 <
> https://issues.apache.org/jira/browse/CALCITE-481>).
>
> Julian
>
>
> > On Aug 19, 2019, at 6:34 AM, Julian Feinauer <
> j.feina...@pragmaticminds.de> wrote:
> >
> > Hi Danny,
> >
> > thanks for the quick reply.
> > Cost calculation we can of course provide (but it could be a bit
> > different as we have not only CPU and Memory but also Network or something).
> >
> > And also something like the RelNodes could be provided. In our case
> > this would be "Requests" which are at first "Logical" and are then
> > transformed to "Physical" Requests. For example the API allows you to
> > request many fields per single request but some PLCs only allow one field
> > per request. So this would be one task of this layer.
> >
> > Julian
> >
> > On 19.08.19, 14:44, "Danny Chan" wrote:
> >
> >Cool idea ! Julian Feinauer ~
> >
> >I think the volcano model can be used the base of the cost
> algorithm. As long as you define all the metadata that you care about.
> Another thing is that you should have a struct like RelNode and a method
> like #computeSelfCost.
> >
> >Best,
> >Danny Chan
>On Aug 19, 2019 +080

Re: PLC4X Request Optimization

2019-08-20 Thread Julian Feinauer
Hi Julian,

thanks for the reply.
I have to think about that, I think.

But as I understand the Spool operator, it is there to factor out multiple 
computations of the same expression.
In our situation we aim more at collapsing "overlapping" but not equal requests.

Consider 8 bits which physically form a byte.
If I read 8 BOOLEANs I issue 8 different requests, each of which masks one bit 
and returns it (padded) as a byte. So 8 requests and 8 bytes of data transfer 
(plus masking on the PLC).
If I instead read the whole byte in one request and do the masking afterwards, 
I have one request and only 1 byte transferred (plus no masking on the PLC, 
which keeps the load low there).

This could be modelled by introducing corresponding "RelNodes" and planner 
rules, I think, but I do not fully understand how Spool fits in here.
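To make the intended rewrite concrete, here is a minimal, PLC4X-independent Java sketch of the client-side masking step. All names here are illustrative, not actual PLC4X API:

```java
public class BitCollapse {

    /** Extracts bit {@code i} (0 = least significant) from a byte
     *  that was fetched from the device in a single request. */
    static boolean bit(int rawByte, int i) {
        return ((rawByte >> i) & 1) != 0;
    }

    public static void main(String[] args) {
        // One 1-byte response replaces eight single-bit requests;
        // the masking now happens on the client, not on the PLC.
        int rawByte = 0b1010_0110;
        boolean[] coils = new boolean[8];
        for (int i = 0; i < 8; i++) {
            coils[i] = bit(rawByte, i);
        }
        // prints [false, true, true, false, false, true, false, true]
        System.out.println(java.util.Arrays.toString(coils));
    }
}
```

The device-side work drops to a single read; the per-bit masking becomes a cheap local shift-and-mask.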

Julian

On 19.08.19, 20:42, "Julian Hyde" wrote:

One tricky aspect is to optimize a *batch* of requests.

The trick is to tie together the batch so that it is costed as one request. 
We don’t have an operator specifically for that, but you could for instance use 
UNION ALL. E.g. given Q1 and Q2, you could generate a plan for

  select count(*) from Q1 union all select count(*) from Q2

If the plan for the batch is a DAG (i.e. sharing work between the 
components of the batch by creating something akin to “temporary tables”) then 
you are in the territory for which we created the Spool operator (see 
discussion in https://issues.apache.org/jira/browse/CALCITE-481 
<https://issues.apache.org/jira/browse/CALCITE-481>).
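A rough, Calcite-independent illustration of the "cost the batch as one request" idea: if the batch is a single costed unit (as wrapping it in UNION ALL achieves), the optimizer can compare whole-batch alternatives. The classes below are hypothetical, not Calcite or PLC4X API:

```java
import java.util.Collections;
import java.util.List;

public class BatchCost {

    /** Hypothetical cost of one request: network round trips and bytes moved. */
    record Cost(int roundTrips, int bytes) {
        Cost plus(Cost o) {
            return new Cost(roundTrips + o.roundTrips, bytes + o.bytes);
        }
    }

    /** Costing the batch as a single unit lets an optimizer compare
     *  whole-batch plans instead of optimizing each request in isolation. */
    static Cost batchCost(List<Cost> requests) {
        return requests.stream().reduce(new Cost(0, 0), Cost::plus);
    }

    public static void main(String[] args) {
        // Eight single-bit reads vs. one byte read (each request: 1 trip, 1 byte).
        Cost eightBitReads = batchCost(Collections.nCopies(8, new Cost(1, 1)));
        Cost oneByteRead = new Cost(1, 1);
        System.out.println(eightBitReads + " vs " + oneByteRead);
    }
}
```

Only when both alternatives are costed against the same batch-level total does the "one byte read" plan win.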

Julian


> On Aug 19, 2019, at 6:34 AM, Julian Feinauer 
 wrote:
> 
> Hi Danny,
> 
> thanks for the quick reply.
> Cost calculation we can of course provide (but it could be a bit 
different as we have not only CPU and Memory but also Network or something).
> 
> And also something like the RelNodes could be provided. In our case this 
would be "Requests" which are at first "Logical" and are then transformed to 
"Physical" Requests. For example the API allows you to request many fields per 
single request but some PLCs only allow one field per request. So this would be 
one task of this layer.
> 
    > Julian
    > 
> On 19.08.19, 14:44, "Danny Chan" wrote:
> 
>Cool idea ! Julian Feinauer ~
> 
>I think the volcano model can be used the base of the cost algorithm. 
As long as you define all the metadata that you care about. Another thing is 
that you should have a struct like RelNode and a method like #computeSelfCost.
> 
>Best,
>Danny Chan
>On Aug 19, 2019 at 5:20 PM +0800, Julian Feinauer wrote:
>> Hi folks,
>> 
>> I’m here again with another PLC4X related question 
(https://plc4x.apache.org).
>> As we have more and more usecases we encounter situations where we send 
LOTS of replies to PLCs which one could sometimes optimize.
>> This has multiple reasons upstream (like multiple different Services 
sending, or you want two logically different addresses which could be 
physically equal).
>> 
>> So, we consider to add some kind of optimizer which takes a Batch of 
requests and tries to arrange them in an “optimal” way with regard to som cost 
function.
>> The cost functions would of course be given by each Driver but the 
optimizer could / should be rather general (possibly with pluggable rules).
>> 
>> As Calcites Planner already includes all of that I ask myself if it 
could be possible (and make sense) to use that in PLC4X.
>> Generally speaking, this raises the question if the Volcano approach can 
be suitable for such problems.
>> The other alternative would be to start with some kind of heuristic 
based planning or with other optimization algorithms (genetic algs, cross 
entropy,…).
>> 
>> Any thoughs or feedbacks are welcome!
>> 
>> Julian
> 
> 





Re: PLC4X Request Optimization

2019-08-19 Thread Julian Feinauer
Hi Danny,

thanks for the quick reply.
Cost calculation we can of course provide (though it could look a bit 
different, as we have not only CPU and memory but also network costs and the 
like).

Something like the RelNodes could also be provided. In our case these would be 
"Requests", which are at first "logical" and are then transformed into 
"physical" requests. For example, the API allows you to request many fields in 
a single request, but some PLCs only allow one field per request, so that would 
be one task of this layer.
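As a sketch of what this layer could do, here is a logical-to-physical request split for a device that accepts only a limited number of fields per request. The types are made up for illustration, not the real PLC4X API:

```java
import java.util.ArrayList;
import java.util.List;

public class RequestPlanner {

    /** A logical read request for several field addresses at once. */
    record LogicalRequest(List<String> fields) {}

    /** A physical request the device can actually execute. */
    record PhysicalRequest(List<String> fields) {}

    /** Splits a logical request into physical ones, respecting the
     *  device's limit on fields per request (1 for some PLCs). */
    static List<PhysicalRequest> toPhysical(LogicalRequest req, int maxFieldsPerRequest) {
        List<PhysicalRequest> result = new ArrayList<>();
        for (int i = 0; i < req.fields().size(); i += maxFieldsPerRequest) {
            int end = Math.min(i + maxFieldsPerRequest, req.fields().size());
            result.add(new PhysicalRequest(req.fields().subList(i, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        LogicalRequest logical = new LogicalRequest(List.of("%Q0.0", "%Q0.1", "%I0.2"));
        // A device limited to one field per request yields three physical requests.
        System.out.println(toPhysical(logical, 1).size()); // prints 3
    }
}
```

A driver with a higher per-request field limit would simply pass a larger `maxFieldsPerRequest` and emit fewer physical requests.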

Julian

On 19.08.19, 14:44, "Danny Chan" wrote:

Cool idea ! Julian Feinauer ~

I think the Volcano model can be used as the base of the cost algorithm, as 
long as you define all the metadata that you care about. Another thing is that 
you should have a struct like RelNode and a method like #computeSelfCost.

Best,
Danny Chan
On Aug 19, 2019 at 5:20 PM +0800, Julian Feinauer wrote:
> Hi folks,
>
> I’m here again with another PLC4X related question 
(https://plc4x.apache.org).
> As we have more and more usecases we encounter situations where we send 
LOTS of replies to PLCs which one could sometimes optimize.
> This has multiple reasons upstream (like multiple different Services 
sending, or you want two logically different addresses which could be 
physically equal).
>
> So, we consider to add some kind of optimizer which takes a Batch of 
requests and tries to arrange them in an “optimal” way with regard to som cost 
function.
> The cost functions would of course be given by each Driver but the 
optimizer could / should be rather general (possibly with pluggable rules).
>
> As Calcites Planner already includes all of that I ask myself if it could 
be possible (and make sense) to use that in PLC4X.
> Generally speaking, this raises the question if the Volcano approach can 
be suitable for such problems.
> The other alternative would be to start with some kind of heuristic based 
planning or with other optimization algorithms (genetic algs, cross entropy,…).
>
> Any thoughs or feedbacks are welcome!
>
> Julian




PLC4X Request Optimization

2019-08-19 Thread Julian Feinauer
Hi folks,

I’m here again with another PLC4X related question (https://plc4x.apache.org).
As we have more and more use cases, we encounter situations where we send LOTS 
of requests to PLCs which one could sometimes optimize.
This has multiple reasons upstream (like multiple different services sending, 
or two logically different addresses which can be physically equal).

So, we are considering adding some kind of optimizer which takes a batch of 
requests and tries to arrange them in an "optimal" way with regard to some cost 
function.
The cost functions would of course be given by each driver, but the optimizer 
could / should be rather general (possibly with pluggable rules).
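One simple rule such an optimizer could start with is merging overlapping or adjacent address ranges into fewer reads. A sketch with a made-up address model, not PLC4X code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RangeMerge {

    /** A read of {@code length} bytes starting at offset {@code start}. */
    record Range(int start, int length) {
        int end() { return start + length; }
    }

    /** Greedy rule: sort by start address and coalesce overlapping or
     *  adjacent ranges, so fewer requests go out to the device. */
    static List<Range> merge(List<Range> reads) {
        List<Range> sorted = new ArrayList<>(reads);
        sorted.sort(Comparator.comparingInt(Range::start));
        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            if (!merged.isEmpty() && r.start() <= merged.get(merged.size() - 1).end()) {
                // Overlaps (or touches) the previous range: extend it.
                Range last = merged.remove(merged.size() - 1);
                int end = Math.max(last.end(), r.end());
                merged.add(new Range(last.start(), end - last.start()));
            } else {
                merged.add(r);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two overlapping reads and one distant read collapse to two requests.
        List<Range> out = merge(List.of(
                new Range(0, 4), new Range(2, 4), new Range(100, 2)));
        System.out.println(out.size()); // prints 2
    }
}
```

Whether merging actually pays off (e.g. transferring a few unrequested bytes in between versus saving a round trip) is exactly what the driver-supplied cost function would decide.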

As Calcite's planner already includes all of that, I ask myself whether it 
could be possible (and make sense) to use it in PLC4X.
Generally speaking, this raises the question whether the Volcano approach is 
suitable for such problems.
The other alternative would be to start with some kind of heuristic-based 
planning or with other optimization algorithms (genetic algorithms, cross 
entropy, ...).

Any thoughts or feedback are welcome!

Julian


Re: [PRIORITY] need your help with ApacheCon + project/PMC promotional opportunities

2019-08-07 Thread Julian Feinauer
I will be there and would love some : )

Julian

On 07.08.19, 13:07, "Michael Mior" wrote:

I won't be at ApacheCon, but should we request some stickers with our new 
logo?

--
Michael Mior
mm...@apache.org


-- Forwarded message -
From: Sally Khudairi 
Date: Tue, Aug 6, 2019 at 11:38 PM
Subject: [PRIORITY] need your help with ApacheCon + project/PMC
promotional opportunities
To: ASF Marketing & Publicity 


Hello ASF Members and Apache PMCs --I hope this note finds you all well.

I have an important/time-sensitive request, and am hoping you can help.

If you are interested in attending ApacheCon Las Vegas (9-12
September), please REGISTER ASAP (this week if possible: this will
help immeasurably with our last-stage planning as the event is a month
away). Committer discount code: CommittACNA19

For ApacheCon Berlin (22-24 October), the discount code is ACEU19_Committer

You can access both sites from https://www.apachecon.com/

NOTE: a) use your @apache.org address to register; b) the Vegas
discounted hotel room block *will sell out*, please secure your
sleeping rooms soon; c) discount codes are exclusive to APACHE
COMMITTERS ONLY = DO NOT SHARE/POST PUBLICLY!


The following OPPORTUNITIES are available for all --

Promote your projects:

1) get swag on RedBubble https://www.redbubble.com/people/comdev
--don't see your project listed? Find your logo at
http://apache.org/logos/ and request it be added by sending email to
d...@community.apache.org

2) get stickers --if you'd like us to have your project stickers at
the ASF Booth (or if you'd like some for a future event), please
request via the ComDev wiki
https://cwiki.apache.org/confluence/display/COMDEV/ApacheCon+NA+2019
or email d...@community.apache.org

3) #LoveApache badges --NEW! Show how much you/your
projects/communities/team/employers #LoveApache: here is your chance
to put your image/logo/face/* at the heart of it all to be used
online, in stickers, etc. Simply grab the template at
http://apache.org/foundation/press/kit/#badges and go. If you'd like
#LoveApache stickers for events (Apache Projects only!), follow the
instructions on "2)" above.


At ApacheCon Las Vegas:

1) Media & Analyst Training --the beginner course materials will soon
be available online for anyone to use at any time. In-person training
will now be offered at the Intermediate Level (pre-requisite:
completed Beginner level), and will be taking place Monday 9 September
9AM-12.30PM. Space is strictly limited: please contact me under
separate cover to sign up.

2) Education & Outreach: "Apache@" Events --we are offering a new
immersive session on how to host an "Apache@" event at your place of
employment. Originally offered as a benefit for ASF Sponsors, everyone
is invited to come learn best practices and an array of resources to
help corporate teams succeed when contributing to Apache projects.

Monday 9 September 2-5PM. Space is strictly limited: please contact me
under separate cover to sign up.

3) 20th Anniversary Photography Project --"Apache: Community Over
Code. Code Over Community" --lauded photographer/technologist Peter
Adams ("Faces of Open Source" http://www.facesofopensource.com/ ) will
be onsite in Las Vegas for a special project in recognition of the
ASF's 20th Anniversary.

This project has two components: 1) PMC group shots ("Community Over
Code"); and 2) 1-2 individual portraits of representatives of the
project --we will also need 100 lines of code (plaintext with
indentations and related mark up) that represents your project (or
community ethos): this will be projected onto the faces of the
individuals ("Code Over Community"), in a similar manner to
https://images.app.goo.gl/4wftAh3cN89JgJgE8

Space is limited: if interested, please contact me under separate
cover for more information.

4) 20th Anniversary Documentary: "Trillions and Trillions Served"
--ASF Member Michael Wechner and his crew will be back in Vegas and
Berlin to resume filming the documentary on the ASF that began 10
years ago. The trailer is at
http://www.wyonapictures.com/en/asf/a-brief-history-of-the-asf/index.html


Knowing who is attending which event will help us with casting,
interviews, and related activities, so please sign up soon!

Many thanks,
Sally

- - -
Vice President Marketing & Publicity
Vice President Sponsor Relations
The Apache Software Foundation

Tel +1 617 921 8656 | s...@apache.org




AW: [DISCUSS] Towards Calcite 1.21.0

2019-07-31 Thread Julian Feinauer
Hi Stamatis,

Thank you for the overview.
Currently Julian is working on a PR of mine which does not exactly fix 
CALCITE-1935 but at least brings in the first support for MATCH_RECOGNIZE, 
which I would consider at least a partial success.

But as the PR is rather big and the branch lived for two years, Julian still 
has to comment on whether he thinks he can manage it.

Best
JulianF

Sent from my mobile phone


 Original message 
Subject: Re: [DISCUSS] Towards Calcite 1.21.0
From: Chunwei Lei
To: dev@calcite.apache.org
Cc:

Thanks for your work, Stamatis!

Besides the issues you mentioned above, I wonder if CALCITE-1581
can be included in 1.21.0.


Best,
Chunwei


On Thu, Aug 1, 2019 at 6:29 AM Stamatis Zampetakis 
wrote:

> We are about three weeks before RC0 and we still have a big number of
> pending PRs.
> Moreover there are only a few Jira cases that are marked to be fixed in
> 1.21.0.
>
> If we assume that we have 10 active committers at the moment and each one
> of them takes on ~5 PRs till the 20th of August,
> we should have at least 50 Jiras marked to be resolved for the next
> version.
>
> I would like to kindly ask people to go through the PRs, select those that
> are going to make it for 1.21.0, and set the fix version accordingly.
>
> At the moment we have resolved 46 issues in Jira [1]. It would be great if
> we could bring this number to 50 by 7th of August.
>
> I've seen that Enrico started another thread about regressions on 1.20.0.
> Let's try to attack these issues first to allow people to upgrade to the latest
> release.
>
> Among the issues that we would like to include in 1.21.0, I would like to
> highlight the following:
>
> https://issues.apache.org/jira/browse/CALCITE-2302
> https://issues.apache.org/jira/browse/CALCITE-3122
> https://issues.apache.org/jira/browse/CALCITE-3142
>
> Best,
> Stamatis
>
> [1]
> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12333950
>
> On Mon, Jul 22, 2019 at 1:56 PM Chunwei Lei 
> wrote:
>
> > +1 for release at end of August.
> >
> > > Apart from very important issues it makes sense to treat PRs in FIFO
> > order.
> > Contributors who submit a PR early will certainly get discouraged to
> > contribute again if we never merge these PRs in time.
> >
> > +1 since it is very important for encouraging contributors.
> >
> >
> >
> > Best,
> > Chunwei
> >
> >
> > On Mon, Jul 22, 2019 at 9:19 AM Danny Chan  wrote:
> >
> > > >Apart from very important issues it makes sense to treat PRs in FIFO
> > > order.
> > > Contributors who submit a PR early will certainly get discouraged to
> > > contribute again if we never merge these PRs in time.
> > >
> > > There are 110+ PRs on the GitHub page, what should we do ?
> > >
> > > Best,
> > > Danny Chan
> > > On Jul 22, 2019 at 6:19 AM +0800, dev@calcite.apache.org wrote:
> > > >
> > > > Apart from very important issues it makes sense to treat PRs in FIFO
> > > order.
> > > > Contributors who submit a PR early will certainly get discouraged to
> > > > contribute again if we never merge these PRs in time.
> > >
> >
>


Request for Review: One step towards MATCH_RECOGNIZE

2019-07-30 Thread Julian Feinauer
Hi all,

I finally finished bringing the joint work on MATCH_RECOGNIZE to a state where 
at least two non-trivial tests work, see [3].
The work is based on a lot of preliminary work by Hongze and Julian (Hyde) 
which was done over a period of more than a year; therefore the code is rather 
large.
I also decided not to squash this PR (yet), as most of the code is not from 
myself but from Hongze and Julian, and the attribution would be lost in case of 
a squash.
As I had some issues during the implementation and found, I think, some bugs in 
(yet) unused parts of the code, I would be very grateful for support with 
reviewing this PR and bringing the code base to a state where it is mergeable 
into master.

Most of the discussions can be found in [1] and [2].

The tests that work can be found in JdbcTest:

  *   testSimpleMatch
  *   testMatch

The query that works now is:

```
select *
from "hr"."emps" match_recognize (
  order by "empid" desc
  measures "commission" as c,
"empid" as empid
  pattern (s up)
  define up as up."commission" < prev(up."commission"))
```
which covers all basic ingredients of the MATCH_RECOGNIZE clause.

The PR can be found in [4].

This PR does NOT yet completely resolve CALCITE-1935, as the given match.iq 
file does not yet work, but I think it is a good idea to resync with mainline 
and fix some flaws in my code before moving on.

As you may know, I have made only very few contributions to the Calcite 
codebase so far, so please forgive me if some of my approaches in the code are 
rather unusual or badly designed.

If there are any questions regarding my implementation please feel free to 
discuss.
I think this PR provides the first MWE (minimal working example) for 
MATCH_RECOGNIZE and can be the basis for all the other (missing) features.

As this branch is pretty old and was worked on by several people, I do not know 
if all changes are reasonable, so it would be great if the original authors 
(Julian, Hongze) could look into these diffs.
Short list of things reviewers should look into:

  *   RelBuilder – I don't know about those changes?
  *   CircularArrayList – is unused, I think, and at least its tests had huge 
performance issues; it could be removed, I think
  *   blank.iq – CoreQuidemTest fails and I have no idea why, as I see no 
changes around it. Probably that has to do with ExtensionSqlParser (?)
  *   I'm unsure about my changes in RexImpTable and would like to get comments 
on that
  *   Match.java:197 – I had to introduce this (ugly?) hack to make the tests 
in JdbcTest work. Perhaps someone could help me with that and explain why the 
former line fails?
  *   RexAction / RexPattern – I have no idea about those files and whether 
they can be removed?

If there are any further questions please feel free to ask.

Best
Julian

[1] 
https://lists.apache.org/thread.html/f36852ea8fd49419da1492ee8c15a0cd0dd205f37000de0fe0032d32@%3Cdev.calcite.apache.org%3E
[2] 
https://lists.apache.org/thread.html/8aa11cb17a0a7c6ebc420f1a025e7521aa9d9cea56de1513b27b6c5b@%3Cdev.calcite.apache.org%3E
[3] https://issues.apache.org/jira/browse/CALCITE-1935
[4] https://github.com/apache/calcite/pull/1343



AW: Problem with Code Generation

2019-04-01 Thread Julian Feinauer
Hi, Yuzhou,

Thank you very much!
Please tell me if you need any information or snippets.

Julian



Sent from my mobile phone


 Original message 
Subject: Re: Problem with Code Generation
From: Yuzhao Chen
To: dev@calcite.apache.org
Cc:

Julian Feinauer, I have filed a JIRA and am preparing to fix it.

[1] https://issues.apache.org/jira/browse/CALCITE-2966

Best,
Danny Chan
On Apr 1, 2019 at 12:45 AM +0800, dev@calcite.apache.org wrote:
>
> Hi all,
>
> I have some problems with the code generation from Linq4j which I'm unable to 
> resolve myself.
> Basically, I want to translate a condition from Rex to a Linq4j expression to 
> use it in generated code.
> In my example the Condition is from Match Recognize and in SQL is: 
> `up."commission" > prev(up."commission")`.
>
> ```
> RexBuilder rexBuilder = new RexBuilder(implementor.getTypeFactory());
> RexProgramBuilder rexProgramBuilder = new 
> RexProgramBuilder(physType.getRowType(), rexBuilder);
>
> rexProgramBuilder.addCondition(entry.getValue());
>
> final Expression condition = 
> RexToLixTranslator.translateCondition(rexProgramBuilder.getProgram(),
> (JavaTypeFactory) getCluster().getTypeFactory(),
> builder2,
> inputGetter1,
> implementor.allCorrelateVariables,
> implementor.getConformance());
>
> builder2.add(Expressions.return_(null, condition));
> ```
>
> Here, the condition seems okay, it is: ">(PREV(UP.$4, 0), PREV(UP.$4, 1))", 
> so it should be a comparison of two variables (I rewrite the PREV with a 
> custom Input Getter".
> But, the generated code (for Janino) is:
>
> ```
> Object p1 = row_.get($L4J$C$0_1);
> org.apache.calcite.test.JdbcTest.Employee p0 = 
> (org.apache.calcite.test.JdbcTest.Employee) p1;
> Object p3 = row_.get($L4J$C$1_1);
> org.apache.calcite.test.JdbcTest.Employee p2 = 
> (org.apache.calcite.test.JdbcTest.Employee) p3;
> Object p5 = row_.get($L4J$C$0_1);
> org.apache.calcite.test.JdbcTest.Employee p4 = 
> (org.apache.calcite.test.JdbcTest.Employee) p5;
> Object p7 = row_.get($L4J$C$1_1);
> org.apache.calcite.test.JdbcTest.Employee p6 = 
> (org.apache.calcite.test.JdbcTest.Employee) p7;
> return p0.commission && p2.commission && p4.commission > p6.commission;
> ```
>
> This confuses me a lot as I do not know where the check for p0.commission and 
> p2.commission comes from.
> It seems that Linq4j adds them as it expects these variables to be nullable, 
> but I have no idea on how to avoid this.
> These fields are Numeric so I always get a compilation exception.
>
> Can someone help me with this issue?


Problem with Code Generation

2019-03-31 Thread Julian Feinauer
Hi all,

I have some problems with the code generation from Linq4j which I'm unable to 
resolve myself.
Basically, I want to translate a condition from Rex to a Linq4j expression to 
use it in generated code.
In my example the condition comes from MATCH_RECOGNIZE and in SQL it is: 
`up."commission" > prev(up."commission")`.

```
RexBuilder rexBuilder = new RexBuilder(implementor.getTypeFactory());
RexProgramBuilder rexProgramBuilder = new 
RexProgramBuilder(physType.getRowType(), rexBuilder);

rexProgramBuilder.addCondition(entry.getValue());

final Expression condition = 
RexToLixTranslator.translateCondition(rexProgramBuilder.getProgram(),
  (JavaTypeFactory) getCluster().getTypeFactory(),
  builder2,
  inputGetter1,
  implementor.allCorrelateVariables,
  implementor.getConformance());


builder2.add(Expressions.return_(null, condition));
```

Here, the condition seems okay, it is: ">(PREV(UP.$4, 0), PREV(UP.$4, 1))", so 
it should be a comparison of two variables (I rewrite the PREV with a custom 
InputGetter).
But, the generated code (for Janino) is:

```
Object p1 = row_.get($L4J$C$0_1);
org.apache.calcite.test.JdbcTest.Employee p0 = 
(org.apache.calcite.test.JdbcTest.Employee) p1;
Object p3 = row_.get($L4J$C$1_1);
org.apache.calcite.test.JdbcTest.Employee p2 = 
(org.apache.calcite.test.JdbcTest.Employee) p3;
Object p5 = row_.get($L4J$C$0_1);
org.apache.calcite.test.JdbcTest.Employee p4 = 
(org.apache.calcite.test.JdbcTest.Employee) p5;
Object p7 = row_.get($L4J$C$1_1);
org.apache.calcite.test.JdbcTest.Employee p6 = 
(org.apache.calcite.test.JdbcTest.Employee) p7;
return p0.commission && p2.commission && p4.commission > p6.commission;
```

This confuses me a lot, as I do not know where the checks on p0.commission and 
p2.commission come from.
It seems that Linq4j adds them because it expects these variables to be 
nullable, but I have no idea how to avoid this.
These fields are numeric, so I always get a compilation exception.

Can someone help me with this issue?

Thanks!
Julian



Re: [VOTE] Release apache-calcite-1.19.0 (release candidate 0)

2019-03-15 Thread Julian Feinauer
Hi Vladimir,

Without being a super expert, I guess the term "binary" in this context refers 
to a compiled package.
The Release Policy [1] states that for a (compiled) artifact the source code 
has to be released alongside it.

So the images are basically the source for the homepage and documentation, and 
the csv.gz files are necessary for testing (and thus building) the project.
The Maven wrapper, on the other hand, is pre-compiled software whose source is 
not published with the release (as it has nothing to do with the software).

If I understand the policy correctly, it would be okay to include the Calcite 
jars in the release (as we do via Maven Central).
So I guess the term "binary" is a bit misleading.

Julian

[1] https://www.apache.org/legal/release-policy.html#compiled-packages

On 15.03.19, 08:08, "Vladimir Sitnikov" wrote:

Julian> maven-wrapper.jar is a binary file, and we cannot have binary
files in source distributions.

Hey, Julian, that is a great finding!
Can you please clarify if the following binary files count towards -1 as 
well?

site/img/pie-chart.png
site/img/cake.jpg
site/img/pb-calcite-140.png
site/img/pb-calcite-240.png
site/img/powered-by.png
site/img/window-types.png
site/img/logo.png
site/img/feather.png
site/fonts/fontawesome-webfont.ttf
site/fonts/fontawesome-webfont.woff
site/fonts/fontawesome-webfont.eot
file/src/test/resources/sales-csv/EMPS.csv.gz
example/csv/src/test/resources/sales/EMPS.csv.gz

Vladimir




AW: [DISCUSS] Move gitbox notification emails to another list?

2019-02-27 Thread Julian Feinauer
+1 I totally agree.

Julian


 Original message 
Subject: Re: [DISCUSS] Move gitbox notification emails to another list?
From: Vladimir Sitnikov
To: Apache Calcite dev list
Cc:

+1 to do something about it.
Diverting the messages to another list sounds good.

Vladimir


AW: Another Calcite-related paper accepted for SIGMOD -- "One SQL to Rule Them All"

2019-02-13 Thread Julian Feinauer
Congratulations... Would also like to see a preprint.

Sent from my mobile phone

 Original message 
Subject: Re: Another Calcite-related paper accepted for SIGMOD -- "One SQL to 
Rule Them All"
From: Andrei Sereda
To: dev@calcite.apache.org
Cc:

Congratulations.
When the paper will be published ? I presume it is not accessible now ?

On Tue, Feb 12, 2019 at 7:08 PM AshwinKumar AshwinKumar <
aash...@g.clemson.edu> wrote:

> Congratulations
>
> On Tue, Feb 12, 2019 at 5:11 AM Edmon Begoli  wrote:
>
> > Dear Calcite community,
> >
> > I want to let you know that another significant paper featuring Calcite
> > (alongside Apache Flink and Beam) has been accepted for SIGMOD 2019.
> >
> > The full title of the paper is:
> > One SQL to Rule Them All – an Efficient and Syntactically Idiomatic
> > Approach to Management of Streams and Tables. Edmon Begoli, Tyler Akidau,
> > Fabian Hueske, Julian Hyde, Kathryn Knight, and Kenneth Knowles. To
> appear
> > in Proceedings of ACM SIGMOD conference (SIGMOD ’19). ACM, New York, NY,
> > USA
> >
> > I want to thank Julian Hyde for his contributions, and for introducing us
> > to the co-authors, with special thanks to Fabian Hueske from Flink, and
> > Tyler Akidau and Kenn Knowles from Beam for their outstanding
> > contributions.
> >
> > Thank you,
> > Edmon
> >
>


Re: [DISCUSS] Move site repositories from svn to gitbox

2019-02-11 Thread Julian Feinauer
We also use it for PLC4X and it works flawlessly.
+1

Julian

On 11.02.19, 11:23, "Vova Vysotskyi" wrote:

Great idea! Drill website already uses gitbox [1]

+1

[1] https://gitbox.apache.org/repos/asf?p=drill-site.git

Kind regards,
Volodymyr Vysotskyi


On Mon, Feb 11, 2019 at 12:01 PM Francis Chuang 
wrote:

> Hey all,
>
> ASF project sites have the ability to use git instead of subversion as
> their repository for web site content [1]. It has been available since
> 2015 and appears to be quite stable. Quite a few other projects have
> also moved their websites to git and subsequently, Gitbox (for using
> Github as their source of truth. As an example, see the Arrow project [2].
>
> I myself would love to see this as I find git's interface and UX to be
> much easier to use compared to svn. It also reduces the need to context
> switch between Git and svn when editing and pushing the site.
>
> My overall goal is to find a way to automate the publishing and build of
> our websites either via Jenkins builds (there are some projects are
> doing this already when I searched infra) or the new Github actions [3].
> Having the site hosted in Git would make this process much easier to
> automate. I will need to get in touch with infra to clarify a few things
> and to see if this is feasible, but I think this is a worthwhile endeavor.
>
> How do you guys feel about moving our site's repository from svn to 
GitBox?
>
> Francis
>
>
> [1] https://blogs.apache.org/infra/entry/git_based_websites_available
> [2] https://issues.apache.org/jira/browse/INFRA-17655
> [3] https://github.com/features/actions
>




Re: Release managers

2019-02-11 Thread Julian Feinauer
Hi Kevin,

this should definitely make things easier and speed up the process.

Julian

On 11.02.19, 15:39, "Kevin Risden" wrote:

I can volunteer to do the 1.19 2019-02 release. We are getting to midway
through February :)

Kevin Risden


On Thu, Feb 7, 2019 at 3:23 AM Julian Feinauer 

wrote:

> Hey all,
>
> I don't know if this is unusual or undoable due to permission issues as
> I'm no commiter.
> But I'd like to offer my duties as RM.
> I am not that familiar with Calcite releases but just finished my first
> official release for an Apache Projekt with PLC4X.
>
> Best
> Julian
>
>
> On 06.02.19, 21:00, "Julian Hyde" wrote:
>
> Any volunteers for 1.19?
>
> We now have
>
> Release  Target date  Release manager
> =======  ===========  ===============
> 1.19     2019-02
> 1.20     2019-04      Michael Mior
> 1.21     2019-06      Stamatis
> 1.22     2019-08      Andrei
> 1.23     2019-10
>
> > On Feb 4, 2019, at 10:17 AM, Michael Mior  wrote:
> >
> > Great idea. I was intending to volunteer as RM last time, but with
> the
> > time pressure, I didn't respond soon enough. I'm happy to take the
> > April release (1.20).
> >
> > --
> > Michael Mior
> > mm...@apache.org
> >
> >> On Thu, Jan 31, 2019 at 6:54 PM, Andrei Sereda wrote:
> >>
> >> Release  Target date  Release manager
> >> =======  ===========  ===============
> >> 1.19     2019-02
> >> 1.20     2019-04
> >> 1.21     2019-06      Stamatis
> >> 1.22     2019-08      Andrei
> >> 1.23     2019-10
> >>
> >> On Thu, Jan 31, 2019 at 6:14 PM Stamatis Zampetakis <
> zabe...@gmail.com>
> >> wrote:
> >>
> >>> Release  Target date  Release manager
> >>> =======  ===========  ===============
> >>> 1.19     2019-02
> >>> 1.20     2019-04
> >>> 1.21     2019-06      Stamatis
> >>> 1.22     2019-08
> >>> 1.23     2019-10
> >>>
> >>> On Thu, Jan 31, 2019 at 7:46 PM, Julian Hyde <jh...@apache.org> wrote:
> >>>
> >>>> Calcite needs to make regular releases, and we have established a
> cadence
> >>>> of every 2 - 3 months that everyone seems to like. But to keep
> that
> >>>> running, each release needs a release manager, and finding a
> release
> >>>> manager always seems to be a chore.
> >>>>
> >>>> I wonder if we have trouble recruiting release managers because
> we only
> >>>> ask for one at a time. How about we get volunteers for the next 5
> >>> releases?
> >>>> Then everyone will be seen to be doing their fair share.
> >>>>
> >>>> Release Target date Release manager
> >>>> === === ===
> >>>> 1.192019-02
> >>>> 1.202019-04
> >>>> 1.212019-06
> >>>> 1.222019-08
> >>>> 1.232019-10
> >>>>
> >>>> I propose that frequent committers (anyone who had 2 or more
> fixes in
> >>> 1.18
> >>>> and 1 or 2 fixes in 1.16 or 1.17) should all step up and be
> release
> >>> manager
> >>>> for one of the releases this year.
> >>>>
> >>>> Julian
> >>>>
> >>>>
> >>>
>
>
>
>




Re: Release managers

2019-02-07 Thread Julian Feinauer
Hey all,

I don't know if this is unusual or undoable due to permission issues, as I'm not a 
committer.
But I'd like to offer my duties as RM.
I am not that familiar with Calcite releases, but I just finished my first 
official release for an Apache project, with PLC4X.

Best
Julian


On 06.02.19, 21:00, "Julian Hyde" wrote:

Any volunteers for 1.19?

We now have

Release Target date Release manager
======= =========== ===============
1.19    2019-02
1.20    2019-04     Michael Mior
1.21    2019-06     Stamatis
1.22    2019-08     Andrei
1.23    2019-10

> On Feb 4, 2019, at 10:17 AM, Michael Mior  wrote:
> 
> Great idea. I was intending to volunteer as RM last time, but with the
> time pressure, I didn't respond soon enough. I'm happy to take the
> April release (1.20).
> 
> --
> Michael Mior
> mm...@apache.org
> 
> On Thu, Jan 31, 2019 at 18:54, Andrei Sereda wrote:
>> 
>> Release Target date Release manager
>> ======= =========== ===============
>> 1.19    2019-02
>> 1.20    2019-04
>> 1.21    2019-06     Stamatis
>> 1.22    2019-08     Andrei
>> 1.23    2019-10
>> 
>> On Thu, Jan 31, 2019 at 6:14 PM Stamatis Zampetakis 
>> wrote:
>> 
>>> Release Target date Release manager
>>> ======= =========== ===============
>>> 1.19    2019-02
>>> 1.20    2019-04
>>> 1.21    2019-06     Stamatis
>>> 1.22    2019-08
>>> 1.23    2019-10
>>> 
>>> On Thu, Jan 31, 2019 at 7:46 PM, Julian Hyde wrote:
>>> 
 Calcite needs to make regular releases, and we have established a 
cadence
 of every 2 - 3 months that everyone seems to like. But to keep that
 running, each release needs a release manager, and finding a release
 manager always seems to be a chore.
 
 I wonder if we have trouble recruiting release managers because we only
 ask for one at a time. How about we get volunteers for the next 5
>>> releases?
 Then everyone will be seen to be doing their fair share.
 
 Release Target date Release manager
 ======= =========== ===============
 1.19    2019-02
 1.20    2019-04
 1.21    2019-06
 1.22    2019-08
 1.23    2019-10
 
 I propose that frequent committers (anyone who had 2 or more fixes in
>>> 1.18
 and 1 or 2 fixes in 1.16 or 1.17) should all step up and be release
>>> manager
 for one of the releases this year.
 
 Julian
 
 
>>> 





Re: Current State of MATCH_RECOGNIZE Implementation

2019-01-09 Thread Julian Feinauer
Hi Julian,

the (simple) example is not yet running.
My main problem is how to implement the translation of a Rex expression like 
“PREV(UP.$4, $5)” into an expression like “row.get($5).$4”.
Then, simple queries should work.

My main problem is that I do not know how to "swap" the PATTERN_INPUT_REF with 
the call to the PREV function during the RexToLix translation.
If you give me a hint on how to implement this, I'll try to do that.

JulianF

On 09.01.19, 19:23, "Julian Hyde" wrote:

I saw your PR https://github.com/julianhyde/calcite/pull/16 
<https://github.com/julianhyde/calcite/pull/16>. Can you please create a PR 
against Apache, then I’ll rebase it onto my 1935 branch.

Are we able to run a SQL query yet? My plan was to get a very basic query 
working end-to-end, then start adding features to the engine. I still think 
that’s a good plan.

Julian


> On Jan 9, 2019, at 9:13 AM, Julian Feinauer 
 wrote:
> 
> Hi all,
> 
> as discussed in earlier exchanges (see [1]), I started to work on 
implementing MATCH_RECOGNIZE based on Julian (Hyde's) work [2].
> I think I made some progress and resolved some of the problems but I’m at 
a stage now where I’d need some advice from more seasoned Calcite devs on how 
to continue.
> 
> Basically, I improved the Matching (based on a DFA now) and ensured that 
we have the full path of symbols (not just rows) as base for the “Emitter”. The 
Matcher code should also be working now EXCEPT for the PREV / NEXT Commands.
> I think we could handle them pretty easily as I introduced a 
“MemoryEnumerable”, i.e., an Enumerable which keeps all records in a window 
around the current record (history and future). I think this should work here, 
as we have NO unbounded windows like for regular window functions.
> 
> So, basically the Matcher gets for each step a “Memory” Object which has 
a “get(n)” method to get the current (n = 0), past (n < 0) or future (n > 0) 
row.
> So, an expression of the Form “PREV(UP.$4, $5)” should be converted to 
something like “row.get($5).$4”.
> I have no real clue how to do this the “right” way, perhaps by a custom 
InputGetter which automatically introduces the “.get()”?
> When this is implemented the Matcher should be finished (?).
> 
> Another thing the current implementation is missing is the ordering 
inside the partitions (which is similar to window functions). Do you think we 
can simply reuse the code from there?
> Generally, MATCH_RECOGNIZE could be implemented as regular WINDOW 
Function in the situation where one output row is generated for each input row, 
but I think this does not help us much, as there is also the other “MODE” where 
it outputs a single row for each (possibly arbitrarily long) match.
> 
> Parallel to this mail I submitted a PR to merge my branch back to Julian's 
work to have a common “checkpoint” for the next steps.
> I would really appreciate it if someone (Julian?) could step in, either 
implementing parts of the problems stated above or giving me some hints on how 
to address this properly so that I can try to go further.
> 
> I don’t know what the usual way is but if it helps perhaps we can arrange 
a Screen sharing session or something to walk through the new code, if 
necessary.
> 
> Best
> JulianF
> 
> [1] 
https://lists.apache.org/thread.html/98f67c4534c32b544e48d54abca19f0e89fe8a163e5d5b822d80e6f0@%3Cdev.calcite.apache.org%3E
> [2] https://github.com/julianhyde/calcite/tree/1935-match-recognize





Current State of MATCH_RECOGNIZE Implementation

2019-01-09 Thread Julian Feinauer
Hi all,

as discussed in earlier exchanges (see [1]), I started to work on implementing 
MATCH_RECOGNIZE based on Julian (Hyde's) work [2].
I think I made some progress and resolved some of the problems but I’m at a 
stage now where I’d need some advice from more seasoned Calcite devs on how to 
continue.

Basically, I improved the Matching (based on a DFA now) and ensured that we 
have the full path of symbols (not just rows) as base for the “Emitter”. The 
Matcher code should also be working now EXCEPT for the PREV / NEXT Commands.
I think we could handle them pretty easily as I introduced a 
“MemoryEnumerable”, i.e., an Enumerable which keeps all records in a window 
around the current record (history and future). I think this should work here, 
as we have NO unbounded windows like for regular window functions.

So, basically the Matcher gets for each step a “Memory” Object which has a 
“get(n)” method to get the current (n = 0), past (n < 0) or future (n > 0) 
row.
So, an expression of the Form “PREV(UP.$4, $5)” should be converted to 
something like “row.get($5).$4”.
I have no real clue how to do this the “right” way, perhaps by a custom 
InputGetter which automatically introduces the “.get()”?
When this is implemented the Matcher should be finished (?).
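To make the intended get(n) contract concrete, here is a toy sketch (the class name and methods are illustrative, not Calcite's actual MemoryEnumerable API):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy sketch of a row "memory": get(0) is the current row, get(-n) a
 * past row, get(+n) a future (read-ahead) row. A real implementation
 * would evict rows that fall outside the bounded window.
 */
class Memory<E> {
    private final List<E> buffer = new ArrayList<>();
    private int current = -1; // index of the current row within the buffer

    /** Buffers a row read ahead from the input; does not move the cursor. */
    void add(E row) {
        buffer.add(row);
    }

    /** Moves the cursor to the next buffered row. */
    void advance() {
        current++;
    }

    /** Returns the row at the given offset, or null outside the window. */
    E get(int n) {
        int i = current + n;
        return i >= 0 && i < buffer.size() ? buffer.get(i) : null;
    }
}
```

With something like this, “PREV(UP.$4, $5)” would roughly become a negative-offset get call followed by a field access, which is what the “row.get($5).$4” shape above expresses.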

Another thing the current implementation is missing is the ordering inside the 
partitions (which is similar to window functions). Do you think we can simply 
reuse the code from there?
Generally, MATCH_RECOGNIZE could be implemented as regular WINDOW Function in 
the situation where one output row is generated for each input row, but I think 
this does not help us much, as there is also the other “MODE” where it outputs 
a single row for each (possibly arbitrarily long) match.

Parallel to this mail I submitted a PR to merge my branch back to Julian's work 
to have a common “checkpoint” for the next steps.
I would really appreciate it if someone (Julian?) could step in, either 
implementing parts of the problems stated above or giving me some hints on how 
to address this properly so that I can try to go further.

I don’t know what the usual way is but if it helps perhaps we can arrange a 
Screen sharing session or something to walk through the new code, if necessary.

Best
JulianF

[1] 
https://lists.apache.org/thread.html/98f67c4534c32b544e48d54abca19f0e89fe8a163e5d5b822d80e6f0@%3Cdev.calcite.apache.org%3E
[2] https://github.com/julianhyde/calcite/tree/1935-match-recognize


Re: Question Regarding The Benchmark of Calcite Compared To Conventional Database System (Related to CALCITE-2169)

2018-12-31 Thread Julian Feinauer
Hi Lekshmi,

your activity sounds very interesting.
One important thing to note is that performance testing in Java is always 
tricky due to the JIT compiler and the "warmup" phase of the JVM. It is 
therefore generally recommended to do these tests with JMH 
(https://openjdk.java.net/projects/code-tools/jmh/).

I would assume that the time for sql2rel drops drastically (perhaps by one or 
two orders of magnitude) when measured with JMH.
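To illustrate the warmup effect, here is a crude sketch (illustrative only; JMH does this properly with forking, warmup iterations and dead-code-elimination safeguards, none of which are shown here):

```java
/**
 * Crude illustration of JVM warmup: timing the same work several times
 * in one process usually shows later iterations running much faster,
 * once the JIT has compiled the hot path. JMH automates this properly.
 */
public class WarmupDemo {
    // Deterministic busywork standing in for a sql2rel-style task.
    static long work() {
        long sum = 0;
        for (int i = 0; i < 500_000; i++) {
            sum += Integer.toString(i).hashCode();
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int iteration = 0; iteration < 5; iteration++) {
            long start = System.nanoTime();
            long result = work();
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("iteration " + iteration + ": " + micros
                + " us (result=" + result + ")");
        }
    }
}
```

A single cold measurement (like one junit run with CalciteTimingTracer) corresponds to iteration 0 here, which is why it can overstate the steady-state cost considerably.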

Best
Julian

On 30.12.18, 23:12, "Lekshmi" wrote:

Hello Folks,

For my research activities, I was trying to perform a benchmark comparison
between Calcite and other database systems. As an initial step, I was
trying to do it for *Calcite* and *PostgreSQL*. So I thought the TPC-H queries
were the right thing to start with. I tried running the TpchTest (

https://github.com/apache/calcite/blob/master/plus/src/test/java/org/apache/calcite/adapter/tpch/TpchTest.java)
by adding the *CalciteTimingTracer* in the junit tests to determine the
execution time. While doing so, I could see that the execution time in
Calcite is significantly higher compared to PostgreSQL. On further
investigation, I could see that we generate the data required for
these queries (around 150,000 rows for some tables), and I was under
the impression that most of the time was spent on the data generation and
that the query execution could be faster. So, I modified the relevant
schema class (

https://github.com/apache/calcite/blob/master/plus/src/main/java/org/apache/calcite/adapter/tpch/TpchSchema.java)
to perform the data generation and query execution separately. Then, I
traced the time taken for just the query execution. Even then, there was a
significant difference from that of PostgreSQL.

I also set the *log4j.rootLogger* to *TRACE* to find the time spent
in the sql2rel and optimization phases of the class Prepare
<

https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/Prepare.java>.
And, to my surprise, I could see that Calcite takes 355ms for
sql2rel and 352ms for optimization in the junit test *testQuery01*. On the
other hand, the same query gave a planning time of 0.163ms in Postgres.

I would like to know if this is the right way to test the performance of
TPC-H queries using Apache Calcite. Can anyone let me know if there exists
a better way to do it?

And, while searching through JIRA, I could find a ticket
https://issues.apache.org/jira/browse/CALCITE-2169 which was created by
Edmon Begoli for performing a comparative performance study of the calcite
framework. I think, its related to my current problem. I have no idea
regarding the status of the ticket. It would be really great if someone
could help me with some information on it.

Also, coming to my personal preference, I would like to continue my
research in Calcite due to its simplicity and extensibility. But if I
fail to give a good case study in favour of Calcite, I am afraid that I
could lose the opportunity to work with Calcite.

Thanks and Regards

Lekshmi B.G
Email: lekshmib...@gmail.com




Re: JSON(B) Support like in Postgres

2018-12-29 Thread Julian Feinauer
Hey Hongze,

thanks for your response.
I'm not sure whether overriding the built-in functions in Calcite (or 
disabling them) would be the right way.
I thought about translating them to different Postgres calls, i.e., 
implementing a suitable Postgres adapter that maps Calcite's built-in 
functions to Postgres-native commands, if possible.

But I'll first check through your references and the Code a bit more.

Thanks
Julian

On 29.12.18, 15:30, "Hongze Zhang" wrote:

Hi Julian, 


If I remember right, Calcite does not support Postgres's json and jsonb 
datatypes in the current version (1.18).
Calcite has built-in JSON support (see CALCITE-2266 [1]) similar to what has 
been implemented in Oracle and MS SQL; it is an earlier version of the whole 
JSON functionality described in the SQL standard. For now these functions [2] 
mainly accept character datatypes as JSON input; other data types are not 
supported yet.


I am not so familiar with Postgres's JSON implementation, but I think the 
implementations are widely different from Calcite's; some functions have 
syntax that conflicts with Calcite's functions (e.g. JSON_VALUE).
If you'd like to process JSON using Postgres's syntax, you may first need 
to change the parser code of Calcite to support Postgres's json and jsonb 
operators, and also disable the built-in JSON_VALUE function and then add 
Postgres's JSON functions (if you want to use Postgres's JSON_VALUE function 
on Calcite 1.18).


Best,
Hongze


[1] https://issues.apache.org/jira/browse/CALCITE-2266
[2] http://calcite.apache.org/docs/reference.html#json-functions







At 2018-12-29 18:18:59, "Julian Feinauer"  
wrote:
>Hi all,
>
>we use Postgres a lot and make heavy use of the JSONB datatype [1].
>Is there support for something similar in Calcite?
>If so, can someone point me to the docs as I’ve not found anything in the 
list of builtin functions.
>
>There are several reasons why it would be cool for us to have Calcite in 
Front of postgres to do some query rewriting if necessary but for that we would 
definitely need support for something which could be transformed to JSONB.
>
>Best
>Julian
>
>[1] https://www.postgresql.org/docs/9.5/functions-json.html




JSON(B) Support like in Postgres

2018-12-29 Thread Julian Feinauer
Hi all,

we use Postgres a lot and make heavy use of the JSONB datatype [1].
Is there support for something similar in Calcite?
If so, can someone point me to the docs, as I’ve not found anything in the list 
of built-in functions.

There are several reasons why it would be cool for us to have Calcite in front 
of Postgres to do some query rewriting if necessary, but for that we would 
definitely need support for something which could be translated to JSONB.

Best
Julian

[1] https://www.postgresql.org/docs/9.5/functions-json.html


Re: MATCH_RECOGNIZE

2018-12-28 Thread Julian Feinauer
Hi Julian,

I see your argument with the DFA. The reason I used it was to get a (possibly 
inefficient) working solution, and it helped me to reduce some code complexity; 
see the reduction of code in the Matcher.matchWithSymbol method [1].
But, I think we can easily factor out the code for DFA creation and keep only 
the epsilon removal.
I can do this if you want.
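For what it's worth, the worst case you mention is easy to reproduce: the classic family (a|b)*a(a|b)^(n-1) has an NFA with n+1 states whose determinization needs 2^n states. A small subset-construction counter (illustrative only, not based on the Automaton class) shows the growth:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Counts DFA states produced by subset construction for the NFA of
 *  (a|b)* a (a|b)^(n-1): state 0 loops on a/b, 'a' moves 0 -> 1, and
 *  states 1..n-1 advance on any symbol; state n is accepting. */
public class SubsetBlowup {
    static int countDfaStates(int n) {
        Set<Set<Integer>> seen = new HashSet<>();
        Deque<Set<Integer>> queue = new ArrayDeque<>();
        Set<Integer> start = Set.of(0);
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            Set<Integer> states = queue.poll();
            for (char c : new char[] {'a', 'b'}) {
                Set<Integer> next = new HashSet<>();
                for (int s : states) {
                    if (s == 0) {
                        next.add(0);          // (a|b)* self-loop
                        if (c == 'a') {
                            next.add(1);      // the distinguished 'a'
                        }
                    } else if (s < n) {
                        next.add(s + 1);      // the (a|b)^(n-1) tail
                    }
                }
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            System.out.println("n=" + n + ": " + countDfaStates(n) + " DFA states");
        }
    }
}
```

So a PATTERN whose symbols behave like this family could indeed blow up; bounding the DFA size (or falling back to the NFA) would be a reasonable safeguard.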

To your other point: I never intended to step on your toes with this, and I 
really must apologize for that. My only intention was to help get the 
MATCH_RECOGNIZE implementation working, as offered in recent conversations. So I 
suggest that I stop working on this and let you go on. If you need some of my 
code, I'll prepare the respective parts for a PR to your branch.

Let me know if there are other things I can do to help you with the 
MATCH_RECOGNIZE implementation; until then I'll stop my efforts on this topic.

JulianF

[1] 
https://github.com/JulianFeinauer/calcite/blob/1dc707278f2e23e712c29b495a67084689cf825e/core/src/main/java/org/apache/calcite/runtime/Matcher.java

On 29.12.18, 01:30, "Julian Hyde" wrote:

I hesitated to make the automaton deterministic because the DFA can have 
exponentially more states than its corresponding NFA, and I was concerned that 
this worst case might occur in real-world queries. I could imagine a query with 
20 symbols and 1 million states, or 30 symbols and 1 billion states. Are you 
confident that the DFA will always have a reasonable size?

Whether or not we go to DFA, removing epsilon transitions does seem 
worthwhile. I think that would cause only a small increase in the number of 
states.

I’m going to stop working on this code, until you reach a stopping point. 
There’s no point in us treading on each other’s feet. It’s a shame, because I 
was looking forward to working on this code over the Xmas break.

Julian


> On Dec 28, 2018, at 3:02 PM, Julian Feinauer 
 wrote:
> 
> Hi Julian,
> 
> as it got really confusing for me with the eps-NFA (and all the 
concurrent partial matches in the matcher) I added a class DFA that transforms 
the epsilon-NFA from an automaton to a DFA and reimplemented the Matcher based 
on it.
> I think it is running now but it fails on tests that contain a "repeat" 
statement (X{a,b}) in AutomatonTest.
> The constructed DFA from these statements is wrong.
> Do you have a reference for the transformation in your Thompson 
construction? I found nothing on quick googling.
> I would like to check the implementation in the AutomatonBuilder:202 ff 
as I did not find a Bug in the Matcher or the DFA code.
> Otherwise I would try to check it by expanding the repeat with symbols, 
ors and optionals.
> 
> I also fixed the coding style (sorry for that, I totally missed that) so 
it should be better to review my code now (as of commit 
https://github.com/julianhyde/calcite/pull/15/commits/358ca1c5928b57cc96c8b39be8d017872d870dcf
 ).
> 
> Best
> JulianF
> 
> On 28.12.18, 02:19, "Julian Hyde" wrote:
> 
>I think we should get one example working end-to-end before moving to 
the next. (By “example” I mean a SQL query on a standard data set that 
exercises one new feature, say FINAL.) Right now nothing works end-to-end 
because we don’t have the basic code generation working.
> 
>I agree that assigning symbols to matched rows is a hard problem. I 
think the best approach is to first figure out whether there is a match (that’s 
what Automaton does) and then, in a second pass, assign a symbol to each row. 
The second pass might be significantly slower, but only occurs less often. 
Also, AFAICT, symbol assignment is only required for the CLASSIFIER() function. 
So I was going to defer that task.
> 
>Yes, I am actively working on this code. I see that your branch has 
significant changes because you use a different coding style (e.g. different 
indentation). Please change your code back to the existing style. There is no 
reason to make the task even more difficult than it already is.
> 
>Julian
> 
> 
>> On Dec 27, 2018, at 1:19 PM, Julian Feinauer 
 wrote:
>> 
>> Hi Julian,
>> 
>> regarding "^" and "$" it seems like Zhiqiang already introduced the 
fields strictStart and strictEnd in org.apache.calcite.rel.core.Match. But I 
agree with you and already had the same idea. And I'll go over to your last 
commit to start my branch off.
>> 
>> I made some progress in my branch [1]. I get it to compile and I get the 
test `JdbcTest.testMatch` to run and to fail (but no longer throw an exception, 
at least).
>> I fixed several things at several places and I think the code generation 
is now working (though not well) for the Matcher and Emitter.

Re: MATCH_RECOGNIZE

2018-12-28 Thread Julian Feinauer
Hi Julian,

as it got really confusing for me with the eps-NFA (and all the concurrent 
partial matches in the matcher) I added a class DFA that transforms the 
epsilon-NFA from an automaton to a DFA and reimplemented the Matcher based on 
it.
I think it is running now but it fails on tests that contain a "repeat" 
statement (X{a,b}) in AutomatonTest.
The constructed DFA from these statements is wrong.
Do you have a reference for the transformation in your Thompson construction? I 
found nothing on quick googling.
I would like to check the implementation in the AutomatonBuilder:202 ff as I 
did not find a Bug in the Matcher or the DFA code.
Otherwise I would try to check it by expanding the repeat with symbols, ors and 
optionals.
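That cross-check can be sketched as follows: a bounded repeat X{m,n} is equivalent to m mandatory copies followed by n-m nested optionals (a hypothetical helper, just for validating the builder against the expanded pattern):

```java
/** Expands a bounded repeat X{m,n} into concatenation and nested
 *  optionals, e.g. X{2,4} becomes "XX(X(X)?)?". Useful for checking
 *  that an automaton built from the repeat accepts the same inputs
 *  as one built from the expanded pattern. */
public class RepeatExpander {
    static String expandRepeat(String symbol, int m, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < m; i++) {
            sb.append(symbol);                         // mandatory part: m copies
        }
        String optional = "";
        for (int i = 0; i < n - m; i++) {
            optional = "(" + symbol + optional + ")?"; // nested optional tail
        }
        return sb.append(optional).toString();
    }
}
```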

I also fixed the coding style (sorry for that, I totally missed that) so it 
should be better to review my code now (as of commit 
https://github.com/julianhyde/calcite/pull/15/commits/358ca1c5928b57cc96c8b39be8d017872d870dcf
 ).

Best
JulianF

On 28.12.18, 02:19, "Julian Hyde" wrote:

I think we should get one example working end-to-end before moving to the 
next. (By “example” I mean a SQL query on a standard data set that exercises 
one new feature, say FINAL.) Right now nothing works end-to-end because we 
don’t have the basic code generation working.

I agree that assigning symbols to matched rows is a hard problem. I think 
the best approach is to first figure out whether there is a match (that’s what 
Automaton does) and then, in a second pass, assign a symbol to each row. The 
second pass might be significantly slower, but occurs less often. Also, 
AFAICT, symbol assignment is only required for the CLASSIFIER() function. So I 
was going to defer that task.

Yes, I am actively working on this code. I see that your branch has 
significant changes because you use a different coding style (e.g. different 
indentation). Please change your code back to the existing style. There is no 
reason to make the task even more difficult than it already is.

Julian


> On Dec 27, 2018, at 1:19 PM, Julian Feinauer 
 wrote:
> 
> Hi Julian,
> 
> regarding "^" and "$" it seems like Zhiqiang already introduced the 
fields strictStart and strictEnd in org.apache.calcite.rel.core.Match. But I 
agree with you and already had the same idea. And I'll go over to your last 
commit to start my branch off.
> 
> I made some progress in my branch [1]. I get it to compile and I get the 
test `JdbcTest.testMatch` to run and to fail (but no longer throw an exception, 
at least).
> I fixed several things at several places and I think the code generation 
is now working (though not well) for the Matcher and Emitter.
> But there are “crucial” points where I’d like to have your advice (or 
someone else familiar with these topics):
> 
> First, I’m unsure how the FINAL function should be implemented (it’s not a 
regular operator and I did not find any reference on how to deal with it), so 
I “shortcutted” it by a reference to the ABS function, which is a “noop” in the 
test case; see RexImplTable:385.
> 
> I also have no real idea about the implementation of PREV / LAST 
Operators. I think there are some similarities to Window Aggregates and the 
PRECEDING / FOLLOWING operators, like RexWindowBound.
> 
> But currently I started working on a refactoring of the Matcher. 
Currently it only returns the rows that matched but not the respective symbols 
the rows were matched to. They are necessary for the emitter. I'm unsure 
whether to keep it based on an NFA or it is easier with a DFA. 
> 
> Before I continue and dig more through the code base it would be good for 
me to have some kind of feedback whether I’m going in the right direction and 
the things I do are of any value or if I misunderstood or misinterpreted some 
parts.
> 
> JulianF
> 
> PS.: Are you actively working on the branch? We should synchronize to 
avoid duplicate work.
> 
> [1] https://github.com/JulianFeinauer/calcite/tree/1935-match-recognize
> 
> On 27.12.18, 21:21, "Julian Hyde" wrote:
> 
>I think you can implement “^” by adding a special BEGIN state to the 
automaton. Each automaton should be in this state on creation, and there is no 
inbound transition (i.e. no way to get back into this state).
> 
>And you can implement “$” by adding a special end-of-data symbol (you 
might as well call it “$”) that is sent to each partition’s automaton when the 
input ends.
> 
>These seem to be elegant solutions because most of the work is in 
Pattern and Automaton, and can be unit-tested in AutomatonTest. Just a little 
extra plumbing needs to be added to the runtime in order to use it.
> 
>As you have noticed my b

Re: MATCH_RECOGNIZE

2018-12-27 Thread Julian Feinauer
Hi Julian,

regarding "^" and "$" it seems like Zhiqiang already introduced the fields 
strictStart and strictEnd in org.apache.calcite.rel.core.Match. But I agree 
with you and already had the same idea. And I'll go over to your last commit to 
start my branch off.

I made some progress in my branch [1]. I get it to compile and I get the test 
`JdbcTest.testMatch` to run and to fail (but no longer throw an exception, at 
least).
I fixed several things at several places and I think the code generation is now 
working (though not well) for the Matcher and Emitter.
But there are “crucial” points where I’d like to have your advice (or someone 
else familiar with these topics):
 
First, I’m unsure how the FINAL function should be implemented (it’s not a 
regular operator and I did not find any reference on how to deal with it), so I 
“shortcutted” it by a reference to the ABS function, which is a “noop” in the 
test case; see RexImplTable:385.

I also have no real idea about the implementation of PREV / LAST Operators. I 
think there are some similarities to Window Aggregates and the PRECEDING / 
FOLLOWING operators, like RexWindowBound.

But currently I started working on a refactoring of the Matcher. Currently it 
only returns the rows that matched but not the respective symbols the rows 
were matched to. They are necessary for the emitter. I'm unsure whether to 
keep it based on an NFA or it is easier with a DFA. 

Before I continue and dig more through the code base it would be good for me to 
have some kind of feedback whether I’m going in the right direction and the 
things I do are of any value or if I misunderstood or misinterpreted some parts.

JulianF

PS.: Are you actively working on the branch? We should synchronize to avoid 
duplicate work.

[1] https://github.com/JulianFeinauer/calcite/tree/1935-match-recognize

On 27.12.18, 21:21, "Julian Hyde" wrote:

I think you can implement “^” by adding a special BEGIN state to the 
automaton. Each automaton should be in this state on creation, and there is no 
inbound transition (i.e. no way to get back into this state).

And you can implement “$” by adding a special end-of-data symbol (you might 
as well call it “$”) that is sent to each partition’s automaton when the input 
ends.

These seem to be elegant solutions because most of the work is in Pattern 
and Automaton, and can be unit-tested in AutomatonTest. Just a little extra 
plumbing needs to be added to the runtime in order to use it.
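The two ideas can be combined in a toy sketch (hypothetical, not the actual Automaton API): model "^" as a BEGIN state with no inbound transitions, and model "$" by feeding a sentinel when the partition's input ends. Here for the pattern ^A+$ over symbol-labelled rows:

```java
import java.util.List;

/** Toy sketch of the begin-state / end-of-data-symbol idea for the
 *  pattern ^A+$: start in a special BEGIN state, consume one symbol per
 *  row, then treat the end of the partition as a sentinel "$". */
public class AnchorDemo {
    enum State { BEGIN, IN_A, MATCHED, FAILED }

    static boolean matchAllA(List<String> partition) {
        State state = State.BEGIN;
        for (String symbol : partition) {
            // '^' is modelled by BEGIN having no inbound transitions:
            // once we leave it, we can never return to it.
            if ((state == State.BEGIN || state == State.IN_A)
                    && symbol.equals("A")) {
                state = State.IN_A;
            } else {
                state = State.FAILED;
            }
        }
        // End of partition: feeding the sentinel "$". Only a run made up
        // entirely of A-rows is still in IN_A at this point.
        if (state == State.IN_A) {
            state = State.MATCHED;
        }
        return state == State.MATCHED;
    }
}
```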

As you have noticed my branch 
https://github.com/julianhyde/calcite/tree/1935-match-recognize/ 
<https://github.com/julianhyde/calcite/tree/1935-match-recognize/> is broken as 
of the latest commit. Consider starting your branch from the previous commit 
https://github.com/julianhyde/calcite/commit/ea20e84c2d0cf636d2279d182be6df2ef65b67d7
 
<https://github.com/julianhyde/calcite/commit/ea20e84c2d0cf636d2279d182be6df2ef65b67d7>.
 We can sync up when my branch is working.

Julian
    

> On Dec 26, 2018, at 6:44 AM, Julian Feinauer 
 wrote:
> 
> Hi Julian,
> 
> I used [1] as reference. Anchors are explicitly stated as part of the 
syntax and explained as:
> 
>> Anchors work in terms of positions rather than rows. They match a 
position either at the start or end of a partition.
>>^ matches the position before the first row in the partition.
>>$ matches the position after the last row in the partition.
>> As an example, PATTERN (^A+$) will match only if all rows in a partition 
satisfy the condition for A. The resulting match spans the entire partition.
> 
> Regarding patterns, I think it should not be a big change, as the anchors 
are defined with respect to partition boundaries. So technically they do not 
have to see "beyond" boundaries but should simply "see" boundaries.
> So all we need should be an "outside partition" state which CAN be used 
as starting or ending state (basically symbols "^" and "$" should reference 
that).
> 
> I'll see if I find a solution based on your code... I'll do the work in 
my branch [2] based on your branch [3].
> 
> Best
> JulianF
> 
> [1] https://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG8956
> [2] https://github.com/JulianFeinauer/calcite/tree/1935-match-recognize
> [3] https://github.com/julianhyde/calcite/tree/1935-match-recognize
> 
> Am 26.12.18, 08:49 schrieb "Julian Hyde" :
> 
>You are correct that my 1935-match-recognize branch doesn’t compile 
(as of 1a552a9). I committed and pushed in the middle of a change because I had 
done a non-trivial rebase.
> 
>I haven’t missed a file; the two compilation errors were intended to 
remind me where to start work again.

[jira] [Created] (CALCITE-2756) ForEachStatement generates invalid Java Code

2018-12-26 Thread Julian Feinauer (JIRA)
Julian Feinauer created CALCITE-2756:


 Summary: ForEachStatement generates invalid Java Code
 Key: CALCITE-2756
 URL: https://issues.apache.org/jira/browse/CALCITE-2756
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Feinauer
Assignee: Julian Feinauer


Code generated by the ForEachStatement in org.apache.calcite.linq4j.tree looks 
like:

{code:java}
 for (i : list) {
...
  }
{code}

I.e., the parameter type for the loop variable is missing.
Thus, this cannot be used for codegen, because Janino then fails to compile it.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: MATCH_RECOGNIZE

2018-12-26 Thread Julian Feinauer
Hi Julian,

I used [1] as a reference. Anchors are explicitly stated as part of the syntax 
and explained as:

> Anchors work in terms of positions rather than rows. They match a position 
> either at the start or end of a partition.
> ^ matches the position before the first row in the partition.
> $ matches the position after the last row in the partition.
> As an example, PATTERN (^A+$) will match only if all rows in a partition 
> satisfy the condition for A. The resulting match spans the entire partition.

Regarding patterns, I think it should not be a big change, as the anchors are 
defined with respect to partition boundaries. So technically they do not have 
to see "beyond" boundaries but should simply "see" boundaries.
So all we should need is an "outside partition" state which can be used as a 
starting or ending state (basically, the symbols "^" and "$" should reference it).

I'll see if I find a solution based on your code... I'll do the work in my 
branch [2] based on your branch [3].

Best
JulianF

[1] https://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG8956
[2] https://github.com/JulianFeinauer/calcite/tree/1935-match-recognize
[3] https://github.com/julianhyde/calcite/tree/1935-match-recognize

On 26.12.18, 08:49, "Julian Hyde" wrote:

You are correct that my 1935-match-recognize branch doesn’t compile (as of 
1a552a9). I committed and pushed in the middle of a change because I had done a 
non-trivial rebase.

I haven’t missed a file; the two compilation errors were intended to remind 
me where to start work again. I am working on generating code to emit rows, and 
to populate measures and predicates from the input row. If you can make 
progress on that, that would be awesome.

Are anchors (“^” and “$”) supported by Oracle? If so can you point me to 
the spec/examples. I am surprised that anything to do with patterns needs to see 
beyond the boundaries of the current partition. I had assumed that each 
partition has its own state machine and it will be difficult to change that.

Julian


> On Dec 25, 2018, at 2:56 PM, Julian Feinauer 
 wrote:
> 
> Hey,
> 
> it's once again me, JulianF.
> I started work on the Automaton / Matcher and implemented OR and OPTIONAL 
("?") to get started with the code.
> I would highly appreciate if you (Julian H) could check this code (I made 
a PR to your branch).
> Then, what else did you consider as necessary for the implementation?
> I thought about anchors ("^", "$") but this would need a little bit of 
extra changes in the PartitionStates, as far as I see it (to check when we 
"enter" a partition and when we "leave").
> 
> Best
> JulianF
> 
> Am 25.12.18, 20:38 schrieb "Julian Feinauer" 
:
> 
>Hi Julian,
> 
>as I already declared my interest in MATCH_RECOGNIZE and offered my 
help, I plan to do some things in the next one or two weeks.
>Thus, I wanted to start based on your branch (“1935-match-recognize”).
> 
>I have some problems getting it to run.
>Is it possible that there are some files missing in the commit or are 
there some things to consider?
> 
>Thanks!
>Julian (F)
> 
>On 2018/11/26 20:09:00, Julian Hyde 
mailto:j...@apache.org>> wrote:
>> Over thanksgiving, I started working on MATCH_RECOGNIZE again. I wrote a 
standalone class called Automaton that allows you to build patterns (basically 
regular expressions, but sufficient for the PATTERN sub-clause of 
MATCH_RECOGNIZE), and execute them in a unit test.>
>> 
>> Would someone like to help me develop this? We have support for “*” 
(zero or more repeats), “+” (1 or more repeats) and “{m,n}” (bounded repeats) 
but need “|” (or) and several others. It should be fairly straightforward 
test-driven development: add tests to AutomatonTest.java [1], then change 
Automaton, AutomatonBuilder, Pattern or Matcher until they pass.>
>> 
>> We also need lots of SQL tests. Could someone write queries against 
Oracle’s “ticker” table and paste the queries and results into match.iq?>
>> 
>> See CALCITE-1935 [2], and my branch [3].>
>> 
>> I have cherry-picked commits from Zhiqiang He’s branch [4] into my 
branch, so this will be a joint effort when it is finished.>
>> 
>> Julian>
>> 
>> [1] 
https://github.com/julianhyde/calcite/blob/1935-match-recognize/core/src/test/java/org/apache/calcite/runtime/AutomatonTest.java
 

Re: MATCH_RECOGNIZE

2018-12-25 Thread Julian Feinauer
Hi Julian,

as I already declared my interest in MATCH_RECOGNIZE and offered my help, I 
plan to do some things in the next one or two weeks.
Thus, I wanted to start based on your branch (“1935-match-recognize”).

I have some problems getting it to run.
Is it possible that there are some files missing in the commit or are there 
some things to consider?

Thanks!
Julian (F)

On 2018/11/26 20:09:00, Julian Hyde mailto:j...@apache.org>> 
wrote:
> Over thanksgiving, I started working on MATCH_RECOGNIZE again. I wrote a 
> standalone class called Automaton that allows you to build patterns 
> (basically regular expressions, but sufficient for the PATTERN sub-clause of 
> MATCH_RECOGNIZE), and execute them in a unit test.>
>
> Would someone like to help me develop this? We have support for “*” (zero or 
> more repeats), “+” (1 or more repeats) and “{m,n}” (bounded repeats) but need 
> “|” (or) and several others. It should be fairly straightforward test-driven 
> development: add tests to AutomatonTest.java [1], then change Automaton, 
> AutomatonBuilder, Pattern or Matcher until they pass.>
>
> We also need lots of SQL tests. Could someone write queries against Oracle’s 
> “ticker” table and paste the queries and results into match.iq?>
>
> See CALCITE-1935 [2], and my branch [3].>
>
> I have cherry-picked commits from Zhiqiang He’s branch [4] into my branch, so 
> this will be a joint effort when it is finished.>
>
> Julian>
>
> [1] 
> https://github.com/julianhyde/calcite/blob/1935-match-recognize/core/src/test/java/org/apache/calcite/runtime/AutomatonTest.java
>  
>
> [2] https://issues.apache.org/jira/browse/CALCITE-1935 
>
> [3] https://github.com/julianhyde/calcite/tree/1935-match-recognize/ 
>
> [4] 
> https://github.com/Zhiqiang-He/calcite/tree/calcite-1935-MR-Implementation3 
>
>
> > On Nov 21, 2018, at 8:45 AM, Julian Feinauer 
> > mailto:j@pragmaticminds.de>> wrote:>
> > >
> > Sorry, this is an old mail which got sent accidentally again by my mail 
> > program.>
> > Please ignore this and excuse this.>
> > >
> > Julian>
> > >
> > Am 21.11.18, 16:34 schrieb "Julian Feinauer" 
> > mailto:j@pragmaticminds.de>>:>
> > >
> >Hi Julian,>
> > >
> >I decided to reply to this (old) email, because here some facts are 
> > noted.>
> >Funnily, Apache Flink released their MATCH_RECOGNIZE Implementation 
> > yesterday.>
> > >
> >So I recall that you and Zhiqiang He did something on this.>
> >I would like to have such a feature in Calcite (as stated in the other 
> > mail) and could try to go into this a bit with a colleague of mine and give 
> > a bit of support on this topic (In fact, it sounds like fun to us…).>
> >Perhaps there's also the chance to learn something from Flink's 
> > implementation, as you already had some contacts with them, I think?>
> > >
> >Best>
> >Julian>
> > >
> >On 2018/07/23 17:53:57, Julian Hyde 
> > mailto:j@apache.org>> wrote:>
> >> For quite a while we have had partial support for MATCH_RECOGNIZE. We 
> >> support it in the parser and validator, but there is no runtime 
> >> implementation. It’s a shame, because MATCH_RECOGNIZE is an incredibly 
> >> powerful SQL feature for both traditional SQL (it’s in Oracle 12c) and for 
> >> continuous query (aka complex event processing - CEP).>>
> >> >
> >> I figure it’s time to change that. My plan is to implement it 
> >> incrementally, getting simple queries working to start with, then allow 
> >> people to add more complex queries.>>
> >> >
> >> In a dev branch [1], I’ve added a method Enumerables.match[2]. The idea is 
> >> that if you supply an Enumerable of input data, a finite state machine to 
> >> figure out when a s

PLC4X Adapter for Calcite

2018-12-25 Thread Julian Feinauer
Hi all,

I am kind of cross posting this but I think it could be interesting for both 
communities, see my original post on the plc4x dev ML [1].
I just finished a first implementation of a PLC4X-Calcite Adapter.
What one can do with that is to create a Table (Scannable or Streamable) with 
values that are “scraped” regularly from PLCs.

Perhaps this helps a bit in the timeseries / signal processing discussions we 
have here.

If there are any questions regarding this, please feel free to ask.
And if this would be of interest for Calcite, we could also duplicate the code 
to Calcite, I think.

Best
Julian

[1] 
https://lists.apache.org/thread.html/ea5837a2ee0ee88ffca678c553c892574e266264a0709920360fe781@%3Cdev.plc4x.apache.org%3E


Re: Relational algebra and signal processing

2018-12-18 Thread Julian Feinauer
Hi Ruhollah,

thanks for your mail.
Regarding your MATCH_RECOGNIZE question, I'm not sure whether this could work 
or not (I'm skeptical, but it is a really powerful feature).

But to your other question, the thing you describe should be a perfect fit for 
what we usually do, yes.
In our situations we usually have pretty weak windows (only by time or while a 
condition is met).
But then, it is absolutely doable.

Regarding your suggestions for indices... this sounds very interesting, but I 
didn’t get it fully. Could you explain a bit more what you mean by a 
function-based index?
In our situation a proper index could be level sets [1].
A query for "Current is above xxx" could be optimized with such an index.

JulianF

[1] https://en.wikipedia.org/wiki/Level_set
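A level-set index along these lines would precompute, per threshold, the intervals where the signal stays above it; a query like "Current is above xxx" then touches only those intervals instead of scanning raw samples. A rough sketch with invented names and no Calcite integration:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a level-set index for "value above threshold" queries.
// Illustrative only; not a Calcite structure.
public class LevelSetIndex {
  /** Returns [start, end) index intervals where values[i] > threshold. */
  static List<int[]> intervalsAbove(double[] values, double threshold) {
    List<int[]> intervals = new ArrayList<>();
    int start = -1;
    for (int i = 0; i < values.length; i++) {
      boolean above = values[i] > threshold;
      if (above && start < 0) {
        start = i;                            // entering the level set
      } else if (!above && start >= 0) {
        intervals.add(new int[] {start, i});  // leaving it
        start = -1;
      }
    }
    if (start >= 0) {
      intervals.add(new int[] {start, values.length});
    }
    return intervals;
  }

  public static void main(String[] args) {
    double[] current = {1, 7, 8, 2, 9, 1};
    // "current above 5" touches only the precomputed intervals [1,3) and [4,5)
    for (int[] iv : intervalsAbove(current, 5)) {
      System.out.println(iv[0] + ".." + iv[1]);
    }
  }
}
```

In an optimizer, the interval list would be materialized once per interesting threshold; cardinality estimates for the filter then fall out of the total interval length.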


Am 18.12.18, 19:43 schrieb "Ruhollah Farchtchi" :

Maybe this is a separate but related problem, however we see the same thing
with events in other use cases that are complex such as path analysis.
Let's say you are a cable provider and you want to identify channel
surfers. You define a channel surfer as any user that has flipped across 3
channels in a 5 minute window. Now you want to count the number of channel
surfers you had watching the Super Bowl. Can that be accomplished with
MATCH_RECOGNIZE? Some of this seems very similar to the use case Julian F
kicked this thread off with as it requires a transformation from time
series to event by way of pattern identification within a window of time.
Julian, you may need to FILL the window to achieve equal time increments so
the pattern match can be accomplished, but I'm not sure in this use case
you need to. Julian F is this use case similar? I would imagine you could
index the pattern matching part with a function-based index, which could be
implemented as some kind of secondary index via materialized views in
Calcite. Since it is on time series you could optimize maintenance of that
index as long as your window for pattern discovery was small enough.

Ruhollah Farchtchi
ruhollah.farcht...@gmail.com


On Tue, Dec 18, 2018 at 1:04 PM Julian Hyde  wrote:

> I think the difficulty with JulianF’s signal processing domain is that he
> needs there to be precisely one record at every clock tick (or more
> generally, at every point in an N-dimensional discrete space).
>
> Consider stock trading. A stock trade is an event that happens in
> continuous time, say
>
>   (9:58:02 ORCL 41), (10:01:55 ORCL 43)
>
> Our query wants to know the stock price at 10:00 (or at any 1-minute
> interval). Therefore we have to convert the event-oriented data into an
> array:
>
>   (9:59 ORCL 41), (10:00 ORCL 41), (10:01 ORCL 41), (10:02 ORCL 43).
>
> JulianF’s domain may be more naturally in the realm of array databases [1]
> but there are a lot of advantages to relational algebra and SQL, not least
> that we have reasonable story for streaming data, so let’s try to bridge
> the gap. Suppose we add a FILL operator that converts an event-based
> relation into a dense array:
>
>  SELECT *
>  FROM TABLE(FILL(Trades, ‘ROWTIME’, INTERVAL ‘1’ MINUTE))
>
> Now we can safely join with other data at the same granularity.
>
> Is this a step in the right direction?
>
> Julian
>
> [1] https://en.wikipedia.org/wiki/Array_DBMS
>
> > On Dec 18, 2018, at 7:05 AM, Michael Mior  wrote:
> >
> > I would say a similar theory applies. Some things are different when
> you're
> > dealing with streams. Mainly joins and aggregations. Semantics are
> > necessarily different whenever you have operations involving more than
> one
> > row at a time from the input stream. When dealing with a relation an
> > aggregation is straightforward since you just consume the entire input,
> and
> > output the result of the aggregation. Since streams don't end, you need
> to
> > decide how this is handled which usually amounts to a choice of 
windowing
> > algorithm. There are a few other things to think about. The presentation
> > linked below from Julian Hyde has a nice overview
    > >
    > > https://www.slideshare.net/julianhyde/streaming-sql-62376119
> >
> > --
> > Michael Mior
> > mm...@apache.org
> >
> >
> > Le mar. 18 déc. 2018 à 02:28, Julian Feinauer <
> j.feina...@pragmaticminds.de>
> > a écrit :
> >
> >> Hi Michael,
> >>
> >> yes, our workloads are usually in the context of streaming (but for
> replay
> >&g

Re: Relational algebra and signal processing

2018-12-18 Thread Julian Feinauer
Hey,

Julian (H) you are right with your assumptions. But, in our situation we do not 
necessarily need timestamps to be aligned on a regular grid but they have to be 
ordered (for processing at least).
I think stock prices are a very good example.

Three reasons why regular grids usually don’t work are
1. It's very hard to sample regularly if the time resolution is high enough 
(jitter!)
2. Sometimes you want to reduce data by storing values only when they change
3. Sometimes it is not clear what the "minimal" time is, or how this should be 
chosen (the width of the grid)

But, nonetheless, I read the link about array databases (Graphite is one too, I 
think) because I didn't know this description, thanks!

I see two ways for achieving the same thing.
First, we could try to make the time series a "proper relational problem" by 
using something like FILL.
The other option could be to do the same as for Geo data and use another 
Trait (it should be a trait in Calcite, right?) together with appropriate 
functions, and stay in this Trait as long as necessary before we switch over to 
a "regular" relation (see above).

Does that make sense?

JulianF

Am 18.12.18, 19:04 schrieb "Julian Hyde" :

I think the difficulty with JulianF’s signal processing domain is that he 
needs there to be precisely one record at every clock tick (or more generally, 
at every point in an N-dimensional discrete space).

Consider stock trading. A stock trade is an event that happens in 
continuous time, say

  (9:58:02 ORCL 41), (10:01:55 ORCL 43)

Our query wants to know the stock price at 10:00 (or at any 1-minute 
interval). Therefore we have to convert the event-oriented data into an array:

  (9:59 ORCL 41), (10:00 ORCL 41), (10:01 ORCL 41), (10:02 ORCL 43).

JulianF’s domain may be more naturally in the realm of array databases [1] 
but there are a lot of advantages to relational algebra and SQL, not least that 
we have reasonable story for streaming data, so let’s try to bridge the gap. 
Suppose we add a FILL operator that converts an event-based relation into a 
dense array:

 SELECT *
 FROM TABLE(FILL(Trades, ‘ROWTIME’, INTERVAL ‘1’ MINUTE))

Now we can safely join with other data at the same granularity.

Is this a step in the right direction?

Julian

[1] https://en.wikipedia.org/wiki/Array_DBMS
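The proposed FILL semantics — snap point events onto a regular grid, carrying the last observed value forward — can be illustrated outside Calcite. This sketch uses invented names and second-granularity timestamps, not the actual signature of the proposed operator:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch of FILL: turn an event relation into a dense, regularly
// spaced sequence by forward-filling the last seen value.
public class FillSketch {
  /** Forward-fills point events onto a regular grid of width 'step'. */
  static Map<Long, Integer> fill(TreeMap<Long, Integer> events,
                                 long start, long end, long step) {
    Map<Long, Integer> grid = new LinkedHashMap<>();
    for (long t = start; t <= end; t += step) {
      Map.Entry<Long, Integer> last = events.floorEntry(t); // last event at or before t
      if (last != null) {
        grid.put(t, last.getValue());
      }
    }
    return grid;
  }

  static long sec(int h, int m, int s) {
    return h * 3600L + m * 60 + s;
  }

  public static void main(String[] args) {
    TreeMap<Long, Integer> trades = new TreeMap<>();
    trades.put(sec(9, 58, 2), 41);   // (9:58:02 ORCL 41)
    trades.put(sec(10, 1, 55), 43);  // (10:01:55 ORCL 43)
    // one row per minute: 9:59 -> 41, 10:00 -> 41, 10:01 -> 41, 10:02 -> 43
    System.out.println(fill(trades, sec(9, 59, 0), sec(10, 2, 0), 60));
  }
}
```

The `floorEntry` lookup is the whole trick: each grid tick takes the value of the most recent event at or before it, which reproduces the dense array in the stock-trading example.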

> On Dec 18, 2018, at 7:05 AM, Michael Mior  wrote:
> 
> I would say a similar theory applies. Some things are different when 
you're
> dealing with streams. Mainly joins and aggregations. Semantics are
> necessarily different whenever you have operations involving more than one
> row at a time from the input stream. When dealing with a relation an
> aggregation is straightforward since you just consume the entire input, 
and
> output the result of the aggregation. Since streams don't end, you need to
> decide how this is handled which usually amounts to a choice of windowing
> algorithm. There are a few other things to think about. The presentation
> linked below from Julian Hyde has a nice overview
> 
> https://www.slideshare.net/julianhyde/streaming-sql-62376119
> 
> --
> Michael Mior
> mm...@apache.org
> 
> 
> Le mar. 18 déc. 2018 à 02:28, Julian Feinauer 

> a écrit :
> 
>> Hi Michael,
>> 
>> yes, our workloads are usually in the context of streaming (but for 
replay
>> or so we also use batch).
>> But, if I understand it correctly, the same theory applies to both, 
tables
>> ("relations") and streaming tables, or?
>> I hope to find time soon to write a PLC4X-Calcite source which creates
>> one or many streams based on readings from a plc.
>> 
>> Julian
>> 
>> Am 18.12.18, 03:19 schrieb "Michael Mior" :
>> 
>>Perhaps you've thought of this already, but it sounds like streaming
>>relational algebra could be a good fit here.
>> 
>>https://calcite.apache.org/docs/stream.html
>>--
>>Michael Mior
>>mm...@apache.org
>> 
>> 
>>Le dim. 16 déc. 2018 à 18:39, Julian Feinauer <
>> j.feina...@pragmaticminds.de>
>>a écrit :
>> 
>>> Hi Calcite-devs,
>>> 
>>> I just had a very interesting mail exchange with Julian (Hyde) on the
>>> incubator list [1]. It was about our project CRUNCH (which is mostly
>> about
>>> time series analyses and signal processing) and its relation to
>> relational
>>> algebra and I wanted to bring the discussion t

Re: Relational algebra and signal processing

2018-12-17 Thread Julian Feinauer
Hi Michael,

yes, our workloads are usually in the context of streaming (but for replay or 
so we also use batch).
But, if I understand it correctly, the same theory applies to both tables 
("relations") and streaming tables, right?
I hope to find time soon to write a PLC4X-Calcite source which creates one or 
many streams based on readings from a PLC.

Julian

Am 18.12.18, 03:19 schrieb "Michael Mior" :

Perhaps you've thought of this already, but it sounds like streaming
relational algebra could be a good fit here.

https://calcite.apache.org/docs/stream.html
--
Michael Mior
mm...@apache.org


Le dim. 16 déc. 2018 à 18:39, Julian Feinauer 
a écrit :

> Hi Calcite-devs,
>
> I just had a very interesting mail exchange with Julian (Hyde) on the
> incubator list [1]. It was about our project CRUNCH (which is mostly about
> time series analyses and signal processing) and its relation to relational
> algebra and I wanted to bring the discussion to this list to continue 
here.
> We already had some discussion about how time series would work in calcite
> [2] and it’s closely related to MATCH_RECOGNIZE.
>
> But, I have a more general question in mind, to ask the experts here on
> the list.
> I ask myself if we can see the signal processing and analysis tasks as
> proper application of relational algebra.
> Disclaimer: I’m a mathematician, so I know the formalism of (relational)
> algebra pretty well, but I’m lacking a lot of experience and knowledge in
> database theory. Most of my knowledge there comes from Calcite’s source
> code and the book by Garcia-Molina and Ullman.
>
> So if we take, for example, a stream of signals from a sensor, then we can
> of course do filtering or smoothing on it and this can be seen as a 
mapping
> between the input relation and the output relation. But as we usually need
> more than just one tuple at a time we lose many of the advantages of the
> relational theory. And then, if we analyze the signal, we can again model
> it as a mapping between relations, but the input relation is a “time
> series” and the output relation consists of “events”, so these are in some
> way different dimensions. In this situation it becomes mostly obvious 
where
> the main differences between time series and relational algebra are. Think
> of something simple, an event should be registered, whenever the signal
> switches from FALSE to TRUE (so not for every TRUE). This could also be
> modelled with MATCH_RECOGNIZE pretty easily. But, for me it feels
> “unnatural” because we cannot use any indices (we don’t care about the
> ratio of TRUE and FALSE in the DB, except for probably some very rough
> outer bounds). And we are lacking the “right” information for the 
optimizer
> like estimations on the number of analysis results.
> It gets even more complicated when moving to continuous valued signals
> (INT, DOUBLE, …), e.g., temperature readings or something.
> If we want to analyze the number of times where we have a temperature
> change of more than 5 degrees in under 4 hours, this should also be doable
> with MATCH_RECOGNIZE but again, there is no index to help us and we have 
no
> information for the optimizer, so it feels very “black box” for the
> relational algebra.
>
> I’m not sure if you get my point, but for me, the elegance of relational
> algebra was always this optimization stuff, which comes from declarative
> and ends in an “optimal” physical plan. And I do not see how we can use
> much of this for the examples given above.
>
> Perhaps, one solution would be to do the same as for spatial queries (or
> the JSON / JSONB support in postgres, [3]) to add specialized indices,
> statistics and optimizer rules. Then, this would make it more “relational
> algebra”-esque in the sense that there really is a possibility to apply
> transformations to a given query.
>
> What do you think? Do I see things too complicated, or am I missing
> something?
>
> Julian
>
> [1]
> 
https://lists.apache.org/thread.html/1d5a5aae1d4f5f5a966438a2850860420b674f98b0db7353e7b476f2@%3Cgeneral.incubator.apache.org%3E
> [2]
> 
https://lists.apache.org/thread.html/250575a56165851ab55351b90a26eaa30e84d5bbe2b31203daaaefb9@%3Cdev.calcite.apache.org%3E
> [3] https://www.postgresql.org/docs/9.4/datatype-json.html
>
>




Relational algebra and signal processing

2018-12-16 Thread Julian Feinauer
Hi Calcite-devs,

I just had a very interesting mail exchange with Julian (Hyde) on the incubator 
list [1]. It was about our project CRUNCH (which is mostly about time series 
analyses and signal processing) and its relation to relational algebra and I 
wanted to bring the discussion to this list to continue here.
We already had some discussion about how time series would work in Calcite [2] 
and it’s closely related to MATCH_RECOGNIZE.

But, I have a more general question in mind, to ask the experts here on the 
list.
I ask myself if we can see the signal processing and analysis tasks as proper 
application of relational algebra.
Disclaimer: I’m a mathematician, so I know the formalism of (relational) algebra 
pretty well, but I’m lacking a lot of experience and knowledge in database 
theory. Most of my knowledge there comes from Calcite’s source code and the book 
by Garcia-Molina and Ullman.

So if we take, for example, a stream of signals from a sensor, then we can of 
course do filtering or smoothing on it and this can be seen as a mapping 
between the input relation and the output relation. But as we usually need more 
than just one tuple at a time we lose many of the advantages of the relational 
theory. And then, if we analyze the signal, we can again model it as a mapping 
between relations, but the input relation is a “time series” and the output 
relation consists of “events”, so these are in some way different dimensions. 
In this situation it becomes mostly obvious where the main differences between 
time series and relational algebra are. Think of something simple, an event 
should be registered, whenever the signal switches from FALSE to TRUE (so not 
for every TRUE). This could also be modelled with MATCH_RECOGNIZE pretty 
easily. But, for me it feels “unnatural” because we cannot use any indices (we 
don’t care about the ratio of TRUE and FALSE in the DB, except for probably 
some very rough outer bounds). And we are lacking the “right” information for 
the optimizer like estimations on the number of analysis results.
It gets even more complicated when moving to continuous valued signals (INT, 
DOUBLE, …), e.g., temperature readings or something.
If we want to analyze the number of times where we have a temperature change of 
more than 5 degrees in under 4 hours, this should also be doable with 
MATCH_RECOGNIZE but again, there is no index to help us and we have no 
information for the optimizer, so it feels very “black box” for the relational 
algebra.

I’m not sure if you get my point, but for me, the elegance of relational 
algebra was always this optimization stuff, which comes from declarative and 
ends in an “optimal” physical plan. And I do not see how we can use much of 
this for the examples given above.

Perhaps, one solution would be to do the same as for spatial queries (or the 
JSON / JSONB support in postgres, [3]) to add specialized indices, statistics 
and optimizer rules. Then, this would make it more “relational algebra”-esque 
in the sense that there really is a possibility to apply transformations to a 
given query.

What do you think? Do I see things too complicated, or am I missing something?

Julian

[1] 
https://lists.apache.org/thread.html/1d5a5aae1d4f5f5a966438a2850860420b674f98b0db7353e7b476f2@%3Cgeneral.incubator.apache.org%3E
[2] 
https://lists.apache.org/thread.html/250575a56165851ab55351b90a26eaa30e84d5bbe2b31203daaaefb9@%3Cdev.calcite.apache.org%3E
[3] https://www.postgresql.org/docs/9.4/datatype-json.html
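The FALSE-to-TRUE example above amounts to a single ordered scan that remembers the previous sample and emits an event only on the transition. A sketch, illustrative only:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "event on FALSE -> TRUE switch" example: emit the index
// of each rising edge, not an event for every TRUE sample.
public class RisingEdges {
  static List<Integer> risingEdges(boolean[] signal) {
    List<Integer> events = new ArrayList<>();
    boolean prev = false;
    for (int i = 0; i < signal.length; i++) {
      if (signal[i] && !prev) {
        events.add(i);   // signal switched FALSE -> TRUE here
      }
      prev = signal[i];
    }
    return events;
  }

  public static void main(String[] args) {
    boolean[] s = {false, true, true, false, true};
    System.out.println(risingEdges(s)); // edges at 1 and 4, not at 2
  }
}
```

This also makes the optimizer problem concrete: the operator is a cheap streaming map over ordered input, but the output cardinality (number of edges) is exactly the statistic an ordinary relational index or histogram does not carry.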



Re: MATCH_RECOGNIZE

2018-11-21 Thread Julian Feinauer
Sorry, this is an old mail which got sent accidentally again by my mail program.
Please ignore this and excuse this.

Julian

Am 21.11.18, 16:34 schrieb "Julian Feinauer" :

Hi Julian,

I decided to reply to this (old) email, because here some facts are noted.
Funnily, Apache Flink released their MATCH_RECOGNIZE Implementation 
yesterday.

So I recall that you and Zhiqiang He did something on this.
I would like to have such a feature in Calcite (as stated in the other 
mail) and could try to go into this a bit with a colleague of mine and give a 
bit of support on this topic (In fact, it sounds like fun to us…).
Perhaps there's also the chance to learn something from Flink's 
implementation, as you already had some contacts with them, I think?

Best
Julian

On 2018/07/23 17:53:57, Julian Hyde  wrote:
> For quite a while we have had partial support for MATCH_RECOGNIZE. We 
support it in the parser and validator, but there is no runtime implementation. 
It’s a shame, because MATCH_RECOGNIZE is an incredibly powerful SQL feature for 
both traditional SQL (it’s in Oracle 12c) and for continuous query (aka complex 
event processing - CEP).>
> 
> I figure it’s time to change that. My plan is to implement it 
incrementally, getting simple queries working to start with, then allow people 
to add more complex queries.>
> 
> In a dev branch [1], I’ve added a method Enumerables.match[2]. The idea 
is that if you supply an Enumerable of input data, a finite state machine to 
figure out when a sequence of rows makes a match (represented by a transition 
function: (state, row) -> state), and a function to convert a matched set of 
rows to a set of output rows. The match method is fairly straightforward, and I 
almost have it finished.>
> 
> The complexity is in generating the finite state machine, emitter 
function, and so forth.>
> 
> Can someone help me with this task? If your idea of fun is implementing 
database algorithms, this is about as much fun as it gets. You learned about 
finite state machines in college - this is your chance to actually write one!>
> 
> This might be a good joint project with the Flink community. I know Flink 
are thinking of implementing CEP, and the algorithm we write here could be 
shared with Flink (for use via Flink SQL or via the Flink API).>
> 
> Julian>
> 
> [1] https://github.com/julianhyde/calcite/commits/1935-match-recognize 
> 
> [2] 
https://github.com/julianhyde/calcite/commit/4dfaf1bbee718aa6694a8ce67d829c32d04c7e87#diff-8a97a64204db631471c563df7551f408R73
 
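The (state, row) -> state shape described in the email can be sketched independently of Calcite. This is not the actual Enumerables.match signature — just a minimal illustration of running a transition function over rows and emitting the buffered rows when an accepting state is reached:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Minimal sketch of a transition-function matcher; names and the
// single-accept-state simplification are illustrative only.
public class MatcherSketch {
  static <R> List<R> match(List<R> rows,
                           BiFunction<Integer, R, Integer> transition,
                           int acceptState) {
    List<R> buffer = new ArrayList<>();
    List<R> output = new ArrayList<>();
    int state = 0;
    for (R row : rows) {
      state = transition.apply(state, row);
      if (state == 0) {
        buffer.clear();            // dead end: discard the partial match
      } else {
        buffer.add(row);
        if (state == acceptState) {
          output.addAll(buffer);   // emit the matched rows
          buffer.clear();
          state = 0;
        }
      }
    }
    return output;
  }

  public static void main(String[] args) {
    // Pattern "A B": state 0 -(A)-> 1 -(B)-> 2 (accept)
    BiFunction<Integer, String, Integer> t = (s, r) ->
        s == 0 && r.equals("A") ? 1
      : s == 1 && r.equals("B") ? 2
      : 0;
    System.out.println(match(List.of("X", "A", "B", "A", "C"), t, 2));
  }
}
```

The real task described above splits this into two generated pieces: the transition function compiled from the PATTERN clause, and an emitter function that maps a matched set of rows to output rows (measures).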




Re: MATCH_RECOGNIZE

2018-11-21 Thread Julian Feinauer
Hi Julian,

I decided to reply to this (old) email, because here some facts are noted.
Funnily, Apache Flink released their MATCH_RECOGNIZE Implementation yesterday.

So I recall that you and Zhiqiang He did something on this.
I would like to have such a feature in Calcite (as stated in the other mail) 
and could try to go into this a bit with a colleague of mine and give a bit of 
support on this topic (In fact, it sounds like fun to us…).
Perhaps there's also the chance to learn something from Flink's implementation, 
as you already had some contacts with them, I think?

Best
Julian

On 2018/07/23 17:53:57, Julian Hyde  wrote:
> For quite a while we have had partial support for MATCH_RECOGNIZE. We support 
> it in the parser and validator, but there is no runtime implementation. It’s 
> a shame, because MATCH_RECOGNIZE is an incredibly powerful SQL feature for 
> both traditional SQL (it’s in Oracle 12c) and for continuous query (aka 
> complex event processing - CEP).>
> 
> I figure it’s time to change that. My plan is to implement it incrementally, 
> getting simple queries working to start with, then allow people to add more 
> complex queries.>
> 
> In a dev branch [1], I’ve added a method Enumerables.match[2]. The idea is 
> that if you supply an Enumerable of input data, a finite state machine to 
> figure out when a sequence of rows makes a match (represented by a transition 
> function: (state, row) -> state), and a function to convert a matched set of 
> rows to a set of output rows. The match method is fairly straightforward, and 
> I almost have it finished.>
> 
> The complexity is in generating the finite state machine, emitter function, 
> and so forth.>
> 
> Can someone help me with this task? If your idea of fun is implementing 
> database algorithms, this is about as much fun as it gets. You learned about 
> finite state machines in college - this is your chance to actually write one!>
> 
> This might be a good joint project with the Flink community. I know Flink are 
> thinking of implementing CEP, and the algorithm we write here could be shared 
> with Flink (for use via Flink SQL or via the Flink API).>
> 
> Julian>
> 
> [1] https://github.com/julianhyde/calcite/commits/1935-match-recognize 
> >
> 
> [2] 
> https://github.com/julianhyde/calcite/commit/4dfaf1bbee718aa6694a8ce67d829c32d04c7e87#diff-8a97a64204db631471c563df7551f408R73
>  
> >


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: Pandas as a backend to SQL queries using Calcite as translator

2018-11-09 Thread Julian Feinauer
Hi,

if I understand your question correctly, you want to "transform" a SQL 
Statement into calls to pandas, is this right?
If so, you could implement a specific Call Convention, see [1].

By implementing your own Implementor you could then assemble your pandas calls.

Julian

[1] 
https://github.com/apache/calcite/blob/9721283bd0ce46a337f51a3691585cca8003e399/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableConvention.java

Am 09.11.18, 16:18 schrieb "M." :

Hi all,


Is it possible to use Pandas as a backend for computing the operations
parsed from SQL statements? I'm researching a way to parse SQL into Pandas
operations and maybe Calcite would be a way to do that.


Thanks




Re: How to modify select identifiers

2018-10-31 Thread Julian Feinauer
Please have a look at the SqlDialect class:

  /**
   * A dialect useful for generating SQL which can be parsed by the
   * Calcite parser, in particular quoting literals and identifiers. If you
   * want a dialect that knows the full capabilities of the database, create
   * one from a connection.
   */
  public static final SqlDialect CALCITE =
      DatabaseProduct.CALCITE.getDialect();

Just select the suitable Dialect from the DatabaseProduct enum.

Julian

Am 31.10.18, 12:28 schrieb "Shashwat Kumar" :

Hi Julian,

I have been able to successfully implement your above suggestions and got
properly modifed SqlNode in the end.
Now I want to convert SqlNode to sqlquery which I want to pass to JDBC
connection. Could you please help in that?

sqlNode.toString() is not giving proper query.
sqlNode.toSqlString(SqlDialect dialect) seems to be the appropriate solution,
but what should I pass as dialect?

On Wed, Oct 31, 2018 at 4:05 PM Julian Feinauer <
j.feina...@pragmaticminds.de> wrote:

> Hi Shashwat,
>
> the implementation to use should be SqlBasicCall.
> And to achieve what you want to do I would use
>
> SqlParserPos pos = yourNode.getParserPosition();
> SqlNode aliased = new SqlBasicCall(SqlStdOperatorTable.AS, new
> SqlNode[]{yourNode, new SqlIdentifier(Collections.singletonList("v", pos),
> pos)}
>
> Note, I did not check above lines so perhaps you have to modify it a bit
> or play around, but this should be the general direction, I think.
>
> Julian
>
> Am 31.10.18, 11:28 schrieb "Shashwat Kumar" :
>
> Hi Julian,
>
> Thank you for quick response.
> SqlCall is abstract class so I am not able to find which concrete
> subclass
> of it I should instantiate. Could you please give some more hint or
> code
> snippet to do it? Also how to modify the identifier name. Say I want 
to
> change value to _MAP['value'] e.g.
    >
> SELECT _MAP['value'] as v FROM Data
>
> On Wed, Oct 31, 2018 at 3:42 PM Julian Feinauer <
> j.feina...@pragmaticminds.de> wrote:
>
> > Hi Shashwat,
> >
> > Calcite does this by a Call to the "AS" Operator (basically value as
> v is
> > just syntactic sugar for AS(value, v)).
> > So you need to create a call node (SqlCall) with the AS Operator
> > (SqlStdOperatorTable.AS) and as operands your node and an
> SqlIdentifier for
> > the Alias.
> >
> > But your visitor should then return SqlNode, not String, right?
> >
I will change it to SqlNode.
>
> >
> > Best
> > Julian
> >
> > Am 31.10.18, 11:07 schrieb "Shashwat Kumar" <
> shashwatkmr@gmail.com>:
> >
> > I want to modify select identifiers in a SQL statement. For example
> > SELECT value FROM Data
> > to
> > SELECT value as v FROM Data
> >
> > I am able to get SqlNode for select identifiers as follows.
> >
> > public String visit(SqlCall sqlCall) {
> >
> > SqlNodeList selectList = ((SqlSelect)
> sqlCall).getSelectList();
> > for (SqlNode sqlNode : selectList) {
> > *// do something with sqlNode*
> > }
> >
> > }
> >
> > Now how to change sqlNode as per requirement?
> >
> >
> > --
> > Regards
> > Shashwat Kumar
> >
> >
> >
>
> --
> Regards
> Shashwat Kumar
>
>
>

-- 
Regards
Shashwat Kumar




Re: How to modify select identifiers

2018-10-31 Thread Julian Feinauer
Hi Shashwat,

the implementation to use should be SqlBasicCall.
And to achieve what you want to do I would use

SqlParserPos pos = yourNode.getParserPosition();
SqlNode aliased = new SqlBasicCall(SqlStdOperatorTable.AS, new 
SqlNode[]{yourNode, new SqlIdentifier(Collections.singletonList("v", pos), pos)}

Note, I did not check above lines so perhaps you have to modify it a bit or 
play around, but this should be the general direction, I think.

Julian

Am 31.10.18, 11:28 schrieb "Shashwat Kumar" :

Hi Julian,

Thank you for quick response.
SqlCall is an abstract class so I am not able to find which concrete subclass
of it I should instantiate. Could you please give some more hint or code
snippet to do it? Also how to modify the identifier name. Say I want to
change value to _MAP['value'] e.g.

SELECT _MAP['value'] as v FROM Data

On Wed, Oct 31, 2018 at 3:42 PM Julian Feinauer <
j.feina...@pragmaticminds.de> wrote:

> Hi Shashwat,
>
> Calcite does this by a Call to the "AS" Operator (basically value as v is
> just syntactic sugar for AS(value, v)).
> So you need to create a call node (SqlCall) with the AS Operator
> (SqlStdOperatorTable.AS) and as operands your node and an SqlIdentifier for
> the Alias.
>
> But your visitor should then return SqlNode, not String, right?
>
I will change it to SqlNode.

>
> Best
> Julian
>
> Am 31.10.18, 11:07 schrieb "Shashwat Kumar" :
>
> I want to modify select identifiers in a SQL statement. For example
> SELECT value FROM Data
> to
> SELECT value as v FROM Data
>
> I am able to get SqlNode for select identifiers as follows.
>
> public String visit(SqlCall sqlCall) {
>
> SqlNodeList selectList = ((SqlSelect) 
sqlCall).getSelectList();
> for (SqlNode sqlNode : selectList) {
> *// do something with sqlNode*
> }
>
> }
>
> Now how to change sqlNode as per requirement?
>
>
> --
> Regards
> Shashwat Kumar
>
>
>

-- 
Regards
Shashwat Kumar




Re: How to modify select identifiers

2018-10-31 Thread Julian Feinauer
Hi Shashwat,

Calcite does this by a Call to the "AS" Operator (basically value as v is just 
syntactic sugar for AS(value, v)).
So you need to create a call node (SqlCall) with the AS Operator 
(SqlStdOperatorTable.AS) and as operands your node and an SqlIdentifier for the 
Alias.

But your visitor should then return SqlNode, not String, right?

Best
Julian

Am 31.10.18, 11:07 schrieb "Shashwat Kumar" :

I want to modify select identifiers in a SQL statement. For example
SELECT value FROM Data
to
SELECT value as v FROM Data

I am able to get SqlNode for select identifiers as follows.

public String visit(SqlCall sqlCall) {
    SqlNodeList selectList = ((SqlSelect) sqlCall).getSelectList();
    for (SqlNode sqlNode : selectList) {
        // do something with sqlNode
    }
}

Now how to change sqlNode as per requirement?


-- 
Regards
Shashwat Kumar




Re: Calcite on Traces / Industry 4.0 data

2018-10-31 Thread Julian Feinauer
Hey,

thanks for your reply and your ideas; reading your answer it becomes quite 
clear that we have some kind of "duality" between the SQL and the "signal" 
world. And indeed, your approach would allow us to close the bridge in one 
direction (signal to SQL).

For many use cases (a temperature sensor that emits a value when the value 
changes, or at least each minute) this is a good solution and, as you state, 
would make the application of MATCH_RECOGNIZE easier (and also other 
characteristics, like AVG). And this would also allow joins with other tables 
to enrich the data there. Regarding the implementation, this would have to be 
done as a table function, right?

But (at least in our use cases) there are many situations where this approach 
would probably not be sufficient or feasible. Because sampling rates are often 
pretty low while timestamps are given with ms precision, we would have to 
"expand" to one point per ms, which is very inefficient (all sensors have their 
own timestamps, so we really have to go down to the lowest "tick"). Also, 
regarding your note on the join: I agree that the "natural join" for this 
problem is not a SQL join and that your "expansion" would solve it, but think 
about a situation where one sensor sends temperature values each minute and 
another sends current values each ms... I don't know if this can be handled 
efficiently by the engine, as it is pretty easy to do some things wrong.
Thus, my idea was to have a separate join (perhaps this could be realized using 
a different trait) which does the sample and hold implicitly and uses 
"backpressure" to fetch values only from the stream that is behind (we already 
use these algorithms in our framework).
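The sample-and-hold join with backpressure described above can be sketched in plain Java (a sketch only, independent of Calcite and of our framework; all names and numbers are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SampleAndHoldJoin {
    // Joins two timestamped streams: for each event of the fast stream,
    // emit it together with the most recent value of the slow stream
    // (sample and hold). Each long[] is {timestamp, value}.
    static List<long[]> join(List<long[]> fast, List<long[]> slow) {
        List<long[]> out = new ArrayList<>();
        int i = 0;        // index into the slow stream
        Long held = null; // last slow value seen so far
        for (long[] f : fast) {
            // advance the slow stream only up to the fast timestamp
            // (the "backpressure": we never read ahead of the fast stream)
            while (i < slow.size() && slow.get(i)[0] <= f[0]) {
                held = slow.get(i)[1];
                i++;
            }
            if (held != null) {
                out.add(new long[]{f[0], f[1], held});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // current sensor every ms, temperature sensor every 60000 ms
        List<long[]> current = Arrays.asList(
            new long[]{60000, 5}, new long[]{60001, 6}, new long[]{60002, 7});
        List<long[]> temp = Arrays.asList(
            new long[]{0, 20}, new long[]{60000, 21});
        for (long[] row : join(current, temp)) {
            System.out.println(Arrays.toString(row)); // {ts, current, temp}
        }
    }
}
```

No expansion to one point per ms is needed: the slow stream is only advanced as far as the fast stream's current timestamp.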

I like the discussion as we think about these problems for more than 2 years 
now and developed our very own approach and it is a good point in time now to 
reflect and to see how we can join this with well-established solutions.

So apart from your approach (transforming the problem into a valid SQL 
problem), do you see the possibility to extend Calcite (I'm not speaking about 
the parser now), e.g. by using a different "timeseries trait" which can handle 
the problem more efficiently and only does things like the expansion you 
propose "in the end"?

Best
Julian


Am 29.10.18, 20:40 schrieb "Julian Hyde" :

I’ve been thinking a bit more about this use case.

It’s tricky because signal processing and SQL have a different model of 
“event”. EE folks talk about “edge triggering” and “level triggering”, which 
are really just two different data models for the same physical phenomenon.

Consider an example of a step function. Suppose a signal is 0 until time 
3.25 when it becomes 1. SQL would probably represent this as an event

  (timestamp=0, value=0)

and another event

  (timestamp=3.25, value=1)

But to use MATCH_RECOGNIZE effectively, you would need a stream of events

  {(0, 0), (1, 0), (2, 0), (3, 0), (4, 1), (5, 1), …}

And even then, the stream does not capture exactly when the transition 
occurs, just that it happens sometime between 3 and 4.

We could provide a transform in SQL that converts the edge events

  {(0, 0), (3.25, 1)}

into level events on clock ticks

  {(0, 0), (1, 0), (2, 0), (3, 0), (4, 1), …}

We could also provide a transform from an event stream that has time gaps; 
e.g. 

  {(0, 0), (1, 0), (6, 1)}

becomes

  {(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 1), (7, 1)}

These transforms produce virtual data streams in a form that is easier 
to write SQL on. For example, joins are easier if you can guarantee that there 
is a value for every clock tick. So are windowed aggregates. They also bring 
streams, constant tables and time-varying tables under the same roof.
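The edge-to-level transform described above could look roughly like this in plain Java (a sketch only; class and method names are made up):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EdgeToLevel {
    // Expands edge events {timestamp, value} into level events on integer
    // clock ticks: each tick carries the latest value at or before it.
    static List<double[]> toLevels(List<double[]> edges, int lastTick) {
        List<double[]> out = new ArrayList<>();
        int i = 0;
        double held = 0;
        for (int tick = 0; tick <= lastTick; tick++) {
            // consume all edges up to and including this tick
            while (i < edges.size() && edges.get(i)[0] <= tick) {
                held = edges.get(i)[1];
                i++;
            }
            out.add(new double[]{tick, held});
        }
        return out;
    }

    public static void main(String[] args) {
        // the step function from the example: 0 until t = 3.25, then 1
        List<double[]> edges = Arrays.asList(
            new double[]{0, 0}, new double[]{3.25, 1});
        for (double[] e : toLevels(edges, 5)) {
            System.out.println(Arrays.toString(e));
        }
        // yields {(0,0), (1,0), (2,0), (3,0), (4,1), (5,1)}
    }
}
```

As noted in the example, the level stream records the transition only between ticks 3 and 4; the exact edge time 3.25 is lost.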

Do you think this approach is worthwhile? Are there other data models in 
signal processing/event processing that we could bring into the relational 
model by applying transforms?

Julian





> On Oct 25, 2018, at 1:58 PM, Julian Feinauer 
 wrote:
> 
>  I just noted that I forgot to comment on Flinks Implementation, 
sorry.
> 
> I went through the patch which implemented basic functionality in the 
master[1] and I think that we cannot learn much from their approach directly as 
they reduce it to a CEP Pattern which is then forwarded to CEP where most of 
the magic happens.
> Thus, what they implemented now to make this feature work is, from my 
impression, on the level of what's already implemented with the parsing and the 
LogicalMatch.
> 
> Sorry for the two emails
> Julian
> 
    > [1] 
https://github.com/apache/flink/commit/3acd92b45c21e081f781affc8cb5700d972f9b0b
> 
>

Re: Calcite on Traces / Industry 4.0 data

2018-10-25 Thread Julian Feinauer
I just noted that I forgot to comment on Flink's implementation, sorry.

I went through the patch which implemented basic functionality in the master[1] 
and I think that we cannot learn much from their approach directly as they 
reduce it to a CEP Pattern which is then forwarded to CEP where most of the 
magic happens.
Thus, what they implemented now to make this feature work is, from my 
impression, on the level of what's already implemented with the parsing and the 
LogicalMatch.

Sorry for the two emails
Julian

[1] 
https://github.com/apache/flink/commit/3acd92b45c21e081f781affc8cb5700d972f9b0b

Am 25.10.18, 22:46 schrieb "Julian Feinauer" :

Hi Julian,

I filed a JIRA for my general suggestion about "Timeseries SQL" 
(CALCITE-2640).
For the discussion in the other thread, I had a look into the present state 
of the code (from you and Zhiqiang He) for parsing and the logical node.

I also thought about the necessary implementation for the EnumerableMatch.
I'm pretty familiar with the regex to NFA / DFA part (from our 
implementations) and the define part.
But what I'm pretty unfamiliar with is the order and partition part (and 
especially how it's implemented in Calcite).
Do you see any possibility to transform the Matching Part into a Window 
Aggregation function, or am I making things overly easy with this thought?

Wouldn’t this also make it easier with regards to the PREV, NEXT, FIRST, 
LAST window agg functions?
I can try to help with the implementation of the "inner" parts but I don’t 
feel that I'm familiar enough with the codebase to make the whole thing work.

Thus, if any of the seasoned Calcite devs could offer some help I would 
be happy to discuss details of the implementation and support it 
as well as possible.

Best
Julian


    Am 23.10.18, 07:57 schrieb "Julian Feinauer" :

Hi Julian,

first off, thanks for your reply and your thoughts.
Thinking about your arguments, I fully agree with what you say and we 
should really consider using MATCH_RECOGNIZE first and see where it gets us.

To our second "problem", the different channel groups (with unequal 
time stamps), we also need a sound mapping to SQL then. My first thought was to 
use the "drill approach" and to simply simulate a table which has all columns 
somebody wants (as we do not know that upfront) and return NULL or NaN values 
when the channel is not present at evaluation time (and do all the 
interpolation and stuff in the background). Or does anybody have a better idea?

For your suggested approach I agree and will try to write some of our 
analyses (in our Java DSL) with MATCH_RECOGNIZE to see how well it fits and 
then come back to the list.

Thanks
Julian

Am 23.10.18, 05:55 schrieb "Julian Hyde" :

Julian,

Thanks for posting this to Calcite. We appreciate the opportunity 
to mull over a language and prevent a misguided SQL-like language.

I agree with both you and Mark: MATCH_RECOGNIZE seems to be very 
well suited to your problem domain. And MATCH_RECOGNIZE is non-trivial and 
difficult to learn.

But in its favor, MATCH_RECOGNIZE is in the SQL standard and has 
reference implementations in systems like Oracle, so we can assume that it is 
well-specified. And, in my opinion, it is well designed - it delivers 
significant extra power to SQL that could not be done efficiently or at all 
without it, and is consistent with existing SQL semantics. Lastly, the 
streaming systems such as Flink and Beam are adopting it.

When your proposed language has gone through the same process, I 
suspect that it would end up being very similar to MATCH_RECOGNIZE. 
MATCH_RECOGNIZE may seem “imperative” because it is creating a 
state-transition engine, but finite-state automata can be reasoned about and safely 
transformed, and are therefore to all intents and purposes “declarative”.

The biggest reason not to use MATCH_RECOGNIZE is your audience. 
There’s no point creating the perfect language if the audience doesn’t like it 
and want to adopt it. So perhaps your best path is to design your own language, 
find some examples and code them up as use cases in that language, and iterate 
based on your users’ feedback.

If I were you, I would also code each of those examples in SQL 
using MATCH_RECOGNIZE, and make sure that there is a sound mapping between 
those languages. And maybe your language could be implemented as a thin layer 
above MATCH_RECOGNIZE.

This is the same advice I would give to everyone who is writing a 
database: I don’t care whether you use SQL, but make sure your langu

Re: Calcite on Traces / Industry 4.0 data

2018-10-25 Thread Julian Feinauer
Hi Julian,

I filed a JIRA for my general suggestion about "Timeseries SQL" (CALCITE-2640).
For the discussion in the other thread, I had a look into the present state of 
the code (from you and Zhiqiang He) for parsing and the logical node.

I also thought about the necessary implementation for the EnumerableMatch.
I'm pretty familiar with the regex to NFA / DFA part (from our implementations) 
and the define part.
But what I'm pretty unfamiliar with is the order and partition part (and 
especially how it's implemented in Calcite).
Do you see any possibility to transform the Matching Part into a Window 
Aggregation function, or am I making things overly easy with this thought?

Wouldn’t this also make it easier with regards to the PREV, NEXT, FIRST, LAST 
window agg functions?
I can try to help with the implementation of the "inner" parts but I don’t feel 
that I'm familiar enough with the codebase to make the whole thing work.

Thus, if any of the seasoned Calcite devs could offer some help I would be 
happy to discuss details of the implementation and support it 
as well as possible.

Best
Julian


Am 23.10.18, 07:57 schrieb "Julian Feinauer" :

Hi Julian,

first off, thanks for your reply and your thoughts.
Thinking about your arguments, I fully agree with what you say and we should 
really consider using MATCH_RECOGNIZE first and see where it gets us.

To our second "problem", the different channel groups (with unequal time 
stamps), we also need a sound mapping to SQL then. My first thought was to use 
the "drill approach" and to simply simulate a table which has all columns 
somebody wants (as we do not know that upfront) and return NULL or NaN values 
when the channel is not present at evaluation time (and do all the 
interpolation and stuff in the background). Or does anybody have a better idea?

For your suggested approach I agree and will try to write some of our 
analyses (in our Java DSL) with MATCH_RECOGNIZE to see how well it fits and 
then come back to the list.

Thanks
Julian

Am 23.10.18, 05:55 schrieb "Julian Hyde" :

Julian,

Thanks for posting this to Calcite. We appreciate the opportunity to 
mull over a language and prevent a misguided SQL-like language.

I agree with both you and Mark: MATCH_RECOGNIZE seems to be very well 
suited to your problem domain. And MATCH_RECOGNIZE is non-trivial and difficult 
to learn.

But in its favor, MATCH_RECOGNIZE is in the SQL standard and has 
reference implementations in systems like Oracle, so we can assume that it is 
well-specified. And, in my opinion, it is well designed - it delivers 
significant extra power to SQL that could not be done efficiently or at all 
without it, and is consistent with existing SQL semantics. Lastly, the 
streaming systems such as Flink and Beam are adopting it.

When your proposed language has gone through the same process, I 
suspect that it would end up being very similar to MATCH_RECOGNIZE. 
MATCH_RECOGNIZE may seem “imperative” because it is creating a 
state-transition engine, but finite-state automata can be reasoned about and safely 
transformed, and are therefore to all intents and purposes “declarative”.

The biggest reason not to use MATCH_RECOGNIZE is your audience. There’s 
no point creating the perfect language if the audience doesn’t like it and want 
to adopt it. So perhaps your best path is to design your own language, find 
some examples and code them up as use cases in that language, and iterate based 
on your users’ feedback.

If I were you, I would also code each of those examples in SQL using 
MATCH_RECOGNIZE, and make sure that there is a sound mapping between those 
languages. And maybe your language could be implemented as a thin layer above 
MATCH_RECOGNIZE.

This is the same advice I would give to everyone who is writing a 
database: I don’t care whether you use SQL, but make sure your language maps 
onto (extended) relational algebra. (And if you create a SQL-like language that 
breaks some of the concepts of SQL, such automatically joining tables, please 
don’t tell people that your language is SQL.)

I’m sorry to say that Calcite’s implementation of MATCH_RECOGNIZE has 
not moved forward much since my email. Maybe your effort is the kick necessary 
to get it going. I can assure you that I still believe that MATCH_RECOGNIZE, 
and the algebra that underlies it, is a solid foundation.

Julian
    
    

> On Oct 21, 2018, at 10:04 PM, Julian Feinauer 
 wrote:
> 
> Hi Mark,
> 
> thanks for your reply.
> In fact, I'm sorry that I missed to mention MATCH_RECOGNIZE in my 
original mail.
> I was really excited when I first hea

[jira] [Created] (CALCITE-2640) Support for SQL on Timeseries / Traces

2018-10-25 Thread Julian Feinauer (JIRA)
Julian Feinauer created CALCITE-2640:


 Summary: Support for SQL on Timeseries / Traces
 Key: CALCITE-2640
 URL: https://issues.apache.org/jira/browse/CALCITE-2640
 Project: Calcite
  Issue Type: New Feature
Reporter: Julian Feinauer
Assignee: Julian Hyde


When working with time series data, e.g., from IoT devices, traces or other 
sources, there is often a need for analysis of "transients". This means that 
it is important to compare values with their neighbors (i.e., prev / next) to 
detect jumps or changes of bits.

This is possible with the MATCH_RECOGNIZE functionality from SQL:2016, but 
perhaps there is also the possibility of a "sensible" extension to SQL (like 
streaming SQL is) to support these use cases with a "less technical" syntax. 
Others also use a pseudo-SQL syntax to do "more time series focussed" things 
(like using GROUP BY to do time averaging in InfluxDB). See [1] for InfluxDB's 
SQL-like QL, InfluxQL.
And it would be better to have a "more standardized" way for the interaction 
between SQL and timeseries than several SQL-like DSLs.

There was also a discussion on the list, see [2].

[1] https://docs.influxdata.com/influxdb/v1.6/query_language/
[2] 
https://lists.apache.org/thread.html/250575a56165851ab55351b90a26eaa30e84d5bbe2b31203daaaefb9@%3Cdev.calcite.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: MATCH_RECOGNIZE

2018-10-23 Thread Julian Feinauer
Hi Julian,

I decided to reply to this (old) email because some facts are noted here.
Funnily, Apache Flink released their MATCH_RECOGNIZE implementation yesterday.

So I recall that you and Zhiqiang He did something on this.
I would like to have such a feature in Calcite (as stated in the other mail) 
and could try to go into this a bit with a colleague of mine and give a bit of 
support on this topic (in fact, it sounds like fun to us…).
Perhaps there's also the chance to learn something from Flink's implementation, 
as you already had some contact with them, I think?

Best
Julian

On 2018/07/23 17:53:57, Julian Hyde  wrote:

For quite a while we have had partial support for MATCH_RECOGNIZE. We support 
it in the parser and validator, but there is no runtime implementation. It’s a 
shame, because MATCH_RECOGNIZE is an incredibly powerful SQL feature for both 
traditional SQL (it’s in Oracle 12c) and for continuous query (aka complex 
event processing - CEP).

I figure it’s time to change that. My plan is to implement it incrementally, 
getting simple queries working to start with, then allow people to add more 
complex queries.

In a dev branch [1], I’ve added a method Enumerables.match [2]. The idea is 
that you supply an Enumerable of input data, a finite state machine to figure 
out when a sequence of rows makes a match (represented by a transition 
function: (state, row) -> state), and a function to convert a matched set of 
rows to a set of output rows. The match method is fairly straightforward, and 
I almost have it finished.

The complexity is in generating the finite state machine, emitter function, 
and so forth.

Can someone help me with this task? If your idea of fun is implementing 
database algorithms, this is about as much fun as it gets. You learned about 
finite state machines in college - this is your chance to actually write one!

This might be a good joint project with the Flink community. I know Flink are 
thinking of implementing CEP, and the algorithm we write here could be shared 
with Flink (for use via Flink SQL or via the Flink API).

Julian

[1] https://github.com/julianhyde/calcite/commits/1935-match-recognize

[2] https://github.com/julianhyde/calcite/commit/4dfaf1bbee718aa6694a8ce67d829c32d04c7e87#diff-8a97a64204db631471c563df7551f408R73
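The transition-function idea behind Enumerables.match can be illustrated with a tiny self-contained state machine (a sketch only, not the actual Calcite API; the pattern "one or more negative values followed by a positive value" stands in for a MATCH_RECOGNIZE pattern):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

public class SimpleMatch {
    // A transition function (state, row) -> state drives a finite state
    // machine over the input; whenever the accepting state (2) is reached,
    // the buffered rows form one match. State 0 is the start state.
    static List<List<Integer>> match(List<Integer> rows,
            BiFunction<Integer, Integer, Integer> transition) {
        List<List<Integer>> matches = new ArrayList<>();
        List<Integer> buffer = new ArrayList<>();
        int state = 0;
        for (int row : rows) {
            state = transition.apply(state, row);
            if (state == 0) {
                buffer.clear();           // no partial match in progress
            } else {
                buffer.add(row);
                if (state == 2) {         // accepting state: emit the match
                    matches.add(new ArrayList<>(buffer));
                    buffer.clear();
                    state = 0;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // "one or more negative values followed by a positive value":
        // state 0 = start, 1 = inside a run of negatives, 2 = accept
        BiFunction<Integer, Integer, Integer> negThenPos = (s, v) ->
            s == 0 ? (v < 0 ? 1 : 0)
                   : (v > 0 ? 2 : (v < 0 ? 1 : 0));
        for (List<Integer> m : match(Arrays.asList(3, -1, -2, 5, 4, -7, 8),
                negThenPos)) {
            System.out.println(m); // prints [-1, -2, 5] then [-7, 8]
        }
    }
}
```

The hard part the email points at is not this loop but compiling a MATCH_RECOGNIZE pattern into such a transition function and emitter automatically.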



Re: Calcite on Traces / Industry 4.0 data

2018-10-22 Thread Julian Feinauer
Hi Julian,

first off, thanks for your reply and your thoughts.
Thinking about your arguments, I fully agree with what you say and we should 
really consider using MATCH_RECOGNIZE first and see where it gets us.

To our second "problem", the different channel groups (with unequal time 
stamps), we also need a sound mapping to SQL then. My first thought was to use 
the "drill approach" and to simply simulate a table which has all columns 
somebody wants (as we do not know that upfront) and return NULL or NaN values 
when the channel is not present at evaluation time (and do all the 
interpolation and stuff in the background). Or does anybody have a better idea?
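The "drill approach" described above — simulating a table over all requested columns and returning NULL (or NaN) for a channel with no value at evaluation time — could look roughly like this in plain Java (illustrative names; interpolation omitted):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WideRow {
    // Projects a sparse event (only some channels present) onto a fixed
    // list of requested columns; absent channels become null (a consumer
    // could substitute NaN, or interpolate in the background instead).
    static Object[] toRow(List<String> columns, Map<String, Double> event) {
        Object[] row = new Object[columns.size()];
        for (int i = 0; i < columns.size(); i++) {
            row[i] = event.get(columns.get(i)); // null when channel absent
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("timestamp", "temperature", "current");
        Map<String, Double> event = new HashMap<>();
        event.put("timestamp", 60000.0);
        event.put("current", 5.0); // no temperature in this event
        System.out.println(Arrays.toString(toRow(cols, event)));
        // prints [60000.0, null, 5.0]
    }
}
```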

For your suggested approach I agree and will try to write some of our analyses 
(in our Java DSL) with MATCH_RECOGNIZE to see how well it fits and then come 
back to the list.

Thanks
Julian

Am 23.10.18, 05:55 schrieb "Julian Hyde" :

Julian,

Thanks for posting this to Calcite. We appreciate the opportunity to mull 
over a language and prevent a misguided SQL-like language.

I agree with both you and Mark: MATCH_RECOGNIZE seems to be very well 
suited to your problem domain. And MATCH_RECOGNIZE is non-trivial and difficult 
to learn.

But in its favor, MATCH_RECOGNIZE is in the SQL standard and has reference 
implementations in systems like Oracle, so we can assume that it is 
well-specified. And, in my opinion, it is well designed - it delivers 
significant extra power to SQL that could not be done efficiently or at all 
without it, and is consistent with existing SQL semantics. Lastly, the 
streaming systems such as Flink and Beam are adopting it.

When your proposed language has gone through the same process, I suspect 
that it would end up being very similar to MATCH_RECOGNIZE. MATCH_RECOGNIZE may 
seem “imperative” because it it is creating a state-transition engine, but 
finite-state automata can be reasoned and safely transformed, and are therefore 
to all intents and purposes “declarative”.

The biggest reason not to use MATCH_RECOGNIZE is your audience. There’s no 
point creating the perfect language if the audience doesn’t like it and want to 
adopt it. So perhaps your best path is to design your own language, find some 
examples and code them up as use cases in that language, and iterate based on 
your users’ feedback.

If I were you, I would also code each of those examples in SQL using 
MATCH_RECOGNIZE, and make sure that there is a sound mapping between those 
languages. And maybe your language could be implemented as a thin layer above 
MATCH_RECOGNIZE.

This is the same advice I would give to everyone who is writing a database: 
I don’t care whether you use SQL, but make sure your language maps onto 
(extended) relational algebra. (And if you create a SQL-like language that 
breaks some of the concepts of SQL, such automatically joining tables, please 
don’t tell people that your language is SQL.)

I’m sorry to say that Calcite’s implementation of MATCH_RECOGNIZE has not 
moved forward much since my email. Maybe your effort is the kick necessary to 
get it going. I can assure you that I still believe that MATCH_RECOGNIZE, and 
the algebra that underlies it, is a solid foundation.

Julian



> On Oct 21, 2018, at 10:04 PM, Julian Feinauer 
 wrote:
> 
> Hi Mark,
> 
> thanks for your reply.
> In fact, I'm sorry that I missed to mention MATCH_RECOGNIZE in my 
original mail.
> I was really excited when I first heard about MATCH_RECOGNIZE as it is 
incredibly powerful and could be used to solve many of the problems I state in 
my mail.
> The only "drawback" I see is that it feels so technical and complex.
> By that I mean that it took me quite a while to figure out how to use it 
(and I would consider myself as experienced SQL user). And it kind of "breaks" 
the foundation of SQL in the sense that it is pretty imperative and not to 
declarative.
> 
> This is no general criticism of the feature. The point I'm trying to make 
is that there is a (from my perspective) large class of similar problems and I 
would love to have a solution which "feels" natural and offers suitable 
"semantics" for the field.
> 
> But coming back to the MATCH_RECOGNIZE support in Calcite, is there any 
progress with regards to Julians Post from July?
> If not I can offer to give some support with the implementation of the 
FSM / NFA.
> One solution for us could then also be to take a query in the "Timeseries 
SQL"-dialect and transform it to a Query with MATCH_RECOGNIZE.
> 
> So if there is still help needed please let me know (a quick search 
through the JIRA showed CALCITE-1935) which seems like there is still some 
implementation missing.
> 
> Best
> Julia

Re: Calcite on Traces / Industry 4.0 data

2018-10-21 Thread Julian Feinauer
Hi Mark,

thanks for your reply.
In fact, I'm sorry that I failed to mention MATCH_RECOGNIZE in my original mail.
I was really excited when I first heard about MATCH_RECOGNIZE as it is 
incredibly powerful and could be used to solve many of the problems I state in 
my mail.
The only "drawback" I see is that it feels so technical and complex.
By that I mean that it took me quite a while to figure out how to use it (and I 
would consider myself an experienced SQL user). And it kind of "breaks" the 
foundation of SQL in the sense that it is pretty imperative and not very 
declarative.

This is no general criticism of the feature. The point I'm trying to make is 
that there is (from my perspective) a large class of similar problems and I 
would love to have a solution which "feels" natural and offers suitable 
"semantics" for the field.

But coming back to the MATCH_RECOGNIZE support in Calcite, is there any 
progress with regard to Julian's post from July?
If not I can offer to give some support with the implementation of the FSM / 
NFA.
One solution for us could then also be to take a query in the "Timeseries 
SQL"-dialect and transform it to a Query with MATCH_RECOGNIZE.

So if there is still help needed please let me know (a quick search through 
the JIRA showed CALCITE-1935, which suggests there is still some 
implementation missing).

Best
Julian


Am 22.10.18, 02:41 schrieb "Mark Hammond" :

Hi Julian Feinauer,

Do share your thoughts on MATCH_RECOGNIZE operator suitability, 
http://mail-archives.apache.org/mod_mbox/calcite-dev/201807.mbox/%3cc6a37dae-f884-4d90-8ec0-8fd4efde1...@apache.org%3e

Cheers,
    Mark.

> On 22 Oct 2018, at 02:24, Julian Feinauer  
wrote:
> 
> Dear calcite devs,
> 
> I follow the project for a long time and love how calcite made it 
possible to use SQL everywhere (have done several sql interfaces on top of 
specific file formats myself). I also like the strong support for streaming SQL.
> 
> The reason I'm writing this email is not only to give the project some 
love but because we are thinking about a SQL "extension" which I think is not 
so specific but could serve others as well in different use cases.
> 
> In detail, we are working with Streams of Data from Devices (think of 
industry 4.0). We read data, e.g., from PLCs (using the (incubating) Apache 
PLC4X project where I contribute) and do analytics on them. The analysis which 
are done there are pretty similar when working with traces from tests, e.g., 
automotive test drives or from related industries. What all these streams have 
> in common
> * usually ordered by time
> * elements of different groups of signals ("rows" from "tables") arrive 
ordered by time but not with equal timestamps, e.g., time each second, other 
quantities much more frequent
> * "natural" join for these signal groups ("tables") is some kind of 
> interpolation (sample and hold, linear interpolation, splines, ...) with 
respect to (event-)time
> * In some cases signal types are not known and can only be guessed based 
on first value, e.g., on CAN there is no strict notion of "double" or "integer" 
channels but rather there are integer base values + a conversion formula (like 
a x + b) + possible lookup tables for "other" values (SNA, NULL, DISABLED, ...)
> 
> On the other hand, the analyses we like to perform are often about timestamps
> * get timestamps where a condition becomes true
>  * boolean value toggled
>  * numeric value is above / below threshold
>  * signal change rate is above / below threshold
>  * ...
> * get the values of certain signals at the point in time when a condition 
becomes true (see above)
> * get windows based on conditions
>  * while signal is true
>  * while value above ...
>  * ...
> * Do aggregations on signals in the mentioned windows
> 
> Parts of this could be done in most SQL dialects (I'm no expert on the 
standard but in Postgres one could use LAG and partitions) but this is not 
efficient and not all of the above could be done with that.
> So we think about an extension (or a dialect) for "traces" or "time 
series" which has a syntax that is slightly extended to allow such queries as 
stated above.
> 
> To give you an example of what such an extension could look like:
> 
> ```
> SELECT start(), end(), MAX(current) FROM s7://127.0.0.1/0/0 WHILE 
cycle_in_progress = TRUE
> SELECT timestamp, current AS start_current FROM s7://127.0.0.1/0/0 WHERE 
cycle_in_progress = TRUE TRIGGER ON_BECOME_TRUE
> SELECT timestamp, current AS start_current FROM s7://127.0.0.1/0/0 WHERE 
cycle_in_progress = TRUE

Calcite on Traces / Industry 4.0 data

2018-10-21 Thread Julian Feinauer
Dear calcite devs,

I have followed the project for a long time and love how Calcite made it 
possible to use SQL everywhere (I have done several SQL interfaces on top of 
specific file formats myself). I also like the strong support for streaming SQL.

The reason I'm writing this email is not only to give the project some love but 
because we are thinking about a SQL "extension" which I think is not so 
specific but could serve others as well in different use cases.

In detail, we are working with streams of data from devices (think of Industry 
4.0). We read data, e.g., from PLCs (using the (incubating) Apache PLC4X 
project, where I contribute) and do analytics on them. The analyses done there 
are pretty similar when working with traces from tests, e.g., automotive test 
drives or related industries. What all these streams have in common is:
* usually ordered by time
* elements of different groups of signals ("rows" from "tables") arrive ordered 
by time but not with equal timestamps, e.g., time each second, other quantities 
much more frequent
* "natural" join for these signal groups ("tables") is some kind of 
interpolation (sample and hold, linear interpolation, splinces, ...) with 
respect to (event-)time
* In some cases signal types are not known and can only be guessed based on 
first value, e.g., on CAN there is no strict notion of "double" or "integer" 
channels but rather there are integer base values + a conversion formula (like 
a x + b) + possible lookup tables for "other" values (SNA, NULL, DISABLED, ...)
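To make the "interpolation as join" idea concrete, here is a minimal sketch of a sample-and-hold (zero-order hold) join of two unevenly sampled series. This is plain Python for illustration only, not Calcite code; the function name and data layout are made up for this example.

```python
import bisect

def sample_and_hold_join(left, right):
    """Join two timestamped series (sorted lists of (ts, value)) by carrying
    the most recent right-hand value forward to each left timestamp."""
    right_ts = [ts for ts, _ in right]
    joined = []
    for ts, lval in left:
        # index of the last right-hand sample at or before ts
        i = bisect.bisect_right(right_ts, ts) - 1
        rval = right[i][1] if i >= 0 else None  # no right value seen yet
        joined.append((ts, lval, rval))
    return joined

# one signal group sampled each second, the other much more frequently
temperature = [(0, 20.0), (1, 21.0), (2, 22.5)]
current = [(0.0, 1.1), (0.4, 1.3), (0.9, 1.2), (1.5, 1.4), (2.2, 1.6)]
print(sample_and_hold_join(temperature, current))
# → [(0, 20.0, 1.1), (1, 21.0, 1.2), (2, 22.5, 1.4)]
```

Linear interpolation or splines would replace only the lookup step; the join shape stays the same.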

On the other hand, the analyses we like to perform often concern timestamps:
* get timestamps where a condition becomes true
  * boolean value toggled
  * numeric value is above / below threshold
  * signal change rate is above / below threshold
  * ...
* get the values of certain signals at the point in time when a condition 
becomes true (see above)
* get windows based on conditions
  * while signal is true
  * while value above ...
  * ...
* Do aggregations on signals in the mentioned windows
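The first item above ("get timestamps where a condition becomes true") is essentially rising-edge detection over an event-time-ordered stream. A minimal sketch, again in plain Python and with invented names, could look like this:

```python
def rising_edges(series, predicate):
    """Timestamps where predicate(value) becomes true, i.e. it was false for
    the previous sample and is true for the current one (event-time order
    assumed)."""
    edges = []
    prev = False
    for ts, value in series:
        cur = predicate(value)
        if cur and not prev:
            edges.append(ts)
        prev = cur
    return edges

current = [(0, 0.2), (1, 0.9), (2, 1.4), (3, 1.1), (4, 0.3), (5, 1.8)]
# timestamps where the numeric value crosses above the threshold 1.0
print(rising_edges(current, lambda v: v > 1.0))  # → [2, 5]
```

Boolean toggles and change-rate thresholds are the same pattern with a different predicate.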

Parts of this could be done in most SQL dialects (I'm no expert on the 
standard, but in Postgres one could use LAG and partitions), but this is not 
efficient, and not all of the above could be done that way.
So we are thinking about an extension (or a dialect) for "traces" or "time 
series" whose syntax is slightly extended to allow queries such as those stated 
above.

To give you an example of what such an extension could look like:

```
SELECT start(), end(), MAX(current) FROM s7://127.0.0.1/0/0 WHILE 
cycle_in_progress = TRUE
SELECT timestamp, current AS start_current FROM s7://127.0.0.1/0/0 WHERE 
cycle_in_progress = TRUE TRIGGER ON_BECOME_TRUE
SELECT timestamp, current AS start_current FROM s7://127.0.0.1/0/0 WHERE 
cycle_in_progress = TRUE
```
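The intended semantics of the first query's WHILE clause can be sketched by segmenting the row stream into maximal windows while the condition holds and aggregating per window. This is a hypothetical Python sketch of the evaluation, not part of any proposed implementation; row layout and function names are assumptions.

```python
def while_windows(rows, condition):
    """Split a timestamped row stream into maximal windows while `condition`
    holds, yielding (start_ts, end_ts, rows_in_window)."""
    window = []
    for row in rows:
        if condition(row):
            window.append(row)
        elif window:
            yield window[0]["ts"], window[-1]["ts"], window
            window = []
    if window:  # flush a window still open at end of stream
        yield window[0]["ts"], window[-1]["ts"], window

rows = [
    {"ts": 0, "cycle_in_progress": False, "current": 0.1},
    {"ts": 1, "cycle_in_progress": True,  "current": 1.2},
    {"ts": 2, "cycle_in_progress": True,  "current": 1.7},
    {"ts": 3, "cycle_in_progress": False, "current": 0.2},
    {"ts": 4, "cycle_in_progress": True,  "current": 1.4},
]
# SELECT start(), end(), MAX(current) ... WHILE cycle_in_progress = TRUE
for start, end, window in while_windows(rows, lambda r: r["cycle_in_progress"]):
    print(start, end, max(r["current"] for r in window))
```

Each printed line corresponds to one row of the WHILE query's result: window start, window end, and the aggregate over that window.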

Why am I bothering you with this?
Well, first, you are experts and I would love to get some feedback on these 
thoughts.
But, most importantly, I am thinking about writing (yet another) SQL parser 
with slight extensions and would then have to care for a "runtime" which would 
be partially similar (in functionality, not in maturity or sophistication) to 
Calcite's Enumerable trait. So I was wondering whether there is a way to make 
all of this work "on top" of Calcite (custom RelNodes and an extension to the 
parser), but I'm unsure about that, as some of the internals of Calcite are 
tied very specifically to SQL, like, e.g., SqlToRelConverter.
Do you have any ideas on how one could implement this "minimally invasive" on 
top of Calcite, and whether this is possible "ex situ", or whether it should be 
done in the same codebase (e.g. as a subproject), as it would need some changes 
near Calcite's core?

Please excuse this rather long email but I would really appreciate any answers, 
comments or suggestions.

Best
Julian


Re: Problem with Calcite and SQuirrelSQL

2018-08-13 Thread Julian Feinauer
Hi Julian,

thanks for the fast reply.

I also found this thread, but as I use Calcite 1.14 this should not be the 
issue (it was fixed in 1.11).
I also checked the code to confirm that (and I do not use shading, so this 
should be no problem at all).

So I agree with you that this should be somewhat related, but I have no clue 
where (or how) exactly I could trace the problem.
So any help is appreciated!

Best

Julian



On 13.08.18, 17:25, "Julian Hyde" wrote:

    Looks similar to this:

    http://mail-archives.apache.org/mod_mbox/calcite-dev/201611.mbox/%3CCAGssvOYTS_tMGh=xASVqgC47DpBQQtYgc8r7BjNuvR+w=hh...@mail.gmail.com%3E

On Mon, Aug 13, 2018 at 5:37 AM Julian Feinauer wrote:
>
> Hi devs,
>
> First, a short disclaimer: I am cross-posting this question on the Calcite 
> and on the SQuirrelSQL mailing lists as I'm not really sure where the 
> problem comes from.
>
> I am using Calcite with a custom Schema to read a specific file format as a 
> DB.
> It works when running the queries embedded in test code.
> When I link my jar into the sqlline script it also works flawlessly, but 
> when I link the code to SQuirrelSQL [1] it gives me the following stacktrace:
>
> 2018-08-13 13:06:15,471 [Thread-3] DEBUG net.sourceforge.squirrel_sql.fw.util.DefaultExceptionFormatter - Error
>
>  593 java.sql.SQLException: Error while executing SQL "SELECT * FROM metadata.TABLES": Unable to instantiate java compiler
>  594 at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
>  595 at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
>  596 at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
>  597 at org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:209)
>  598 at net.sourceforge.squirrel_sql.client.session.StatementWrapper.execute(StatementWrapper.java:165)
>  599 at net.sourceforge.squirrel_sql.client.session.SQLExecuterTask.processQuery(SQLExecuterTask.java:369)
>  600 at net.sourceforge.squirrel_sql.client.session.SQLExecuterTask.run(SQLExecuterTask.java:212)
>  601 at net.sourceforge.squirrel_sql.fw.util.TaskExecuter.run(TaskExecuter.java:82)
>  602 at java.lang.Thread.run(Thread.java:745)
>  603 Caused by: java.lang.IllegalStateException: Unable to instantiate java compiler
>  604 at org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.compile(JaninoRelMetadataProvider.java:433)
>  605 at org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.load3(JaninoRelMetadataProvider.java:374)
>  606 at org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.access$000(JaninoRelMetadataProvider.java:94)
>  607 at org.apache.calcite.rel.metadata.JaninoRelMetadataProvider$1.load(JaninoRelMetadataProvider.java:113)
>  608 at org.apache.calcite.rel.metadata.JaninoRelMetadataProvider$1.load(JaninoRelMetadataProvider.java:110)
>  609 at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3542)
>  610 at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2323)
>  611 at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
>  612 at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
>  613 at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
>  614 at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
>  615 at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
>  616 at org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.create(JaninoRelMetadataProvider.java:464)
>  617 at org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.revise(JaninoRelMetadataProvider.java:477)
>  618 at org.apache.calcite.rel.metadata.RelMetadataQuery.revise(RelMetadataQuery.java:203)
>  619 at org.apache.calcite.rel.metadata.RelMetadataQuery.collations(RelMetadataQuery.java:565)
>  620 at org.apache.calcite.rel.metadata.RelMdCollation.project(RelMdCollation.java:207)
>  621 at org.apache.calcite.rel.logical.LogicalProject$1.get(LogicalProject.java:117)
>  622 at org.apache.calcite.rel.logi

User defined functions with non scalar return type

2016-12-09 Thread Julian Feinauer
Hey guys,

first of all, great work!

I have been following the Calcite project since I worked a lot with Drill 
about 8 months ago, and I'm very pleased with how everything has developed.

We are currently trying to replace our self-made SQL parser and engine with 
Calcite.

But I have one question, as we need to create UDFs which do not return only a 
single scalar value.

In fact, we do some kind of mapping from one table (that is processed in a 
streaming fashion, like the AggregateFunctions do) to another table (or an 
even more complex nested structure).

Currently I do not see a way to do this, as the AggregateFunctions only allow 
returning scalar primitive types, right?

All my attempts at returning different types resulted in compile errors from 
the code generation.

On the other hand, the TableFunctions do not seem appropriate to me, as they 
can only take parameters as input and not columns that are passed row by row, 
right?
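To illustrate the semantics being asked for (rows consumed one by one, a nested structure returned per group), here is a minimal sketch in Python. It deliberately mirrors an accumulate-then-finish shape, but it is not the Calcite API; the class name, row layout, and result shape are all invented for this example.

```python
class WindowsPerDevice:
    """Sketch of the desired semantics: consume rows one by one (like an
    aggregate) but produce a nested structure per group, not a scalar."""

    def __init__(self):
        self.windows = {}  # device -> list of (ts, value) rows

    def add(self, row):
        # accumulate one input row into its device's group
        self.windows.setdefault(row["device"], []).append((row["ts"], row["value"]))

    def result(self):
        # nested result: device -> {"count": n, "rows": [...]}
        return {dev: {"count": len(rows), "rows": rows}
                for dev, rows in self.windows.items()}

agg = WindowsPerDevice()
for row in [{"device": "plc1", "ts": 0, "value": 1.0},
            {"device": "plc1", "ts": 1, "value": 2.0},
            {"device": "plc2", "ts": 0, "value": 3.0}]:
    agg.add(row)
print(agg.result())
```

The open question is exactly this: whether Calcite's code generation could accept such a non-scalar return type from an aggregate-like function.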

 

I would really appreciate any suggestions or help with this problem.

I'm also open to a discussion about how to implement such a feature in Calcite.

Thank you already!
Julian