Re: Use of noDag parameter in HepPlanner

2019-02-17 Thread Vitalii Diravka
Stamatis,

Just FYI, maybe it will be useful for you,
Drill uses *noDAG: true* as the default value for HepPlanner [1].
After changing it to false, a lot of Drill unit tests failed [2].

[1]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java#L416
[2] https://travis-ci.org/vdiravka/drill/jobs/494499462

Kind regards
Vitalii


On Fri, Feb 15, 2019 at 2:43 PM Stamatis Zampetakis 
wrote:

> FYI, what I concluded by going through the code and the various test cases
> is the following.
>
> By allowing DAGs the planner can detect common sub expressions in queries
> and re-use an existing result without re-applying a rule if that is not
> necessary. This should lead to fewer object creations and rule
> applications, which may in turn lead to improved performance. In the
> existing use cases noDag=false should appear more often since it is the
> default value for two out of three constructors in the HepPlanner.
>
> In principle it seems that using or not using DAGs should give the same
> expression in the end so I would say that using DAGs is always a better
> option. I tried setting noDag to be always true but various tests fail with
> StackOverflowError so it seems there are rules that tend to execute an
> infinite number of times as a result of this change. I would tend to think
> that this is a bug but I didn't look further.
>
> On Tue, Feb 12, 2019 at 9:35 PM Julian Hyde wrote:
>
> > I don’t recall.
> >
> > Could you review the tests and see whether tests tend to use noDag=true or
> > false most of the time? Are there any tests that use the less popular
> > value, and if so, is there a particular reason that those tests use that
> > option?
> >
> > Julian
> >
> >
> > > On Feb 12, 2019, at 6:47 AM, Stamatis Zampetakis wrote:
> > >
> > > Hi all,
> > >
> > > I don't understand what is the correct way to set the noDag [1] parameter
> > > in HepPlanner. I understand what it does (internal query graph becomes a
> > > tree or a DAG) but I don't see why I should use one or the other, or
> > > when.
> > >
> > > Is it performance related?
> > > Are there implications on the rules that can be used with the planner?
> > > Does it limit the class of queries that need to be transformed?
> > >
> > > Thanks in advance,
> > > Stamatis
> > >
> > > [1]
> > > https://github.com/apache/calcite/blob/883666929478aabe07ee5b9e572c43a6f1a703e2/core/src/main/java/org/apache/calcite/plan/hep/HepPlanner.java#L131


Calcite-Master - Build # 1025 - Failure

2019-02-17 Thread Apache Jenkins Server
The Apache Jenkins build system has built Calcite-Master (build #1025)

Status: Failure

Check console output at https://builds.apache.org/job/Calcite-Master/1025/ to 
view the results.

Re: [DISCUSS] Move site repositories from svn to gitbox

2019-02-17 Thread Francis Chuang
@Michael, the svn repo will still be kept, but just unused. See kafka's 
old site: https://svn.apache.org/repos/asf/kafka/site/


I have now pushed the current working copy of our site to 
https://github.com/apache/calcite-site using svn export.


I have also updated my ticket with infra to ask them to switch the 
site's publishing mechanism from svnpubsub to gitpubsub.


I'll now proceed with updating the publishing instructions for our site 
to git.


On 16/02/2019 5:37 am, Julian Hyde wrote:

Agreed, the history of the web site is not very important.

Julian


On Feb 15, 2019, at 5:58 AM, Michael Mior  wrote:

I think we may want to keep the old SVN repository around if this is
the case, but I personally don't have a problem with losing history in
the new git repo. On a related note, it would be good to find a
process for the new repo that can work with a shallow clone so we
don't have to have the entire history of the site to push a change.

--
Michael Mior
mm...@apache.org

On Fri, Feb 15, 2019 at 05:29, Francis Chuang wrote:


Hey everyone,

I have now created the calcite-site repo in Gitbox. It is now available
via Github and the Gitbox endpoint, but currently empty.

I am currently trying to migrate the svn repo, but it is taking a very
long time and eventually timed out for me. A member of the ASF infra
team has also confirmed that it can take hours or days to complete [1].

I feel that it would probably be easier if we just copy the existing
files from the svn repo and make that the first commit in the git repo.
This is what Kafka did for their migration [2].

How important are the commits for site pushes? In my opinion it's
probably acceptable if we lose them and start anew with the git repo as
they do not document changes to our code base.

Happy to hear your thoughts!

Francis

[1] https://issues.apache.org/jira/browse/INFRA-17846
[2]
https://github.com/apache/kafka-site/commit/ba6c994ca09629b047ab9175f882877ba03b92da


On 11/02/2019 9:00 pm, Francis Chuang wrote:
Hey all,

ASF project sites have the ability to use git instead of subversion as
their repository for web site content [1]. It has been available since
2015 and appears to be quite stable. Quite a few other projects have
also moved their websites to git and subsequently to Gitbox (for using
Github as their source of truth). As an example, see the Arrow project [2].

I myself would love to see this as I find Git's interface and UX to be
much easier to use compared to svn. It also reduces the need to context
switch between Git and svn when editing and pushing the site.

My overall goal is to find a way to automate the publishing and build of
our websites, either via Jenkins builds (some projects are already doing
this, as I found when I searched infra) or the new Github actions [3].
Having the site hosted in Git would make this process much easier to
automate. I will need to get in touch with infra to clarify a few things
and to see if this is feasible, but I think this is a worthwhile endeavor.

How do you guys feel about moving our site's repository from svn to GitBox?

Francis


[1] https://blogs.apache.org/infra/entry/git_based_websites_available
[2] https://issues.apache.org/jira/browse/INFRA-17655
[3] https://github.com/features/actions






Re: Use of noDag parameter in HepPlanner

2019-02-17 Thread Stamatis Zampetakis
Thanks for the additional info Vitalii!

It made me also realize that I had a typo in my previous email.
I meant to write that I tried setting *noDag=false* (since I wanted to
enable DAGs) everywhere but I had failures in various places.
Setting *noDag=true* globally will not work and most likely is not
desirable either.
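
To make the tree/DAG distinction concrete, here is a minimal, self-contained
sketch. It is not Calcite code and does not use HepPlanner: the names
NoDagSketch, Node, Graph and countVertices are hypothetical, and sharing
vertices by a structural digest is only an assumption made for illustration.
It registers the same expression twice under one parent and counts how many
vertices the graph ends up with when sharing is disabled (noDag=true) and
when it is enabled (noDag=false).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a planner graph; hypothetical names, not Calcite's HepPlanner. */
final class NoDagSketch {

  /** Expression node: an operator label plus its inputs. */
  static final class Node {
    final String op;
    final List<Node> inputs;

    Node(String op, Node... inputs) {
      this.op = op;
      this.inputs = Arrays.asList(inputs);
    }

    /** Structural digest, e.g. "Filter(Scan[emp])". */
    String digest() {
      StringBuilder sb = new StringBuilder(op);
      if (!inputs.isEmpty()) {
        sb.append('(');
        for (int i = 0; i < inputs.size(); i++) {
          if (i > 0) {
            sb.append(',');
          }
          sb.append(inputs.get(i).digest());
        }
        sb.append(')');
      }
      return sb.toString();
    }
  }

  /** Registers nodes as vertices, sharing equal digests unless noDag is set. */
  static final class Graph {
    private final boolean noDag;
    private final Map<String, Integer> vertexByDigest = new HashMap<>();
    private int vertexCount;

    Graph(boolean noDag) {
      this.noDag = noDag;
    }

    /** Adds a node and its inputs; returns the id of the vertex representing it. */
    int add(Node node) {
      String digest = node.digest();
      if (!noDag) {
        Integer existing = vertexByDigest.get(digest);
        if (existing != null) {
          return existing; // common sub-expression: reuse the existing vertex
        }
      }
      for (Node input : node.inputs) {
        add(input);
      }
      int id = vertexCount++;
      vertexByDigest.put(digest, id);
      return id;
    }

    int size() {
      return vertexCount;
    }
  }

  /** Builds Union(Filter(Scan[emp]), Filter(Scan[emp])) and counts vertices. */
  static int countVertices(boolean noDag) {
    Node left = new Node("Filter", new Node("Scan[emp]"));
    Node right = new Node("Filter", new Node("Scan[emp]"));
    Graph graph = new Graph(noDag);
    graph.add(new Node("Union", left, right));
    return graph.size();
  }

  public static void main(String[] args) {
    System.out.println("noDag=true  vertices: " + countVertices(true));  // 5
    System.out.println("noDag=false vertices: " + countVertices(false)); // 3
  }
}
```

In this toy model a rule that is tried once per vertex would be tried five
times on the tree and three times on the DAG, which matches the intuition
about fewer rule applications discussed earlier in the thread. It says nothing
about why some rules misbehave when sharing is switched on.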
