[jira] [Updated] (CALCITE-2204) Volcano Planner may not choose the cheapest cost plan
[ https://issues.apache.org/jira/browse/CALCITE-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LeoWangLZ updated CALCITE-2204:
-------------------------------
    Summary: Volcano Planner may not choose the cheapest cost plan
       (was: Volcano Planner may not choose the cheapest cost plan wrongly)

> Volcano Planner may not choose the cheapest cost plan
> -----------------------------------------------------
>
>                 Key: CALCITE-2204
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2204
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: LeoWangLZ
>            Assignee: Julian Hyde
>            Priority: Major
>
> When one of a node's inputs obtains a cheaper plan, the Volcano planner
> propagates the cost improvement to the node's parents. However, it does
> not first propagate to all parents at the same level; it propagates to one
> parent and then immediately recurses into that parent's own parents. As a
> result, a parent may later receive a cheaper cost from a sibling at the
> same level, but when the planner reaches that parent again it does not
> propagate further, because its cost has already been calculated in this
> pass and is not re-examined. The planner can therefore miss the cheapest
> plan.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
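The ordering trap described above can be avoided by propagating cost improvements with a worklist that revisits a parent whenever any of its inputs improves, no matter how often. The sketch below is a toy model for illustration only: the `Node` class, its additive cost model, and the `propagate` method are assumptions of this example, not Calcite's actual `RelSubset`/`VolcanoPlanner` API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model: each node's cost = its own cost + sum of best input costs.
 *  Hypothetical stand-in for the planner's subset graph. */
class Node {
    final String name;
    double selfCost;
    double bestCost = Double.POSITIVE_INFINITY;
    final List<Node> inputs = new ArrayList<>();
    final List<Node> parents = new ArrayList<>();

    Node(String name, double selfCost) {
        this.name = name;
        this.selfCost = selfCost;
    }

    double computeCost() {
        double c = selfCost;
        for (Node in : inputs) {
            c += in.bestCost;
        }
        return c;
    }
}

class CostPropagation {
    /** Worklist propagation: whenever a node's cost improves, re-examine all
     *  of its parents. A parent may be revisited any number of times, so a
     *  later improvement of a sibling input is never missed; the strict
     *  "c < bestCost" test guarantees termination. */
    static void propagate(Node improved) {
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(improved);
        while (!queue.isEmpty()) {
            Node n = queue.poll();
            for (Node p : n.parents) {
                double c = p.computeCost();
                if (c < p.bestCost) {
                    p.bestCost = c;
                    queue.add(p);
                }
            }
        }
    }

    static void link(Node input, Node parent) {
        input.parents.add(parent);
        parent.inputs.add(input);
    }

    public static void main(String[] args) {
        // S feeds both P1 and P2, and P1 also feeds P2 (the "same level"
        // situation from the bug report).
        Node s = new Node("S", 1);
        Node p1 = new Node("P1", 2);
        Node p2 = new Node("P2", 3);
        link(s, p1);
        link(s, p2);
        link(p1, p2);
        s.bestCost = 5;      // S finds a cheaper plan
        p1.bestCost = 12;
        p2.bestCost = 25;
        propagate(s);
        // P1 settles at 2 + 5 = 7, and P2 at 3 + 5 + 7 = 15, regardless of
        // whether P2 was visited before P1's improvement reached it.
        System.out.println(p1.bestCost + " " + p2.bestCost);
    }
}
```

Whichever order the parents are visited in, the worklist converges to the same fixed point, which is the property the depth-first single-parent traversal described in the issue lacks.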
[jira] [Updated] (CALCITE-2204) Volcano Planner may not choose the cheapest cost plan wrongly
[ https://issues.apache.org/jira/browse/CALCITE-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LeoWangLZ updated CALCITE-2204:
-------------------------------
    Summary: Volcano Planner may not choose the cheapest cost plan wrongly
       (was: Volcano Planner may not choose the cheapest cost of plan wrongly)

> Volcano Planner may not choose the cheapest cost plan wrongly
> -------------------------------------------------------------
>
>                 Key: CALCITE-2204
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2204
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: LeoWangLZ
>            Assignee: Julian Hyde
>            Priority: Major

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2202) Aggregate Join Push-down on a Single Side
[ https://issues.apache.org/jira/browse/CALCITE-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392289#comment-16392289 ]

Zhong Yu edited comment on CALCITE-2202 at 3/9/18 2:29 AM:
----------------------------------------------------------

Everything is moot if I cannot prove my formula. But suppose it is correct:
I do think that COVAR_POP can be pushed down; it can be calculated from
SUM(x*y), SUM(x), SUM(y), and COUNT(x, y), all of which can be split across
a table union or cross product, and can therefore be pushed down through a
join.

Producing more candidate plans may be bad for the CBO, but the extra
(single-sided) rule can be opt-in for cases where metadata is missing and
statistics show that the group columns are unique or nearly unique.

was (Author: zhong.j.yu):
Everything is moot if I cannot prove my formula. But suppose it is correct:
I do think that COVAR_POP can be pushed down; it can be calculated from
SUM(x*y), SUM(x), SUM(y), and COUNT(x, y), all of which can be split across
a table union or cross product, and can therefore be pushed down through a
join.

Producing more candidate plans may be bad for the CBO, but the extra
(single-sided) rule can be opted into in cases where metadata is missing or
the group columns are nearly unique.

> Aggregate Join Push-down on a Single Side
> -----------------------------------------
>
>                 Key: CALCITE-2202
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2202
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: next
>            Reporter: Zhong Yu
>            Assignee: Julian Hyde
>            Priority: Major
>             Fix For: next
>
> While investigating https://issues.apache.org/jira/browse/CALCITE-2195,
> it became apparent that aggregation can be pushed down on a single side
> (either side), leaving the other side non-aggregated, regardless of
> whether the grouping columns are unique on the other side. My analysis:
> [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf]
>
> This may be useful when metadata is insufficient; in any case, we could
> offer all three possible transformations (aggregate on the left only, on
> the right only, or on both sides) to the cost-based optimizer, so that the
> cheapest one can be chosen based on statistics.
>
> Does this make sense to anybody? If it sounds good, I'll implement it and
> offer a PR.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
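The decomposition Zhong Yu describes can be checked with a small example: COVAR_POP(x, y) = SUM(x*y)/n - (SUM(x)/n)*(SUM(y)/n), and each of the four partial aggregates merges by simple addition across partitions, which is what makes the aggregate splittable below a union or join. The following is a minimal sketch with hypothetical class names, not a Calcite rule:

```java
/** Partial aggregate for COVAR_POP that is decomposable across partitions:
 *  the four components are each additive, so partials computed on separate
 *  inputs can be merged and finished afterwards. Illustrative sketch only. */
class CovarAcc {
    double sumXY;
    double sumX;
    double sumY;
    long n;

    void add(double x, double y) {
        sumXY += x * y;
        sumX += x;
        sumY += y;
        n++;
    }

    /** Merging two partials is component-wise addition, which is why the
     *  aggregate can be computed below a join/union and combined above it. */
    CovarAcc merge(CovarAcc o) {
        CovarAcc r = new CovarAcc();
        r.sumXY = sumXY + o.sumXY;
        r.sumX = sumX + o.sumX;
        r.sumY = sumY + o.sumY;
        r.n = n + o.n;
        return r;
    }

    /** COVAR_POP = SUM(x*y)/n - (SUM(x)/n) * (SUM(y)/n). */
    double covarPop() {
        return sumXY / n - (sumX / n) * (sumY / n);
    }
}

class CovarDemo {
    public static void main(String[] args) {
        // Partials computed independently on two partitions, then merged:
        // the result equals the single-pass COVAR_POP over all four rows.
        CovarAcc left = new CovarAcc();
        left.add(1, 2);
        left.add(2, 4);
        CovarAcc right = new CovarAcc();
        right.add(3, 5);
        right.add(4, 9);
        System.out.println(left.merge(right).covarPop()); // 2.75 for this data
    }
}
```

For the sample rows, SUM(x*y) = 61, SUM(x) = 10, SUM(y) = 20, n = 4, so 61/4 - (10/4)*(20/4) = 15.25 - 12.5 = 2.75, matching the direct mean-deviation computation.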
[jira] [Comment Edited] (CALCITE-2202) Aggregate Join Push-down on a Single Side
[ https://issues.apache.org/jira/browse/CALCITE-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392289#comment-16392289 ]

Zhong Yu edited comment on CALCITE-2202 at 3/9/18 2:27 AM:
----------------------------------------------------------

Everything is moot if I cannot prove my formula. But suppose it is correct:
I do think that COVAR_POP can be pushed down; it can be calculated from
SUM(x*y), SUM(x), SUM(y), and COUNT(x, y), all of which can be split across
a table union or cross product, and can therefore be pushed down through a
join.

Producing more candidate plans may be bad for the CBO, but the extra
(single-sided) rule can be opted into in cases where metadata is missing or
the group columns are nearly unique.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (CALCITE-1806) UnsupportedOperationException accessing Druid through Spark JDBC
[ https://issues.apache.org/jira/browse/CALCITE-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392285#comment-16392285 ]

ASF GitHub Bot commented on CALCITE-1806:
-----------------------------------------

Github user risdenk commented on the issue:

    https://github.com/apache/calcite-avatica/pull/28

```
Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 7.373 sec <<< FAILURE! - in org.apache.calcite.avatica.remote.SparkClientTest
testSpark[JSON](org.apache.calcite.avatica.remote.SparkClientTest)  Time elapsed: 6.864 sec  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<0>
	at org.apache.calcite.avatica.remote.SparkClientTest.testSpark(SparkClientTest.java:108)

testSpark[PROTOBUF](org.apache.calcite.avatica.remote.SparkClientTest)  Time elapsed: 0.48 sec  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<0>
	at org.apache.calcite.avatica.remote.SparkClientTest.testSpark(SparkClientTest.java:108)
```

> UnsupportedOperationException accessing Druid through Spark JDBC
> ----------------------------------------------------------------
>
>                 Key: CALCITE-1806
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1806
>             Project: Calcite
>          Issue Type: Bug
>          Components: avatica
>    Affects Versions: avatica-1.9.0
>         Environment: Spark 1.6, Scala 2.10, CDH 5.7
>            Reporter: Benjamin Vogan
>            Priority: Major
>
> I am interested in querying Druid via Spark. I realize that there are
> several mechanisms for doing this, but I was curious about using the JDBC
> batch offered by the latest release, as it is familiar to our analysts and
> seems like it should be a well-supported path going forward.
> My first attempt failed with an UnsupportedOperationException. I ran
> spark-shell with the --jars option to add the avatica 1.9.0 JDBC driver
> jar.
> {noformat}
> scala> val dw2 = sqlContext.read.format("jdbc").options(Map(
>   "url" -> "jdbc:avatica:remote:url=http://jarvis-druid-query002:8082/druid/v2/sql/avatica/",
>   "dbtable" -> "sor_business_events_all",
>   "driver" -> "org.apache.calcite.avatica.remote.Driver",
>   "fetchSize" -> "1")).load()
> java.lang.UnsupportedOperationException
>   at org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:275)
>   at org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:121)
>   at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:122)
>   at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
>   at org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:57)
>   at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
>   at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
>   at $iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
>   at $iwC$$iwC$$iwC.<init>(<console>:38)
>   at $iwC$$iwC.<init>(<console>:40)
>   at $iwC.<init>(<console>:42)
>   at <init>(<console>:44)
>   at .<init>(<console>:48)
>   at .<clinit>(<console>)
>   at .<init>(<console>:7)
>   at .<clinit>(<console>)
>   at $print(<console>)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1045)
>   at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1326)
>   at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:821)
>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:852)
>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:800)
>   at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>   at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>   at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>   at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>   at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>   at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
[jira] [Commented] (CALCITE-2207) Enforce Java version via maven-enforcer-plugin
[ https://issues.apache.org/jira/browse/CALCITE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392188#comment-16392188 ]

Kevin Risden commented on CALCITE-2207:
---------------------------------------

We should remove openjdk7 from .travis.yml as well.

> Enforce Java version via maven-enforcer-plugin
> ----------------------------------------------
>
>                 Key: CALCITE-2207
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2207
>             Project: Calcite
>          Issue Type: Task
>          Components: avatica
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Critical
>             Fix For: avatica-1.12.0
>
> Now that JDK 7 support has been dropped, we should add logic to the build
> that fails clearly when a version of Java that we don't support is used.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
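For reference, a maven-enforcer-plugin configuration along the lines this issue proposes might look like the sketch below. This is an illustrative assumption, not the actual Calcite/Avatica patch; the plugin version and the Java version range are placeholders to adjust for the project.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <version>3.0.0-M1</version>
  <executions>
    <execution>
      <id>enforce-java</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <!-- Fail the build early, with a clear message, on unsupported JDKs -->
          <requireJavaVersion>
            <version>[1.8,)</version>
            <message>Java 8 or newer is required to build this project.</message>
          </requireJavaVersion>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this in place, running the build under JDK 7 aborts during validation instead of failing later with an obscure compilation or bytecode error.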
[jira] [Commented] (CALCITE-1806) UnsupportedOperationException accessing Druid through Spark JDBC
[ https://issues.apache.org/jira/browse/CALCITE-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392187#comment-16392187 ]

ASF GitHub Bot commented on CALCITE-1806:
-----------------------------------------

Github user risdenk commented on the issue:

    https://github.com/apache/calcite-avatica/pull/28

    I'm expecting the build to fail with 0 rows being returned when there is
    actually a row there. Debug logging is turned on for the SparkSession to
    try to get some more information about this.
[jira] [Commented] (CALCITE-1806) UnsupportedOperationException accessing Druid through Spark JDBC
[ https://issues.apache.org/jira/browse/CALCITE-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392184#comment-16392184 ]

ASF GitHub Bot commented on CALCITE-1806:
-----------------------------------------

GitHub user risdenk opened a pull request:

    https://github.com/apache/calcite-avatica/pull/28

    [WIP][CALCITE-1806] Add Apache Spark JDBC test to Avatica server

    This branch is a work in progress to show how Apache Spark and Avatica
    don't seem to be playing along nicely together. Spark JDBC against
    Avatica returns an empty result even though it determines the correct
    schema.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/risdenk/calcite-avatica CALCITE-1806

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/calcite-avatica/pull/28.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #28

----
commit 311ed286cdf2807e8daf84b52b5956a4b54d0a74
Author: Kevin Risden
Date:   2018-03-09T00:41:54Z

    [CALCITE-1806] Add Apache Spark JDBC test to Avatica server
[jira] [Commented] (CALCITE-1806) UnsupportedOperationException accessing Druid through Spark JDBC
[ https://issues.apache.org/jira/browse/CALCITE-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392157#comment-16392157 ]

Kevin Risden commented on CALCITE-1806:
---------------------------------------

So I have good news and bad news on this. I can't reproduce the error from
the original reporter. However, no data seems to be returned even though
there is no error. I'm working on a test case to show this.
[jira] [Comment Edited] (CALCITE-2194) Ability to hide a schema
[ https://issues.apache.org/jira/browse/CALCITE-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392092#comment-16392092 ]

Piotr Bojko edited comment on CALCITE-2194 at 3/8/18 11:33 PM:
--------------------------------------------------------------

Thanks for the quick feedback.
* Reformatting is my bad; I will try to fix it.
* Periods: to be fixed.
* RexImpTable: it was hocus-pocus to me, but suddenly the file just blew up
with errors. I totally forgot to look at it after the PoC was done. My bad;
I will investigate this further.
* Full access instead of a hidden feature: you're the boss, win-win :)
* As for Fairy: I'm not attached to the name and can rename it. A
ThreadLocal-like implementation was the quickest way to carry the principal
through almost all of the Calcite stack; Spring implements similar features
in the same ThreadLocal way. Any suggestion on how to propagate the
principal and I will try to ship it.
* System.getProperty: the previous implementation of
USER/CURRENT_USER/SESSION_USER used the "user.name" property. I added a
fallback to the previous behaviour in case the end user has not provided a
user on the connection.
* The boolean "indirect" and INDIRECT_SELECT were added to implement my use
case. If you map a user to INDIRECT_SELECT on a schema, that user can use
the schema only through views from another schema. If you map a user to
SELECT (but not INDIRECT_SELECT) on a schema, that user can use the schema
only through direct selects, not through views from other schemas.
* .gitignore: will fix.

What do you think about the change in general? Would you consider accepting
it into Calcite once your doubts are addressed? To me, only the
SqlAccessEnum.INDIRECT_SELECT change is more than an implementation detail,
and it is your call whether to accept another access type or to redesign my
change. If the change is headed in a good direction, I would know how to
design my project, in which Calcite plays a role. Thanks in advance :)
> Ability to hide a schema
> ------------------------
>
>                 Key: CALCITE-2194
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2194
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.16.0
>            Reporter: Piotr Bojko
>            Assignee: Piotr Bojko
>            Priority: Minor
>
> See:
> [https://mail-archives.apache.org/mod_mbox/calcite-dev/201711.mbox/ajax/%3C6F6E52D4-6860-4384-A1CB-A2301D05394D%40apache.org%3E]
> I've looked into the core, and the notion of a user would be hard to
> achieve now. However, I am able to implement the "hidden schema" feature
> through the following changes:
> # JsonSchema - add a holder for the feature: a boolean flag, or a flags
> field with an enum (CACHED, which now exists as a separate flag - some
> deprecation may be needed - and HIDDEN)
> # CalciteSchema - pass the flag through
> # RelOptSchema - pass the flag through
> # CalciteCatalogReader - pass the flag through
> # Other derivatives of RelOptSchema - mocked value, false
> # RelOptTable and its implementation - pass the flag through
> # SqlValidatorImpl - validate whether an object from a hidden schema is
> used (in the same places as validateAccess)
> # ViewTableMacro.apply -> Schemas.analyzeView ->
> CalcitePrepareImpl.analyzeView -> CalcitePrepareImpl.parse_ ->
> CalcitePrepareImpl.CalcitePrepareImpl - this path of execution should
> build a SqlValidatorImpl that has the check from point 7 disabled
>
> Such a feature could be useful for end users.
> If the solution is OK, I can contribute it.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
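The INDIRECT_SELECT semantics described in the comment above can be sketched as a small predicate: a grant of INDIRECT_SELECT permits reads only through a view from another schema, while a plain SELECT grant permits only direct reads. The enum and helper below are a hypothetical illustration of that rule, not Calcite's actual SqlAccessEnum or validator code.

```java
import java.util.EnumSet;

/** Hypothetical sketch of the proposed access check. */
class AccessCheck {
    enum Access { SELECT, INDIRECT_SELECT }

    /** Decides whether a read is allowed, given the grants a user holds on a
     *  schema and whether the read arrives via a view defined in another
     *  schema (viaView == true) or as a direct select (viaView == false). */
    static boolean mayRead(EnumSet<Access> grants, boolean viaView) {
        return viaView
            ? grants.contains(Access.INDIRECT_SELECT)
            : grants.contains(Access.SELECT);
    }
}
```

So a user granted only INDIRECT_SELECT can reach the schema's tables through views from other schemas but not query them directly, and a user granted only SELECT gets the opposite, matching the two cases the comment lays out.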
[jira] [Commented] (CALCITE-2194) Ability to hide a schema
[ https://issues.apache.org/jira/browse/CALCITE-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392110#comment-16392110 ]

Julian Hyde commented on CALCITE-2194:
--------------------------------------

At a very high level it looks good - the "AuthorizationGuard" and
"CalcitePrincipal" classes are exactly as I would have hoped. However, I
can't comment further until I take the time to grok the implementation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (CALCITE-2208) MaterializedViewTable.MATERIALIZATION_CONNECTION breaks lex and case sensitivity for end user
[ https://issues.apache.org/jira/browse/CALCITE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392095#comment-16392095 ]
Julian Hyde commented on CALCITE-2208:
--
The materialized view doesn't belong to the current connection, nor does it necessarily "belong to the schema". The current implementation is quick-and-dirty, but it doesn't need to be replaced with a different quick-and-dirty. Materialized views are not exactly the same as views, but I'll remark that for a view, the right thing is to remember the environment where the view was created (e.g. the lexical convention in use, and the access control environment) and apply the same environment when the view is used (i.e. expanded in a query). Maybe for materialized views we need to capture the environment when they were created. MATERIALIZATION_CONNECTION should be a connection factory (perhaps pooling behind the scenes, perhaps not) that can create connections with a particular environment. Perhaps they're not even full-blown connections, just a context in which a query can be prepared.
> MaterializedViewTable.MATERIALIZATION_CONNECTION breaks lex and case sensitivity for end user
>
> Key: CALCITE-2208
> URL: https://issues.apache.org/jira/browse/CALCITE-2208
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.15.0, 1.16.0
> Reporter: Piotr Bojko
> Assignee: Julian Hyde
> Priority: Major
>
> MaterializedViewTable.MATERIALIZATION_CONNECTION, used for validating views, uses the ORACLE lex by default. Calcite expands the view SQL to uppercase, so when the schemas used in the view SQL are declared in lowercase, Calcite does not find the objects it needs to resolve and validate the view SQL.
> It really does not work even when the end user creates a connection with lex=oracle but uses uppercase for the names of its tables.
> It would be best if MaterializedViewTable.MATERIALIZATION_CONNECTION were replaced by the end user's connection, or by a dynamically created connection with the lex passed from the end user's connection.
> A quick-and-dirty solution is to create MaterializedViewTable.MATERIALIZATION_CONNECTION with caseSensitive=false.
[jira] [Commented] (CALCITE-2194) Ability to hide a schema
[ https://issues.apache.org/jira/browse/CALCITE-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392092#comment-16392092 ]
Piotr Bojko commented on CALCITE-2194:
--
Thanks for the quick feedback.
* Reformatting is my bad; I will try to fix it.
* Periods - to be fixed.
* RexImpTable - it was hocus-pocus to me, but suddenly the file just blew up with errors. I totally forgot to look at it after the PoC was done. My bad - I will investigate this further.
* Full access control instead of the hidden feature - you're the boss, win-win :)
* As for Fairy - I'm not attached to the names, I can rename it. The ThreadLocal-like implementation was the quickest solution to drag the principal through almost all of the Calcite stack - Spring implements such features in the same thread-local way. Suggest another way to propagate the principal and I will try to ship it.
* System.getProperty - the previous implementation of USER/CURRENT_USER/SESSION_USER used the "user.name" property. I've added the fallback to the previous behaviour in case an end user has not provided the user on the connection.
* "boolean indirect" and INDIRECT_SELECT were added to implement the case I need. If you map a user to INDIRECT_SELECT on a schema, that user can use the schema only through views from another schema. If you map a user to SELECT on a schema (but not INDIRECT_SELECT), that user can use the schema only through direct selects, and not through views from other schemas.
* .gitignore - will fix.
What do you think about the change in general? Would you consider accepting the change into Calcite after your doubts are addressed? For me, only the SqlAccessEnum.INDIRECT_SELECT change is something more than an implementation detail, and it is your call whether to accept another access type or to redesign my change.
[jira] [Commented] (CALCITE-2202) Aggregate Join Push-down on a Single Side
[ https://issues.apache.org/jira/browse/CALCITE-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392086#comment-16392086 ]
Julian Hyde commented on CALCITE-2202:
--
Maybe I've been out of academia and in engineering for too long, but I prefer a few well-chosen examples to a formal proof. Some examples stake out the corner cases (e.g. empty GROUP BY) and other examples are easily generalized (e.g. if something applies to MIN it applies to MAX and similar functions, and if something applies to AVG it applies to STDEV and similar functions).

Are you claiming that it is better to push down on only one side? (I could see how it would be simpler, therefore better, if you could push down to the left first, then push down to the right.) I contend that the desired result is
{code}select sum(e.s * d.c)
from (select deptno, sum(sal) as s from emp group by deptno) as e
join (select deptno, count(*) as c from dept group by deptno) as d
on e.deptno = d.deptno
group by e.deptno{code}
(I mistakenly omitted the "group by deptno" in the two inner queries last time) and I wonder whether you could get to that by applying your methods.

Most aggregate functions we are familiar with have one argument, but would your methods apply to aggregate functions with more than one? Let's consider COUNT and COVAR_POP, both of which can take two arguments, and ask whether they can be pushed down if they have one argument from each side of a join.
> Aggregate Join Push-down on a Single Side
>
> Key: CALCITE-2202
> URL: https://issues.apache.org/jira/browse/CALCITE-2202
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: next
> Reporter: Zhong Yu
> Assignee: Julian Hyde
> Priority: Major
> Fix For: next
>
> While investigating https://issues.apache.org/jira/browse/CALCITE-2195, it became apparent that aggregation can be pushed down on a single side (either side), leaving the other side non-aggregated, regardless of whether the grouping columns are unique on the other side. My analysis: [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf].
> This may be useful when the metadata is insufficient; in any case, we may try to provide all 3 possible transformations (aggregate on left only; right only; both sides) to the cost-based optimizer, so that the cheapest one can be chosen based on stats.
> Does this make any sense, anybody? If it sounds good, I'll implement it and offer a PR.
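[Editor's note: the equivalence Julian's query relies on can be sanity-checked outside Calcite. Below is a minimal Python sketch on made-up toy tables - the data and function names are illustrative assumptions, not Calcite code - comparing join-then-aggregate with aggregating the emp side below the join for the SUM(sal) GROUP BY deptno shape discussed above.]

```python
from collections import defaultdict

# Toy tables: emp(deptno, sal) and dept(deptno). dept deliberately has
# two rows for deptno 10, to exercise the "cross-multiplier" effect of
# the join on a non-unique key.
emp = [(10, 100), (10, 200), (20, 300)]
dept = [(10,), (10,), (20,)]

def naive(emp, dept):
    """Reference plan: join first, then SUM(sal) GROUP BY deptno."""
    totals = defaultdict(int)
    for deptno, sal in emp:
        for (d,) in dept:
            if d == deptno:
                totals[deptno] += sal
    return dict(totals)

def push_down_left(emp, dept):
    """Aggregate emp below the join, keep dept raw, then sum again on top.
    Each matching dept row re-contributes the pre-aggregated sum, which
    reproduces the join's multiplying effect."""
    pre = defaultdict(int)
    for deptno, sal in emp:
        pre[deptno] += sal          # SUM(sal) GROUP BY deptno, below the join
    totals = defaultdict(int)
    for deptno, s in pre.items():
        for (d,) in dept:
            if d == deptno:
                totals[deptno] += s  # top-level SUM over the join rows
    return dict(totals)

print(naive(emp, dept))          # {10: 600, 20: 300}
print(push_down_left(emp, dept)) # {10: 600, 20: 300}
```

Both plans agree even though deptno is not unique in dept, which is the property the single-side push-down depends on.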
[jira] [Commented] (CALCITE-2194) Ability to hide a schema
[ https://issues.apache.org/jira/browse/CALCITE-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392066#comment-16392066 ]
Julian Hyde commented on CALCITE-2194:
--
I did a quick pass; observations:
* Quite a lot of re-formatting. House style is '=' at the end of the line, indent 2 or 4. Please stick to it.
* Periods at the ends of sentences, please.
* Why exclude RexImpTable.java from checkstyle?
* Why is SqlAccessEnum.INDIRECT_SELECT necessary?
* Thanks for implementing full access control rather than just "hidden", which I know was your original preference.
* You have made RelOptTableImpl no longer immutable.
* I don't like the name Fairy, and I don't like the general practice of putting config in thread-locals. Are there any alternatives?
* There are a couple of places that call {{System.getProperty("user.name")}}. Not appropriate for a server/library.
* A "boolean indirect" parameter is added to several methods, never explained.
* Changes to .gitignore are a mess.
[jira] [Commented] (CALCITE-2194) Ability to hide a schema
[ https://issues.apache.org/jira/browse/CALCITE-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391394#comment-16391394 ]
Piotr Bojko commented on CALCITE-2194:
--
I've shipped the contribution here: https://github.com/apache/calcite/pull/647
[jira] [Commented] (CALCITE-2208) MaterializedViewTable.MATERIALIZATION_CONNECTION breaks lex and case sensitivity for end user
[ https://issues.apache.org/jira/browse/CALCITE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391309#comment-16391309 ]
Piotr Bojko commented on CALCITE-2208:
--
See https://github.com/apache/calcite/pull/647
[jira] [Commented] (CALCITE-2202) Aggregate Join Push-down on a Single Side
[ https://issues.apache.org/jira/browse/CALCITE-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391279#comment-16391279 ]
Zhong Yu commented on CALCITE-2202:
---
Thanks, Julian. Your last example actually works in my favor. My argument is that the original query is equivalent to the following two, where the aggregate is pushed down on only one side:
{code:java}
select sum(e.s * d.c)
from (select deptno, sal as s from emp) as e
join (select deptno, count(*) as c from dept group by deptno) as d
on e.deptno = d.deptno
group by e.deptno;

select sum(e.s * d.c)
from (select deptno, sum(sal) as s from emp group by deptno) as e
join (select deptno, 1 as c from dept) as d
on e.deptno = d.deptno
group by e.deptno;
{code}
The "cross-multiplier" effect is still there, because the join multiplies the side that doesn't aggregate. So I'm probably on the right track. However, the paper I wrote is full of holes, obviously done by an amateur :) Please ignore it; I'll try to write up a new one over the weekend. Your reminders about vacuous cases and about null handling (which differs between GROUP BY and equijoin) are both good points. I'll focus on a narrower proof that works only on inner equijoins.
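[Editor's note: the first of Zhong Yu's two rewrites above - emp left raw, dept pre-aggregated to a per-key count - can also be checked concretely. A minimal Python sketch on toy data (the tables are hypothetical, not from the issue):]

```python
from collections import defaultdict

# Toy tables; dept deliberately has two rows for deptno 10.
emp = [(10, 100), (10, 200), (20, 300)]
dept = [(10,), (10,), (20,)]

def join_then_aggregate(emp, dept):
    """Reference plan: SUM(sal) over the raw join, GROUP BY deptno."""
    totals = defaultdict(int)
    for deptno, sal in emp:
        for (d,) in dept:
            if d == deptno:
                totals[deptno] += sal
    return dict(totals)

def push_down_right(emp, dept):
    """Aggregate only the dept side to a per-deptno count, then compute
    SUM(e.sal * d.c) over the join, GROUP BY deptno."""
    counts = defaultdict(int)
    for (d,) in dept:
        counts[d] += 1               # count(*) GROUP BY deptno, below the join
    totals = defaultdict(int)
    for deptno, sal in emp:
        if deptno in counts:         # inner equijoin: unmatched emp rows drop out
            totals[deptno] += sal * counts[deptno]  # the count is the join multiplier
    return dict(totals)

print(join_then_aggregate(emp, dept))  # {10: 600, 20: 300}
print(push_down_right(emp, dept))      # {10: 600, 20: 300}
```

The count on the aggregated side plays exactly the "cross-multiplier" role described in the comment above.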
[jira] [Commented] (CALCITE-2208) MaterializedViewTable.MATERIALIZATION_CONNECTION breaks lex and case sensitivity for end user
[ https://issues.apache.org/jira/browse/CALCITE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391266#comment-16391266 ]
Piotr Bojko commented on CALCITE-2208:
--
I am planning to add the mentioned workaround when working on CALCITE-2194 - disabling case sensitivity on MaterializedViewTable.MATERIALIZATION_CONNECTION
[jira] [Updated] (CALCITE-2208) MaterializedViewTable.MATERIALIZATION_CONNECTION breaks lex and case sensitivity for end user
[ https://issues.apache.org/jira/browse/CALCITE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Piotr Bojko updated CALCITE-2208:
-
Issue Type: Bug (was: New Feature)
[jira] [Created] (CALCITE-2208) MaterializedViewTable.MATERIALIZATION_CONNECTION breaks lex and case sensitivity for end user
Piotr Bojko created CALCITE-2208:
Summary: MaterializedViewTable.MATERIALIZATION_CONNECTION breaks lex and case sensitivity for end user
Key: CALCITE-2208
URL: https://issues.apache.org/jira/browse/CALCITE-2208
Project: Calcite
Issue Type: New Feature
Components: core
Affects Versions: 1.15.0, 1.16.0
Reporter: Piotr Bojko
Assignee: Julian Hyde

MaterializedViewTable.MATERIALIZATION_CONNECTION, used for validating views, uses the ORACLE lex by default. Calcite expands the view SQL to uppercase, so when the schemas used in the view SQL are declared in lowercase, Calcite does not find the objects it needs to resolve and validate the view SQL.
It really does not work even when the end user creates a connection with lex=oracle but uses uppercase for the names of its tables.
It would be best if MaterializedViewTable.MATERIALIZATION_CONNECTION were replaced by the end user's connection, or by a dynamically created connection with the lex passed from the end user's connection.
A quick-and-dirty solution is to create MaterializedViewTable.MATERIALIZATION_CONNECTION with caseSensitive=false.
[jira] [Commented] (CALCITE-971) Add "NEXT n VALUES FOR sequence" expression
[ https://issues.apache.org/jira/browse/CALCITE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391059#comment-16391059 ]
zhen wang commented on CALCITE-971:
---
This one is done and could be closed, per PHOENIX-2383.
> Add "NEXT n VALUES FOR sequence" expression
>
> Key: CALCITE-971
> URL: https://issues.apache.org/jira/browse/CALCITE-971
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: Julian Hyde
> Priority: Major
> Labels: phoenix
>
> Add a "NEXT n VALUES FOR sequence" expression. It allows more than one value to be grabbed at a time.
> Note that this departs from standard SQL.
[jira] [Commented] (CALCITE-1862) StackOverflowException in RelMdUtil.estimateFilteredRows
[ https://issues.apache.org/jira/browse/CALCITE-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391001#comment-16391001 ]
zhen wang commented on CALCITE-1862:
Some findings: the row-count estimation method itself is actually okay; before planner optimization it can derive the correct row count. The problem lies in the optimization: as rules get applied to the relational algebra, the plan becomes deeper and deeper. By the time the stack overflows, the algebra has grown to a depth > 100, which is essentially what leads to the stack overflow. I haven't located which rule is actually causing the growth of the relational algebra.
> StackOverflowException in RelMdUtil.estimateFilteredRows
>
> Key: CALCITE-1862
> URL: https://issues.apache.org/jira/browse/CALCITE-1862
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: Julian Hyde
> Priority: Major
>
> The query
> {code}select *
> from (
>   select *
>   from (
>     select cast(null as integer) as d
>     from "scott".emp)
>   where d is null and d is null)
> where d is null;{code}
> gives
> {noformat}
> java.lang.StackOverflowError
>   at org.apache.calcite.adapter.clone.ArrayTable.getStatistic(ArrayTable.java:76)
>   at org.apache.calcite.prepare.RelOptTableImpl.getRowCount(RelOptTableImpl.java:224)
>   at org.apache.calcite.rel.core.TableScan.estimateRowCount(TableScan.java:75)
>   at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:206)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
>   at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
>   at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:236)
>   at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:71)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
>   at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
>   at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:236)
>   at org.apache.calcite.rel.metadata.RelMdUtil.estimateFilteredRows(RelMdUtil.java:718)
>   at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:123)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
>   at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
>   at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:236)
>   at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:71)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
>   at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
>   at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:236)
>   at org.apache.calcite.rel.metadata.RelMdUtil.estimateFilteredRows(RelMdUtil.java:718){noformat}
> For a test case, add the query to misc.iq and run QuidemTest.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
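[Editor's note: the depth-growth explanation above can be modeled outside Calcite. The following is a toy Python sketch - the classes, names, and 50% selectivity are illustrative assumptions, not Calcite's actual RelMdRowCount/RelMdUtil code. A row-count estimator that recurses into its input, like estimateFilteredRows, works on shallow plans but overflows the stack once rule applications have stacked enough nodes.]

```python
class Scan:
    """Leaf node with a known row count, like a TableScan."""
    def row_count(self):
        return 1000.0

class Filter:
    """Unary node whose estimate recurses into its input, analogous to
    a filter whose row count is derived from its child's row count."""
    def __init__(self, input_rel):
        self.input = input_rel
    def row_count(self):
        # Assumed 50% selectivity; one stack frame per nested Filter.
        return self.input.row_count() * 0.5

def nest(depth):
    """Build a plan of `depth` Filters stacked on one Scan."""
    rel = Scan()
    for _ in range(depth):
        rel = Filter(rel)
    return rel

# A shallow plan estimates fine (a tiny but finite number).
print(nest(100).row_count() > 0.0)  # True

# A very deep plan - the kind produced by runaway rule application -
# exhausts Python's default recursion limit, mirroring the
# StackOverflowError in the trace above.
try:
    nest(100_000).row_count()
    overflowed = False
except RecursionError:
    overflowed = True
print(overflowed)  # True
```

The fix directions this suggests are the same as in the JVM case: either keep the plan from growing (find the offending rule) or make the estimator iterative rather than recursive.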