[jira] [Commented] (CALCITE-2040) Create adapter for Apache Arrow
[ https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441784#comment-16441784 ]

Laurent Goujon commented on CALCITE-2040:
-----------------------------------------

ARROW-1780 is kind of the opposite/complementary: converting a JDBC resultset into an Arrow batch record.

> Create adapter for Apache Arrow
> -------------------------------
>
> Key: CALCITE-2040
> URL: https://issues.apache.org/jira/browse/CALCITE-2040
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: Julian Hyde
> Priority: Major
>
> Create an adapter for [Apache Arrow|http://arrow.apache.org/]. This would
> allow people to execute SQL statements, via JDBC or ODBC, on data stored in
> Arrow in-memory format.
> Since Arrow is an in-memory format, it is not as straightforward as reading,
> say, CSV files using the file adapter: an Arrow data set does not have a URL.
> (Unless we use Arrow's
> [Feather|https://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/]
> format, or use an in-memory file system such as Alluxio.) So we would need
> to devise a way of addressing Arrow data sets.
> Also, since Arrow is an extremely efficient format for processing data, it
> would also be good to have Arrow as a calling convention. That is,
> implementations of relational operators such as Filter, Project, Aggregate in
> addition to just TableScan.
> Lastly, when we have an Arrow convention, if we build adapters for file
> formats (for instance the bioinformatics formats SAM, VCF, FASTQ discussed in
> CALCITE-2025) it would make a lot of sense to translate those formats
> directly into Arrow (applying simple projects and filters first if
> applicable). Those adapters would belong as a "contrib" module in the Arrow
> project better than in Calcite.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CALCITE-2262) Allow count(*) to be pushed with other aggregators to Druid Storage Handler.
[ https://issues.apache.org/jira/browse/CALCITE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

slim bouguerra updated CALCITE-2262:
------------------------------------
Fix Version/s: 1.16.1

> Allow count(*) to be pushed with other aggregators to Druid Storage Handler.
> ----------------------------------------------------------------------------
>
> Key: CALCITE-2262
> URL: https://issues.apache.org/jira/browse/CALCITE-2262
> Project: Calcite
> Issue Type: Bug
> Components: druid
> Reporter: slim bouguerra
> Assignee: slim bouguerra
> Priority: Major
> Labels: improvement
> Fix For: 1.16.1
>
> Currently only {code}select count(*) from druid_table {code} is pushed as
> Timeseries.
> The goal of this patch is to allow the push of more complicated queries like
> {code} select count(*), sum(metric) from table {code}
[jira] [Updated] (CALCITE-2262) Allow count(*) to be pushed with other aggregators to Druid Storage Handler.
[ https://issues.apache.org/jira/browse/CALCITE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

slim bouguerra updated CALCITE-2262:
------------------------------------
Component/s: druid

> Allow count(*) to be pushed with other aggregators to Druid Storage Handler.
> ----------------------------------------------------------------------------
>
> Key: CALCITE-2262
> URL: https://issues.apache.org/jira/browse/CALCITE-2262
> Project: Calcite
> Issue Type: Bug
> Components: druid
> Reporter: slim bouguerra
> Assignee: slim bouguerra
> Priority: Major
> Labels: improvement
>
> Currently only {code}select count(*) from druid_table {code} is pushed as
> Timeseries.
> The goal of this patch is to allow the push of more complicated queries like
> {code} select count(*), sum(metric) from table {code}
[jira] [Updated] (CALCITE-2262) Allow count(*) to be pushed with other aggregators to Druid Storage Handler.
[ https://issues.apache.org/jira/browse/CALCITE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

slim bouguerra updated CALCITE-2262:
------------------------------------
Labels: improvement (was: )

> Allow count(*) to be pushed with other aggregators to Druid Storage Handler.
> ----------------------------------------------------------------------------
>
> Key: CALCITE-2262
> URL: https://issues.apache.org/jira/browse/CALCITE-2262
> Project: Calcite
> Issue Type: Bug
> Components: druid
> Reporter: slim bouguerra
> Assignee: slim bouguerra
> Priority: Major
> Labels: improvement
>
> Currently only {code}select count(*) from druid_table {code} is pushed as
> Timeseries.
> The goal of this patch is to allow the push of more complicated queries like
> {code} select count(*), sum(metric) from table {code}
[jira] [Updated] (CALCITE-2262) Allow count(*) to be pushed with other aggregators to Druid Storage Handler.
[ https://issues.apache.org/jira/browse/CALCITE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

slim bouguerra updated CALCITE-2262:
------------------------------------
Description:
Currently only {code}select count(*) from druid_table {code} is pushed as Timeseries.
The goal of this patch is to allow the push of more complicated queries like {code} select count(*), sum(metric) from table {code}

was:
Currently only \{code}select count(*) from druid_table \{code} is pushed as Timeseries.
The goal of this patch is to allow the push of more complicated queries like {code} select count(*), sum(metric) from table \{code}

> Allow count(*) to be pushed with other aggregators to Druid Storage Handler.
> ----------------------------------------------------------------------------
>
> Key: CALCITE-2262
> URL: https://issues.apache.org/jira/browse/CALCITE-2262
> Project: Calcite
> Issue Type: Bug
> Reporter: slim bouguerra
> Assignee: slim bouguerra
> Priority: Major
>
> Currently only {code}select count(*) from druid_table {code} is pushed as
> Timeseries.
> The goal of this patch is to allow the push of more complicated queries like
> {code} select count(*), sum(metric) from table {code}
[jira] [Created] (CALCITE-2262) Allow count(*) to be pushed with other aggregators to Druid Storage Handler.
slim bouguerra created CALCITE-2262:
---------------------------------------

Summary: Allow count(*) to be pushed with other aggregators to Druid Storage Handler.
Key: CALCITE-2262
URL: https://issues.apache.org/jira/browse/CALCITE-2262
Project: Calcite
Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra

Currently only \{code}select count(*) from druid_table \{code} is pushed as Timeseries.
The goal of this patch is to allow the push of more complicated queries like {code} select count(*), sum(metric) from table \{code}
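For orientation, a Druid native Timeseries query that carries both aggregators could look like the JSON built below. This is a hand-written sketch, not output captured from Calcite or the patch. The data source and metric names come from the issue text; the interval, the output names `cnt` and `total`, and the choice of `doubleSum` for `sum(metric)` are assumptions made for illustration.

```java
final class DruidTimeseriesSketch {
    /**
     * Builds a Druid native timeseries query with a count aggregator and a sum aggregator.
     * The interval and the output names "cnt" and "total" are illustrative assumptions.
     */
    static String timeseriesQuery(String dataSource, String metric) {
        return "{\n"
            + "  \"queryType\": \"timeseries\",\n"
            + "  \"dataSource\": \"" + dataSource + "\",\n"
            + "  \"granularity\": \"all\",\n"
            + "  \"intervals\": [\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\n"
            + "  \"aggregations\": [\n"
            + "    {\"type\": \"count\", \"name\": \"cnt\"},\n"
            + "    {\"type\": \"doubleSum\", \"name\": \"total\", \"fieldName\": \"" + metric + "\"}\n"
            + "  ]\n"
            + "}";
    }

    public static void main(String[] args) {
        // Corresponds to: select count(*), sum(metric) from druid_table
        System.out.println(timeseriesQuery("druid_table", "metric"));
    }
}
```

Both aggregators sit in the same `aggregations` array, so a single Timeseries query can answer the combined select.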
[jira] [Comment Edited] (CALCITE-2168) Implement a General Purpose Benchmark for Calcite
[ https://issues.apache.org/jira/browse/CALCITE-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441031#comment-16441031 ]

Seung-Hwan Lim edited comment on CALCITE-2168 at 4/17/18 3:35 PM:
------------------------------------------------------------------

While writing TPC-DS queries for Calcite with Postgres backends, I have found a couple of issues.

1. Date-time interval compatibility: the Postgres dialect is ``` (cast('2000-08-20' as date) + interval '30 days') ```. For Calcite with a Postgres backend, when I tried the following:
(cast('2000-08-20' as date) + interval '30' day )
I got an UnsupportedOperationException:
Caused by: java.lang.UnsupportedOperationException: class org.apache.calcite.sql.SqlSyntax$6: SPECIAL

2. Nested aggregation with a window function. In TPC-DS query 98, we have the following troublesome phrase: ```sum(ss."ss_ext_sales_price")*100/sum(sum(ss."ss_ext_sales_price")) over (partition by i."i_class") as REVENUERATIO```
which generates:
SUM("t"."ss_ext_sales_price") * 100 / CASE WHEN (COUNT(SUM("t"."ss_ext_sales_price")) OVER (PARTITION BY "t1"."i_class" RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) > 0 THEN CAST($SUM0(SUM("t"."ss_ext_sales_price")) OVER (PARTITION BY "t1"."i_class" RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS DECIMAL(7, 2)) ELSE NULL END AS "REVENUERATIO"
It causes a syntax error in the CAST($SUM0(SUM())) part in PostgreSQL.

I'm testing TPC-DS with version 1.16.

Thank you,

was (Author: lims1):
While writing TPC-DS queries for Calcite with Postgres backends, I have found a couple of issues.

1. Date-time interval compatibility: the Postgres dialect is ``` (cast('2000-08-20' as date) + interval '30 days') ```. For Calcite with a Postgres backend, I tried the following:
(cast('2000-08-20' as date) + interval '30' day )
I got an UnsupportedOperationException:
Caused by: java.lang.UnsupportedOperationException: class org.apache.calcite.sql.SqlSyntax$6: SPECIAL

2. Nested aggregation with a window function. In TPC-DS query 98, we have the following troublesome phrase: ```sum(ss."ss_ext_sales_price")*100/sum(sum(ss."ss_ext_sales_price")) over (partition by i."i_class") as REVENUERATIO```
which generates:
SUM("t"."ss_ext_sales_price") * 100 / CASE WHEN (COUNT(SUM("t"."ss_ext_sales_price")) OVER (PARTITION BY "t1"."i_class" RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) > 0 THEN CAST($SUM0(SUM("t"."ss_ext_sales_price")) OVER (PARTITION BY "t1"."i_class" RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS DECIMAL(7, 2)) ELSE NULL END AS "REVENUERATIO"
It causes a syntax error in the CAST($SUM0(SUM())) part in PostgreSQL.

I'm testing TPC-DS with version 1.16.

Thank you,

> Implement a General Purpose Benchmark for Calcite
> -------------------------------------------------
>
> Key: CALCITE-2168
> URL: https://issues.apache.org/jira/browse/CALCITE-2168
> Project: Calcite
> Issue Type: Wish
> Components: core
> Reporter: Edmon Begoli
> Assignee: Edmon Begoli
> Priority: Minor
> Labels: performance
> Original Estimate: 2,688h
> Remaining Estimate: 2,688h
>
> Develop a benchmark that can be used for general purpose benchmarking of
> Calcite against other frameworks, and databases, and for study, research, and
> profiling of the framework.
> Use popular benchmarks such as TPC-DS (or -H) or Star Schema Benchmark (SSB)
> and measure the performance of optimized vs. unoptimized Calcite queries, and
> the overhead of going through Calcite adapters vs. natively accessing the
> target DB.
> Look into the existing approaches and do perhaps something similar:
> * https://www.slideshare.net/julianhyde/w-435phyde-3
> * https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_hive-performance-tuning/content/ch_cost-based-optimizer.html
> * (How much of this is still relevant (Hive 0.14)? Can we use
> queries/benchmarks?)
> https://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/
[jira] [Commented] (CALCITE-2168) Implement a General Purpose Benchmark for Calcite
[ https://issues.apache.org/jira/browse/CALCITE-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441031#comment-16441031 ]

Seung-Hwan Lim commented on CALCITE-2168:
-----------------------------------------

While writing TPC-DS queries for Calcite with Postgres backends, I have found a couple of issues.

1. Date-time interval compatibility: the Postgres dialect is ``` (cast('2000-08-20' as date) + interval '30 days') ```. For Calcite with a Postgres backend, I tried the following:
(cast('2000-08-20' as date) + interval '30' day )
I got an UnsupportedOperationException:
Caused by: java.lang.UnsupportedOperationException: class org.apache.calcite.sql.SqlSyntax$6: SPECIAL

2. Nested aggregation with a window function. In TPC-DS query 98, we have the following troublesome phrase: ```sum(ss."ss_ext_sales_price")*100/sum(sum(ss."ss_ext_sales_price")) over (partition by i."i_class") as REVENUERATIO```
which generates:
SUM("t"."ss_ext_sales_price") * 100 / CASE WHEN (COUNT(SUM("t"."ss_ext_sales_price")) OVER (PARTITION BY "t1"."i_class" RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) > 0 THEN CAST($SUM0(SUM("t"."ss_ext_sales_price")) OVER (PARTITION BY "t1"."i_class" RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS DECIMAL(7, 2)) ELSE NULL END AS "REVENUERATIO"
It causes a syntax error in the CAST($SUM0(SUM())) part in PostgreSQL.

I'm testing TPC-DS with version 1.16.

Thank you,

> Implement a General Purpose Benchmark for Calcite
> -------------------------------------------------
>
> Key: CALCITE-2168
> URL: https://issues.apache.org/jira/browse/CALCITE-2168
> Project: Calcite
> Issue Type: Wish
> Components: core
> Reporter: Edmon Begoli
> Assignee: Edmon Begoli
> Priority: Minor
> Labels: performance
> Original Estimate: 2,688h
> Remaining Estimate: 2,688h
>
> Develop a benchmark that can be used for general purpose benchmarking of
> Calcite against other frameworks, and databases, and for study, research, and
> profiling of the framework.
> Use popular benchmarks such as TPC-DS (or -H) or Star Schema Benchmark (SSB)
> and measure the performance of optimized vs. unoptimized Calcite queries, and
> the overhead of going through Calcite adapters vs. natively accessing the
> target DB.
> Look into the existing approaches and do perhaps something similar:
> * https://www.slideshare.net/julianhyde/w-435phyde-3
> * https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_hive-performance-tuning/content/ch_cost-based-optimizer.html
> * (How much of this is still relevant (Hive 0.14)? Can we use
> queries/benchmarks?)
> https://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/
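To make the nested aggregate from the comment above concrete: the inner `sum` is the ordinary GROUP BY aggregate, and the outer `sum(...) over (partition by i_class)` totals those group sums within each class. The Java sketch below reproduces only that arithmetic on made-up rows. The names `Sale`, `revenueRatio` and `demo` are hypothetical, and it assumes rows are grouped by item within a class, which the quoted fragment does not show.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RevenueRatioSketch {
    /** One sales row: item id, item class, extended sales price (illustrative data only). */
    static final class Sale {
        final String item;
        final String itemClass;
        final double price;

        Sale(String item, String itemClass, double price) {
            this.item = item;
            this.itemClass = itemClass;
            this.price = price;
        }
    }

    /** Computes, per item: sum(price) * 100 / sum(sum(price)) over (partition by class). */
    static Map<String, Double> revenueRatio(List<Sale> sales) {
        // Inner aggregate: sum(price) per group (here, per item).
        Map<String, Double> itemSum = new LinkedHashMap<>();
        Map<String, String> classOfItem = new LinkedHashMap<>();
        for (Sale s : sales) {
            itemSum.merge(s.item, s.price, Double::sum);
            classOfItem.put(s.item, s.itemClass);
        }
        // Window aggregate: sum of the group sums within each class partition.
        Map<String, Double> classSum = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : itemSum.entrySet()) {
            classSum.merge(classOfItem.get(e.getKey()), e.getValue(), Double::sum);
        }
        // Ratio of each group sum to its class total, as a percentage.
        Map<String, Double> ratio = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : itemSum.entrySet()) {
            double classTotal = classSum.get(classOfItem.get(e.getKey()));
            ratio.put(e.getKey(), e.getValue() * 100 / classTotal);
        }
        return ratio;
    }

    /** Runs the computation on four made-up rows. */
    static Map<String, Double> demo() {
        List<Sale> sales = new ArrayList<>();
        sales.add(new Sale("i1", "books", 30.0));
        sales.add(new Sale("i1", "books", 30.0));
        sales.add(new Sale("i2", "books", 20.0));
        sales.add(new Sale("i3", "music", 50.0));
        return revenueRatio(sales);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With these rows, item `i1` accounts for 60 of the 80 sold in class `books`, so its ratio is 75.0. The generated SQL in the comment wraps the same window sum in a `COUNT(...) > 0` guard, so an empty partition yields NULL instead of 0.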
[jira] [Updated] (CALCITE-2261) Switch calcite-core to JDK8
[ https://issues.apache.org/jira/browse/CALCITE-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enrico Olivelli updated CALCITE-2261:
-------------------------------------
Summary: Switch calcite-core to JDK8 (was: Switch calcilte-core to JDK8)

> Switch calcite-core to JDK8
> ---------------------------
>
> Key: CALCITE-2261
> URL: https://issues.apache.org/jira/browse/CALCITE-2261
> Project: Calcite
> Issue Type: Improvement
> Components: build
> Affects Versions: 1.16.0
> Reporter: Enrico Olivelli
> Assignee: Julian Hyde
> Priority: Major
> Fix For: 1.17.0
>
> Currently (1.16) Calcite core is compiled for JDK 1.7.
> Just switching maven-compiler-plugin to 1.8 is not enough because of a bug in
> Janino
> [https://github.com/janino-compiler/janino/issues/47]
> reported by Vova.
>
> As a workaround for that bug we have to add a default method implementation
> for SchemaPlus#getSubSchema.
[jira] [Created] (CALCITE-2261) Switch calcilte-core to JDK8
Enrico Olivelli created CALCITE-2261:
----------------------------------------

Summary: Switch calcilte-core to JDK8
Key: CALCITE-2261
URL: https://issues.apache.org/jira/browse/CALCITE-2261
Project: Calcite
Issue Type: Improvement
Components: build
Affects Versions: 1.16.0
Reporter: Enrico Olivelli
Assignee: Julian Hyde
Fix For: 1.17.0

Currently (1.16) Calcite core is compiled for JDK 1.7.
Just switching maven-compiler-plugin to 1.8 is not enough because of a bug in Janino
[https://github.com/janino-compiler/janino/issues/47]
reported by Vova.

As a workaround for that bug we have to add a default method implementation for SchemaPlus#getSubSchema.
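The workaround named above relies on a Java 8 feature: an interface can supply a method body with `default`. The sketch below shows the general shape using simplified stand-in interfaces, not Calcite's real `Schema` and `SchemaPlus`. It assumes the sub-interface narrows the return type of `getSubSchema`, which the issue text does not state, and the `lookup` hook and `MapSchema` class are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class DefaultMethodSketch {
    /** Simplified stand-in for a base schema interface (not the Calcite type). */
    interface Schema {
        Schema getSubSchema(String name);
    }

    /** Simplified stand-in for an extended schema interface (not the Calcite type). */
    interface SchemaPlus extends Schema {
        /** Hypothetical hook that implementors supply. */
        SchemaPlus lookup(String name);

        /** Java 8 default method: the interface itself provides the body and narrows the return type. */
        @Override
        default SchemaPlus getSubSchema(String name) {
            return lookup(name);
        }
    }

    /** Toy implementation backed by a map; it inherits getSubSchema from the interface. */
    static final class MapSchema implements SchemaPlus {
        private final String name;
        private final Map<String, SchemaPlus> children = new HashMap<>();

        MapSchema(String name) {
            this.name = name;
        }

        MapSchema add(MapSchema child) {
            children.put(child.name, child);
            return this;
        }

        String name() {
            return name;
        }

        @Override
        public SchemaPlus lookup(String childName) {
            return children.get(childName);
        }
    }

    /** Resolves a child through the inherited default method; returns its name, or null if absent. */
    static String subSchemaName(String childName) {
        MapSchema root = new MapSchema("root").add(new MapSchema("sales"));
        SchemaPlus sub = root.getSubSchema(childName);
        return sub == null ? null : ((MapSchema) sub).name();
    }

    public static void main(String[] args) {
        System.out.println(subSchemaName("sales"));
    }
}
```

Default methods require a Java 8 source and target level, which fits the issue's goal of moving calcite-core off JDK 1.7.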
[jira] [Resolved] (CALCITE-2063) Add JDK 10 to CI
[ https://issues.apache.org/jira/browse/CALCITE-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser resolved CALCITE-2063.
---------------------------------
Resolution: Fixed

Fixed in [https://git-wip-us.apache.org/repos/asf?p=calcite.git;a=commit;h=9085b601081689b5b7f1e9f57deb20e2229910cb].

Thanks Kevin!

> Add JDK 10 to CI
> ----------------
>
> Key: CALCITE-2063
> URL: https://issues.apache.org/jira/browse/CALCITE-2063
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: Julian Hyde
> Priority: Major
> Fix For: 1.17.0
>
> In CALCITE-2058 we added support for JDK 10 (early access build), and we test
> using a cron job on Julian's server but currently Apache's Jenkins does not
> support JDK 10. This task is to enable JDK 10 tests when Jenkins supports it.