[jira] [Commented] (CALCITE-2040) Create adapter for Apache Arrow

2018-04-17 Thread Laurent Goujon (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441784#comment-16441784
 ] 

Laurent Goujon commented on CALCITE-2040:
-

ARROW-1780 is roughly the opposite, and complementary: it converts a JDBC 
result set into an Arrow record batch.

> Create adapter for Apache Arrow
> ---
>
> Key: CALCITE-2040
> URL: https://issues.apache.org/jira/browse/CALCITE-2040
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Julian Hyde
>Priority: Major
>
> Create an adapter for [Apache Arrow|http://arrow.apache.org/]. This would 
> allow people to execute SQL statements, via JDBC or ODBC, on data stored in 
> Arrow in-memory format.
> Since Arrow is an in-memory format, it is not as straightforward as reading, 
> say, CSV files using the file adapter: an Arrow data set does not have a URL. 
> (Unless we use Arrow's 
> [Feather|https://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/]
>  format, or use an in-memory file system such as Alluxio.) So we would need 
> to devise a way of addressing Arrow data sets.
> Also, since Arrow is an extremely efficient format for processing data, it 
> would be good to have Arrow as a calling convention. That is, we would 
> provide implementations of relational operators such as Filter, Project and 
> Aggregate, in addition to just TableScan.
> Lastly, when we have an Arrow convention, if we build adapters for file 
> formats (for instance the bioinformatics formats SAM, VCF, FASTQ discussed in 
> CALCITE-2025) it would make a lot of sense to translate those formats 
> directly into Arrow (applying simple projects and filters first if 
> applicable). Those adapters would fit better as a "contrib" module in the 
> Arrow project than in Calcite.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CALCITE-2262) Allow count(*) to be pushed with other aggregators to Druid Storage Handler.

2018-04-17 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/CALCITE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated CALCITE-2262:

Fix Version/s: 1.16.1

> Allow count(*) to be pushed with other aggregators to Druid Storage Handler.
> 
>
> Key: CALCITE-2262
> URL: https://issues.apache.org/jira/browse/CALCITE-2262
> Project: Calcite
>  Issue Type: Bug
>  Components: druid
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>  Labels: improvement
> Fix For: 1.16.1
>
>
> Currently only {code}select count(*) from druid_table {code} is pushed as 
> Timeseries.
> The goal of this patch is to allow the push of more complicated queries like 
> {code} select count(*), sum(metric) from table {code}
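
For illustration, a count(*) combined with sum(metric) could map to a single 
Druid native timeseries query that carries two aggregators. The sketch below 
only builds the JSON text and is not the Calcite Druid adapter's code. The 
class and method names, the output names cnt and sum_metric, the interval, and 
the choice of doubleSum (longSum would suit an integer metric) are all 
assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class DruidTimeseriesSketch {
  /** Builds one aggregator entry; pass a null fieldName for "count". */
  public static String aggregator(String type, String name, String fieldName) {
    StringBuilder sb = new StringBuilder();
    sb.append("{\"type\":\"").append(type).append("\"");
    sb.append(",\"name\":\"").append(name).append("\"");
    if (fieldName != null) {
      sb.append(",\"fieldName\":\"").append(fieldName).append("\"");
    }
    return sb.append("}").toString();
  }

  /** Builds a timeseries query with granularity "all", so one row comes back. */
  public static String timeseries(String dataSource, String interval,
      List<String> aggregators) {
    return "{\"queryType\":\"timeseries\""
        + ",\"dataSource\":\"" + dataSource + "\""
        + ",\"granularity\":\"all\""
        + ",\"intervals\":[\"" + interval + "\"]"
        + ",\"aggregations\":[" + String.join(",", aggregators) + "]}";
  }

  public static void main(String[] args) {
    List<String> aggs = new ArrayList<>();
    aggs.add(aggregator("count", "cnt", null));                 // count(*)
    aggs.add(aggregator("doubleSum", "sum_metric", "metric"));  // sum(metric)
    // The interval below is a hypothetical "all time" range.
    System.out.println(timeseries("druid_table", "1900-01-01/3000-01-01", aggs));
  }
}
```

Note that Druid's count aggregator counts stored rows, so on a data source 
ingested with rollup it may differ from the number of original events.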
>  





[jira] [Updated] (CALCITE-2262) Allow count(*) to be pushed with other aggregators to Druid Storage Handler.

2018-04-17 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/CALCITE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated CALCITE-2262:

Component/s: druid

> Allow count(*) to be pushed with other aggregators to Druid Storage Handler.
> 
>
> Key: CALCITE-2262
> URL: https://issues.apache.org/jira/browse/CALCITE-2262
> Project: Calcite
>  Issue Type: Bug
>  Components: druid
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>  Labels: improvement
>
> Currently only {code}select count(*) from druid_table {code} is pushed as 
> Timeseries.
> The goal of this patch is to allow the push of more complicated queries like 
> {code} select count(*), sum(metric) from table {code}
>  





[jira] [Updated] (CALCITE-2262) Allow count(*) to be pushed with other aggregators to Druid Storage Handler.

2018-04-17 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/CALCITE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated CALCITE-2262:

Labels: improvement  (was: )

> Allow count(*) to be pushed with other aggregators to Druid Storage Handler.
> 
>
> Key: CALCITE-2262
> URL: https://issues.apache.org/jira/browse/CALCITE-2262
> Project: Calcite
>  Issue Type: Bug
>  Components: druid
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>  Labels: improvement
>
> Currently only {code}select count(*) from druid_table {code} is pushed as 
> Timeseries.
> The goal of this patch is to allow the push of more complicated queries like 
> {code} select count(*), sum(metric) from table {code}
>  





[jira] [Updated] (CALCITE-2262) Allow count(*) to be pushed with other aggregators to Druid Storage Handler.

2018-04-17 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/CALCITE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated CALCITE-2262:

Description: 
Currently only {code}select count(*) from druid_table {code} is pushed as 
Timeseries.

The goal of this patch is to allow the push of more complicated queries like 

{code} select count(*), sum(metric) from table {code}

 

  was:
Currently only \{code}select count(*) from druid_table \{code} is pushed as 
Timeseries.

The goal of this patch is to allow the push of more complicated queries like 

{code} select count(*), sum(metric) from table \{code}

 


> Allow count(*) to be pushed with other aggregators to Druid Storage Handler.
> 
>
> Key: CALCITE-2262
> URL: https://issues.apache.org/jira/browse/CALCITE-2262
> Project: Calcite
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>
> Currently only {code}select count(*) from druid_table {code} is pushed as 
> Timeseries.
> The goal of this patch is to allow the push of more complicated queries like 
> {code} select count(*), sum(metric) from table {code}
>  





[jira] [Created] (CALCITE-2262) Allow count(*) to be pushed with other aggregators to Druid Storage Handler.

2018-04-17 Thread slim bouguerra (JIRA)
slim bouguerra created CALCITE-2262:
---

 Summary: Allow count(*) to be pushed with other aggregators to 
Druid Storage Handler.
 Key: CALCITE-2262
 URL: https://issues.apache.org/jira/browse/CALCITE-2262
 Project: Calcite
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


Currently only \{code}select count(*) from druid_table \{code} is pushed as 
Timeseries.

The goal of this patch is to allow the push of more complicated queries like 

{code} select count(*), sum(metric) from table \{code}

 





[jira] [Comment Edited] (CALCITE-2168) Implement a General Purpose Benchmark for Calcite

2018-04-17 Thread Seung-Hwan Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441031#comment-16441031
 ] 

Seung-Hwan Lim edited comment on CALCITE-2168 at 4/17/18 3:35 PM:
--

While writing TPC-DS queries for Calcite with a Postgres backend, I found a 
couple of issues.

1. Date-time interval compatibility. The Postgres dialect is 

``` (cast('2000-08-20' as date) +  interval '30 days') ```. 

With Calcite on a Postgres backend, when I tried the following:

(cast('2000-08-20' as date) +  interval '30' day )

I got an UnsupportedOperationException: 

Caused by: java.lang.UnsupportedOperationException: class 
org.apache.calcite.sql.SqlSyntax$6: SPECIAL

2. Nested aggregation with a window function.

TPC-DS query 98 contains the following troublesome expression:

```sum(ss."ss_ext_sales_price")*100/sum(sum(ss."ss_ext_sales_price")) over

          (partition by i."i_class") as REVENUERATIO```

Calcite generates:

SUM("t"."ss_ext_sales_price") * 100 / CASE WHEN 
(COUNT(SUM("t"."ss_ext_sales_price")) OVER (PARTITION BY "t1"."i_class" RANGE 
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) > 0 THEN 
CAST($SUM0(SUM("t"."ss_ext_sales_price")) OVER (PARTITION BY "t1"."i_class" 
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS DECIMAL(7, 2)) 
ELSE NULL END AS "REVENUERATIO"

The CAST($SUM0(SUM())) part causes a syntax error in PostgreSQL.
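
To make the intent of the REVENUERATIO expression concrete, here is a minimal 
sketch of what it computes, written with plain Java collections rather than 
SQL. The inner sum() is the per-group revenue, and sum(sum(...)) over 
(partition by i_class) is the total of those group revenues within one class. 
The class name, the Sale type and grouping by a single item key are 
assumptions made for the example; query 98 groups by more columns.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RevenueRatioSketch {
  /** One sales row: item key, item class, extended sales price. */
  public static final class Sale {
    final String item;
    final String itemClass;
    final double price;

    public Sale(String item, String itemClass, double price) {
      this.item = item;
      this.itemClass = itemClass;
      this.price = price;
    }
  }

  /** Returns, per item, its revenue as a percentage of its class revenue. */
  public static Map<String, Double> revenueRatio(List<Sale> sales) {
    // Inner aggregate: sum(price) grouped by item.
    Map<String, Double> itemRevenue = new LinkedHashMap<>();
    Map<String, String> classOfItem = new LinkedHashMap<>();
    for (Sale s : sales) {
      itemRevenue.merge(s.item, s.price, Double::sum);
      classOfItem.put(s.item, s.itemClass);
    }
    // Window aggregate: sum of the item revenues, partitioned by class.
    Map<String, Double> classRevenue = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : itemRevenue.entrySet()) {
      classRevenue.merge(classOfItem.get(e.getKey()), e.getValue(), Double::sum);
    }
    // Final expression: item revenue * 100 / class revenue.
    Map<String, Double> ratio = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : itemRevenue.entrySet()) {
      double total = classRevenue.get(classOfItem.get(e.getKey()));
      ratio.put(e.getKey(), e.getValue() * 100 / total);
    }
    return ratio;
  }
}
```

The sketch does not guard against a class total of zero, which the CASE WHEN 
COUNT(...) > 0 wrapper in the generated SQL does not cover either.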

 

 

I'm testing TPC-DS with Calcite version 1.16.

 

Thank you,


was (Author: lims1):
While writing TPC-DS queries for Calcite with a Postgres backend, I found a 
couple of issues.

1. Date-time interval compatibility. The Postgres dialect is 

``` (cast('2000-08-20' as date) +  interval '30 days') ```. 

With Calcite on a Postgres backend, I tried the following:

(cast('2000-08-20' as date) +  interval '30' day )

I got an UnsupportedOperationException: 

Caused by: java.lang.UnsupportedOperationException: class 
org.apache.calcite.sql.SqlSyntax$6: SPECIAL

2. Nested aggregation with a window function.

TPC-DS query 98 contains the following troublesome expression:

```sum(ss."ss_ext_sales_price")*100/sum(sum(ss."ss_ext_sales_price")) over

          (partition by i."i_class") as REVENUERATIO```

Calcite generates:

SUM("t"."ss_ext_sales_price") * 100 / CASE WHEN 
(COUNT(SUM("t"."ss_ext_sales_price")) OVER (PARTITION BY "t1"."i_class" RANGE 
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) > 0 THEN 
CAST($SUM0(SUM("t"."ss_ext_sales_price")) OVER (PARTITION BY "t1"."i_class" 
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS DECIMAL(7, 2)) 
ELSE NULL END AS "REVENUERATIO"

The CAST($SUM0(SUM())) part causes a syntax error in PostgreSQL.

 

 

I'm testing TPC-DS with Calcite version 1.16.

 

Thank you,

> Implement a General Purpose Benchmark for Calcite 
> --
>
> Key: CALCITE-2168
> URL: https://issues.apache.org/jira/browse/CALCITE-2168
> Project: Calcite
>  Issue Type: Wish
>  Components: core
>Reporter: Edmon Begoli
>Assignee: Edmon Begoli
>Priority: Minor
>  Labels: performance
>   Original Estimate: 2,688h
>  Remaining Estimate: 2,688h
>
> Develop a benchmark that can be used for general-purpose benchmarking of 
> Calcite against other frameworks and databases, and for study, research, and 
> profiling of the framework.
> Use popular benchmarks such as TPC-DS (or -H) or Star Schema Benchmark (SSB) 
> and measure the performance of optimized vs. unoptimized Calcite queries, and 
> the overhead of going through Calcite adapters vs. natively accessing the 
> target DB.
> Look into the existing approaches and do perhaps something similar:
> * https://www.slideshare.net/julianhyde/w-435phyde-3
> * 
> https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_hive-performance-tuning/content/ch_cost-based-optimizer.html
> * (How much of this is still relevant (Hive 0.14)? Can we use 
> queries/benchmarks?)
> https://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/
>  
>  





[jira] [Commented] (CALCITE-2168) Implement a General Purpose Benchmark for Calcite

2018-04-17 Thread Seung-Hwan Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441031#comment-16441031
 ] 

Seung-Hwan Lim commented on CALCITE-2168:
-

While writing TPC-DS queries for Calcite with a Postgres backend, I found a 
couple of issues.

1. Date-time interval compatibility. The Postgres dialect is 

``` (cast('2000-08-20' as date) +  interval '30 days') ```. 

With Calcite on a Postgres backend, I tried the following:

(cast('2000-08-20' as date) +  interval '30' day )

I got an UnsupportedOperationException: 

Caused by: java.lang.UnsupportedOperationException: class 
org.apache.calcite.sql.SqlSyntax$6: SPECIAL

2. Nested aggregation with a window function.

TPC-DS query 98 contains the following troublesome expression:

```sum(ss."ss_ext_sales_price")*100/sum(sum(ss."ss_ext_sales_price")) over

          (partition by i."i_class") as REVENUERATIO```

Calcite generates:

SUM("t"."ss_ext_sales_price") * 100 / CASE WHEN 
(COUNT(SUM("t"."ss_ext_sales_price")) OVER (PARTITION BY "t1"."i_class" RANGE 
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) > 0 THEN 
CAST($SUM0(SUM("t"."ss_ext_sales_price")) OVER (PARTITION BY "t1"."i_class" 
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS DECIMAL(7, 2)) 
ELSE NULL END AS "REVENUERATIO"

The CAST($SUM0(SUM())) part causes a syntax error in PostgreSQL.

 

 

I'm testing TPC-DS with Calcite version 1.16.

 

Thank you,

> Implement a General Purpose Benchmark for Calcite 
> --
>
> Key: CALCITE-2168
> URL: https://issues.apache.org/jira/browse/CALCITE-2168
> Project: Calcite
>  Issue Type: Wish
>  Components: core
>Reporter: Edmon Begoli
>Assignee: Edmon Begoli
>Priority: Minor
>  Labels: performance
>   Original Estimate: 2,688h
>  Remaining Estimate: 2,688h
>
> Develop a benchmark that can be used for general-purpose benchmarking of 
> Calcite against other frameworks and databases, and for study, research, and 
> profiling of the framework.
> Use popular benchmarks such as TPC-DS (or -H) or Star Schema Benchmark (SSB) 
> and measure the performance of optimized vs. unoptimized Calcite queries, and 
> the overhead of going through Calcite adapters vs. natively accessing the 
> target DB.
> Look into the existing approaches and do perhaps something similar:
> * https://www.slideshare.net/julianhyde/w-435phyde-3
> * 
> https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_hive-performance-tuning/content/ch_cost-based-optimizer.html
> * (How much of this is still relevant (Hive 0.14)? Can we use 
> queries/benchmarks?)
> https://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/
>  
>  





[jira] [Updated] (CALCITE-2261) Switch calcite-core to JDK8

2018-04-17 Thread Enrico Olivelli (JIRA)

 [ 
https://issues.apache.org/jira/browse/CALCITE-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enrico Olivelli updated CALCITE-2261:
-
Summary: Switch calcite-core to JDK8  (was: Switch calcilte-core to JDK8)

> Switch calcite-core to JDK8
> ---
>
> Key: CALCITE-2261
> URL: https://issues.apache.org/jira/browse/CALCITE-2261
> Project: Calcite
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 1.16.0
>Reporter: Enrico Olivelli
>Assignee: Julian Hyde
>Priority: Major
> Fix For: 1.17.0
>
>
> Currently (1.16) Calcite core is compiled for JDK 1.7.
> Just switching maven-compiler-plugin to 1.8 is not enough because of a bug in 
> Janino
> [https://github.com/janino-compiler/janino/issues/47]
> reported by Vova.
>  
> As a workaround for that bug we have to add a default method implementation 
> for SchemaPlus#getSubSchema.
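
As a rough illustration of the kind of change described, the sketch below 
shows a Java 8 default method on a sub-interface that narrows the return type 
of an inherited method. The issue does not say what the default body does, so 
everything here is an assumption: BaseSchema, ExtendedSchema, MapSchema and 
lookup are hypothetical names and are not Calcite's declarations.

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultMethodSketch {
  /** Hypothetical stand-in for a base schema interface. */
  public interface BaseSchema {
    BaseSchema getSubSchema(String name);
  }

  /** Hypothetical stand-in for an extended interface with a narrower return type. */
  public interface ExtendedSchema extends BaseSchema {
    /** Hypothetical abstract hook that implementations must provide. */
    ExtendedSchema lookup(String name);

    /** Default body (Java 8+), so implementors need not declare this method. */
    @Override
    default ExtendedSchema getSubSchema(String name) {
      return lookup(name);
    }
  }

  /** Simple map-backed implementation that relies on the default method. */
  public static final class MapSchema implements ExtendedSchema {
    private final Map<String, ExtendedSchema> children = new HashMap<>();

    public void add(String name, ExtendedSchema child) {
      children.put(name, child);
    }

    @Override
    public ExtendedSchema lookup(String name) {
      return children.get(name); // null when no such sub-schema exists
    }
  }
}
```

Because the default method lives on the interface, existing implementations 
keep compiling when the source level moves to 1.8.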
>  





[jira] [Created] (CALCITE-2261) Switch calcilte-core to JDK8

2018-04-17 Thread Enrico Olivelli (JIRA)
Enrico Olivelli created CALCITE-2261:


 Summary: Switch calcilte-core to JDK8
 Key: CALCITE-2261
 URL: https://issues.apache.org/jira/browse/CALCITE-2261
 Project: Calcite
  Issue Type: Improvement
  Components: build
Affects Versions: 1.16.0
Reporter: Enrico Olivelli
Assignee: Julian Hyde
 Fix For: 1.17.0


Currently (1.16) Calcite core is compiled for JDK 1.7.

Just switching maven-compiler-plugin to 1.8 is not enough because of a bug in 
Janino

[https://github.com/janino-compiler/janino/issues/47]

reported by Vova.

 

As a workaround for that bug we have to add a default method implementation for 
SchemaPlus#getSubSchema.

 





[jira] [Resolved] (CALCITE-2063) Add JDK 10 to CI

2018-04-17 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/CALCITE-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved CALCITE-2063.
-
Resolution: Fixed

Fixed in 
[https://git-wip-us.apache.org/repos/asf?p=calcite.git;a=commit;h=9085b601081689b5b7f1e9f57deb20e2229910cb].
 Thanks Kevin!

> Add JDK 10 to CI
> 
>
> Key: CALCITE-2063
> URL: https://issues.apache.org/jira/browse/CALCITE-2063
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Julian Hyde
>Priority: Major
> Fix For: 1.17.0
>
>
> In CALCITE-2058 we added support for JDK 10 (early access build), and we test 
> using a cron job on Julian's server, but currently Apache's Jenkins does not 
> support JDK 10. This task is to enable JDK 10 tests when Jenkins supports it.


