[jira] [Updated] (DRILL-6540) Upgrade to HADOOP-3.1 libraries

2018-09-06 Thread Vitalii Diravka (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-6540:
---
Description: 
Currently Drill uses version 2.7.1 of the Hadoop libraries (hadoop-common, 
hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, 
hadoop-yarn-client).
[Hadoop 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was released half 
a year ago, and it was recently followed by an update, 
[Hadoop 3.1|https://hadoop.apache.org/docs/r3.1.0/].

This upgrade is needed to run Drill on a Hadoop 3.0 distribution. The newer 
version also includes new features that can be useful for Drill.
This upgrade is also needed to use the newest version of the ZooKeeper 
libraries and Hive 3.1.

  was:
Currently Drill uses version 2.7.1 of the Hadoop libraries (hadoop-common, 
hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, 
hadoop-yarn-client).
[Hadoop 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was released half 
a year ago, and it was recently followed by an update, 
[Hadoop 3.1|https://hadoop.apache.org/docs/r3.1.0/].

This upgrade is needed to run Drill on a Hadoop 3.0 distribution. The newer 
version also includes new features that can be useful for Drill.
This upgrade is also needed to use the newest version of the ZooKeeper 
libraries.


> Upgrade to HADOOP-3.1 libraries 
> 
>
> Key: DRILL-6540
> URL: https://issues.apache.org/jira/browse/DRILL-6540
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: Future
>
>
> Currently Drill uses version 2.7.1 of the Hadoop libraries (hadoop-common, 
> hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, 
> hadoop-yarn-client).
> [Hadoop 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was released 
> half a year ago, and it was recently followed by an update, 
> [Hadoop 3.1|https://hadoop.apache.org/docs/r3.1.0/].
> This upgrade is needed to run Drill on a Hadoop 3.0 distribution. The newer 
> version also includes new features that can be useful for Drill.
> This upgrade is also needed to use the newest version of the ZooKeeper 
> libraries and Hive 3.1.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6540) Upgrade to HADOOP-3.1 libraries

2018-09-06 Thread Vitalii Diravka (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-6540:
--

Assignee: Vitalii Diravka

> Upgrade to HADOOP-3.1 libraries 
> 
>
> Key: DRILL-6540
> URL: https://issues.apache.org/jira/browse/DRILL-6540
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: Future
>
>
> Currently Drill uses version 2.7.1 of the Hadoop libraries (hadoop-common, 
> hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, 
> hadoop-yarn-client).
> [Hadoop 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was released 
> half a year ago, and it was recently followed by an update, 
> [Hadoop 3.1|https://hadoop.apache.org/docs/r3.1.0/].
> This upgrade is needed to run Drill on a Hadoop 3.0 distribution. The newer 
> version also includes new features that can be useful for Drill.
> This upgrade is also needed to use the newest version of the ZooKeeper 
> libraries.





[jira] [Commented] (DRILL-6540) Upgrade to HADOOP-3.1 libraries

2018-09-06 Thread Vitalii Diravka (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606306#comment-16606306
 ] 

Vitalii Diravka commented on DRILL-6540:


It requires Drill to declare its logger dependencies properly. Currently many 
loggers come in as transitive dependencies; for example, 
{{org.slf4j.LoggerFactory}} is taken from avatica-1.12.0 - resolved.

It also requires upgrading {{jetty.version}} from 9.1 to at least 9.3 (Jetty 
is used to set up {{hdfs.MiniDFSCluster}} for Drill unit tests) - resolved.

It requires {{commons-logging}}, which is currently banned in Drill. For some 
reason, {{Log4JLogger}} is absent from {{jcl-over-slf4j}} - still an open issue.

Adapting the code to the 2.7-mapr version - resolved.

Work-in-progress branch: https://github.com/vdiravka/drill/commits/DRILL-6540

> Upgrade to HADOOP-3.1 libraries 
> 
>
> Key: DRILL-6540
> URL: https://issues.apache.org/jira/browse/DRILL-6540
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Vitalii Diravka
>Priority: Major
> Fix For: Future
>
>
> Currently Drill uses version 2.7.1 of the Hadoop libraries (hadoop-common, 
> hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, 
> hadoop-yarn-client).
> [Hadoop 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was released 
> half a year ago, and it was recently followed by an update, 
> [Hadoop 3.1|https://hadoop.apache.org/docs/r3.1.0/].
> This upgrade is needed to run Drill on a Hadoop 3.0 distribution. The newer 
> version also includes new features that can be useful for Drill.
> This upgrade is also needed to use the newest version of the ZooKeeper 
> libraries.





[jira] [Closed] (DRILL-6454) Native MapR DB plugin support for Hive MapR-DB json table

2018-09-06 Thread Anton Gozhiy (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Gozhiy closed DRILL-6454.
---

Verified with Drill version 1.15.0-SNAPSHOT (commit 
fa0d78d16eaf35d30d95613913a5613b2a82280d)

> Native MapR DB plugin support for Hive MapR-DB json table
> -
>
> Key: DRILL-6454
> URL: https://issues.apache.org/jira/browse/DRILL-6454
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive, Storage - MapRDB
>Affects Versions: 1.13.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-complete, ready-to-commit
> Fix For: 1.14.0
>
>
> Hive can create and query MapR-DB tables via maprdb-json-handler:
>  [https://maprdocs.mapr.com/home/Hive/ConnectingToMapR-DB.html]
> The aim of this Jira is to implement a Drill native reader for Hive MapR-DB 
> tables (similar to the one for Parquet).
> The design proposal is:
>  - to use JsonTableGroupScan instead of HiveScan;
>  - to add a storage planning rule to convert HiveScan to MapRDBGroupScan;
>  - to add a system/session option to enable this native reader;
>  - the native reader can be used only for Drill builds with the mapr profile 
> (there is no reason to use it with the default profile).
>  
> *For documentation:*
> Two new options were added:
> store.hive.parquet.optimize_scan_with_native_reader: false,
> store.hive.maprdb_json.optimize_scan_with_native_reader: false.
> store.hive.parquet.optimize_scan_with_native_reader is a new option used 
> instead of store.hive.optimize_scan_with_native_readers. The latter is 
> deprecated and will be removed in 1.15 
> (https://issues.apache.org/jira/browse/DRILL-6527).
>  





[jira] [Closed] (DRILL-4912) Ability to use alias in join conditions

2018-09-06 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva closed DRILL-4912.
---
Resolution: Invalid

> Ability to use alias in join conditions
> ---
>
> Key: DRILL-4912
> URL: https://issues.apache.org/jira/browse/DRILL-4912
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow, SQL Parser
>Affects Versions: 1.6.0, 1.8.0
>Reporter: Kathiresan Selvaraj
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-impacting
>
> JSON files used for testing:
> data.json
> { "name": "Jim","city" : [1,2]}
> cities.json
> {id:1,name:"Sendurai"}
> {id:2,name:"NYC"}
> The query below returns no results:
> select city\[0\] as cityalias from dfs.tmp.`data.json` a join (select id as 
> idalias from dfs.tmp.`cities.json`) b on a.cityalias  = b.idalias 
> However, the following query works fine:
> select city\[0\] as cityalias from dfs.tmp.`data.json` a join (select id as 
> idalias from dfs.tmp.`cities.json`) b on a.city\[0\]  = b.idalias 
> Using an alias for city\[0\] in the join condition makes the query return no 
> results.





[jira] [Commented] (DRILL-4912) Ability to use alias in join conditions

2018-09-06 Thread Arina Ielchiieva (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605742#comment-16605742
 ] 

Arina Ielchiieva commented on DRILL-4912:
-

Most databases (Oracle, PostgreSQL, SQL Server, MySQL) do not support such 
syntax, and it is not implemented in Calcite either. As a workaround, a 
subselect can be used.



[jira] [Updated] (DRILL-4912) Ability to use alias in join conditions

2018-09-06 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4912:

Fix Version/s: (was: 1.15.0)



[jira] [Updated] (DRILL-1248) Add support for using aliases in group by

2018-09-06 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-1248:

Labels: doc-impacting  (was: )

> Add support for using aliases in group by
> -
>
> Key: DRILL-1248
> URL: https://issues.apache.org/jira/browse/DRILL-1248
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: SQL Parser
>Reporter: Jim Scott
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>
> When I select using a function and alias the resulting value, the query won't 
> parse, reporting that the alias is ambiguous. I know this is a debatable 
> topic, but with this engine being so flexible, having GROUP BY support an 
> alias would be a big deal for all of the formatting, casting, etc. that will 
> likely occur. In my opinion, this is nothing like an ordinal GROUP BY.
> This works:
> select extract(year from to_date(crimes.datetime, 'MM/DD/ hh:mm:ss a')) 
> from BLAH group by extract(year from to_date(crimes.datetime, 'MM/DD/ 
> hh:mm:ss a'));
> This doesn't:
> select extract(year from to_date(crimes.datetime, 'MM/DD/ hh:mm:ss a')) 
> as mygroup from BLAH group by mygroup





[jira] [Updated] (DRILL-1248) Add support for using aliases in group by

2018-09-06 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-1248:

Fix Version/s: (was: Future)
   1.15.0



[jira] [Assigned] (DRILL-1248) Add support for using aliases in group by

2018-09-06 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-1248:
---

Assignee: Arina Ielchiieva



[jira] [Commented] (DRILL-6544) Timestamp value in Drill UI showed inconsistently with the same value retrieved from sqline

2018-09-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605713#comment-16605713
 ] 

ASF GitHub Bot commented on DRILL-6544:
---

agozhiy commented on a change in pull request #1449: DRILL-6544: Timestamp 
value in Drill UI showed inconsistently with th…
URL: https://github.com/apache/drill/pull/1449#discussion_r215606136
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/util/VectorUtil.java
 ##
 @@ -227,4 +233,69 @@ private static int getColumnWidth(int[] columnWidths, int 
columnIndex) {
 return (columnWidths == null) ? DEFAULT_COLUMN_WIDTH
 : (columnWidths.length > columnIndex) ? columnWidths[columnIndex] : 
columnWidths[0];
   }
+
+  /**
+   * Formats ValueVector elements in accordance with the corresponding 
system/session options.
+   *
+   * @param value ValueVector element to format, not null
+   * @param minorType the minor type of the element, not null
+   * @param options the OptionManager instance, not null
+   * @return the formatted value, null if failed
+   */
+  public static String formatValueVectorElement(Object value, 
TypeProtos.MinorType minorType, OptionManager options) {
+String formattedValue = null;
+switch (minorType) {
+  case TIMESTAMP:
+if (value instanceof LocalDateTime) {
+  String formatPattern = 
options.getString(ExecConstants.WEB_TIMESTAMP_DISPLAY_FORMAT);
+  if (!formatPattern.isEmpty()) {
 
 Review comment:
   Moved this check to a separate method.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Timestamp value in Drill UI showed inconsistently with the same value 
> retrieved from sqline
> ---
>
> Key: DRILL-6544
> URL: https://issues.apache.org/jira/browse/DRILL-6544
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Anton Gozhiy
>Assignee: Anton Gozhiy
>Priority: Minor
>
> *Query:*
> {code:sql}
> select timestamp '2008-2-23 12:23:34' from (values(1));
> {code}
> *Expected result (from sqline):*
> 2008-02-23 12:23:34.0
> *Actual result (from Drill UI):*
> 2008-02-23T12:23:34
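For illustration only, here is a minimal Java sketch of where the two renderings come from: `LocalDateTime.toString()` produces the ISO form seen in the Drill UI, while `java.sql.Timestamp.toString()` produces the form sqline prints. The helper `TimestampDisplay.format` is hypothetical and is not Drill's `VectorUtil` code; it assumes that a non-empty pattern (for example, the value of a display-format option) is applied with `DateTimeFormatter`, and that an empty pattern falls back to the sqline-style text.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not Drill's implementation): formats a TIMESTAMP value
// for display using a user-supplied pattern.
class TimestampDisplay {

  // Applies `pattern` if it is non-empty; otherwise (assumption) falls back to
  // java.sql.Timestamp's text form, which is what sqline prints.
  static String format(LocalDateTime value, String pattern) {
    if (pattern == null || pattern.isEmpty()) {
      return Timestamp.valueOf(value).toString();
    }
    return value.format(DateTimeFormatter.ofPattern(pattern));
  }

  public static void main(String[] args) {
    LocalDateTime ts = LocalDateTime.of(2008, 2, 23, 12, 23, 34);
    System.out.println(ts);                                    // 2008-02-23T12:23:34 (UI style)
    System.out.println(format(ts, ""));                        // 2008-02-23 12:23:34.0 (sqline style)
    System.out.println(format(ts, "yyyy-MM-dd HH:mm:ss.SSS")); // 2008-02-23 12:23:34.000
  }
}
```

Falling back to `Timestamp.toString()` is only one possible default; any pattern accepted by `DateTimeFormatter.ofPattern` could be used instead.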





[jira] [Commented] (DRILL-6731) JPPD:Move aggregating the BF from the Foreman to the RuntimeFilter

2018-09-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605670#comment-16605670
 ] 

ASF GitHub Bot commented on DRILL-6731:
---

weijietong commented on issue #1459: DRILL-6731: Move the BFs aggregating work 
from the Foreman to the RuntimeFi…
URL: https://github.com/apache/drill/pull/1459#issuecomment-419061529
 
 
   @sohami @amansinha100 Could you review this PR?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> JPPD:Move aggregating the BF from the Foreman to the RuntimeFilter
> --
>
> Key: DRILL-6731
> URL: https://issues.apache.org/jira/browse/DRILL-6731
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server
>Affects Versions: 1.15.0
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
>
> This PR moves the BloomFilter aggregation work from the Foreman to the 
> RuntimeFilter. Through this change, the RuntimeFilter can apply the incoming 
> BF as soon as possible.





[jira] [Resolved] (DRILL-6711) Publish Drill Calcite project artifacts to Apache maven repository

2018-09-06 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi resolved DRILL-6711.

   Resolution: Won't Do
Fix Version/s: (was: 1.15.0)

Resolving this Jira, since I don't see a way to do it.

> Publish Drill Calcite project artifacts to Apache maven repository 
> ---
>
> Key: DRILL-6711
> URL: https://issues.apache.org/jira/browse/DRILL-6711
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>
> Publish Drill Calcite project artifacts to the Apache Maven repository instead 
> of a proprietary one.





[jira] [Commented] (DRILL-6711) Publish Drill Calcite project artifacts to Apache maven repository

2018-09-06 Thread Volodymyr Vysotskyi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605656#comment-16605656
 ] 

Volodymyr Vysotskyi commented on DRILL-6711:


All artifacts from [Apache Maven repo|https://repo.maven.apache.org/maven2] are 
published to the [central repository|http://central.maven.org/maven2/], so they 
can be found using [mvnrepository.com|https://mvnrepository.com/].
I tried to find examples of custom versions of 
[Calcite|https://mvnrepository.com/artifact/org.apache.calcite/calcite], 
[Hadoop-common|https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common] 
and [Hive|https://mvnrepository.com/artifact/org.apache.hive/hive] in the 
central repository, but didn't find any custom versions published there.

However, besides the central repository, 
[mvnrepository.com|https://mvnrepository.com/] displays tabs for proprietary 
repositories that contain custom versions; to use those versions in a project, 
the proprietary repositories must be added to the pom file. For example, the 
[Hortonworks repo|https://mvnrepository.com/artifact/org.apache.calcite/calcite?repo=hortonworks-releases] 
contains custom versions of Apache Calcite.

So I think the only thing we can do is to change the location of the custom 
Calcite artifacts so that they are displayed on 
[mvnrepository.com|https://mvnrepository.com/] in a vendor repository tab.

> Publish Drill Calcite project artifacts to Apache maven repository 
> ---
>
> Key: DRILL-6711
> URL: https://issues.apache.org/jira/browse/DRILL-6711
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.15.0
>
>
> Publish Drill Calcite project artifacts to the Apache Maven repository instead 
> of a proprietary one.





[jira] [Commented] (DRILL-6711) Publish Drill Calcite project artifacts to Apache maven repository

2018-09-06 Thread Arina Ielchiieva (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605580#comment-16605580
 ] 

Arina Ielchiieva commented on DRILL-6711:
-

[~vvysotskyi] could you please investigate whether it is possible to move Drill 
Calcite to the Apache Maven repo? Are there any examples of how other projects 
dealt with this?



[jira] [Assigned] (DRILL-6711) Publish Drill Calcite project artifacts to Apache maven repository

2018-09-06 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-6711:
---

Assignee: Volodymyr Vysotskyi  (was: Arina Ielchiieva)



[jira] [Created] (DRILL-6731) JPPD:Move aggregating the BF from the Foreman to the RuntimeFilter

2018-09-06 Thread weijie.tong (JIRA)
weijie.tong created DRILL-6731:
--

 Summary: JPPD:Move aggregating the BF from the Foreman to the 
RuntimeFilter
 Key: DRILL-6731
 URL: https://issues.apache.org/jira/browse/DRILL-6731
 Project: Apache Drill
  Issue Type: Improvement
  Components:  Server
Affects Versions: 1.15.0
Reporter: weijie.tong
Assignee: weijie.tong


This PR moves the BloomFilter aggregation work from the Foreman to the 
RuntimeFilter. Through this change, the RuntimeFilter can apply the incoming BF 
as soon as possible.
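The issue text does not say how the RuntimeFilter combines the incoming filters. Assuming the usual approach for Bloom filters, partial filters built with the same bit-array size and the same hash functions can be aggregated with a bitwise OR, so the receiving operator can merge each partial filter as it arrives. The sketch below is a generic, hypothetical `SimpleBloomFilter` (double hashing over a `java.util.BitSet`), not Drill's BloomFilter class.

```java
import java.util.BitSet;

// Hypothetical minimal Bloom filter (not Drill's class). Partial filters from
// different fragments can be merged with OR when their parameters match.
class SimpleBloomFilter {
  private final int numBits;
  private final int numHashes;
  private final BitSet bits;

  SimpleBloomFilter(int numBits, int numHashes) {
    this.numBits = numBits;
    this.numHashes = numHashes;
    this.bits = new BitSet(numBits);
  }

  // Double hashing: index_i = h1 + i * h2 (mod numBits); h2 is forced odd.
  private int index(long key, int i) {
    int h1 = Long.hashCode(key);
    int h2 = Long.hashCode(Long.rotateLeft(key * 0x9E3779B97F4A7C15L, 31)) | 1;
    return Math.floorMod(h1 + i * h2, numBits);
  }

  void add(long key) {
    for (int i = 0; i < numHashes; i++) bits.set(index(key, i));
  }

  // May return false positives, never false negatives.
  boolean mightContain(long key) {
    for (int i = 0; i < numHashes; i++) {
      if (!bits.get(index(key, i))) return false;
    }
    return true;
  }

  // Aggregation step: OR another fragment's partial filter into this one.
  void merge(SimpleBloomFilter other) {
    if (other.numBits != numBits || other.numHashes != numHashes) {
      throw new IllegalArgumentException("filters must have identical parameters");
    }
    bits.or(other.bits);
  }

  public static void main(String[] args) {
    // Two build-side fragments each produce a partial filter ...
    SimpleBloomFilter f1 = new SimpleBloomFilter(1024, 3);
    SimpleBloomFilter f2 = new SimpleBloomFilter(1024, 3);
    f1.add(10);
    f1.add(20);
    f2.add(30);
    // ... which the receiving side merges before probing rows against it.
    f1.merge(f2);
    System.out.println(f1.mightContain(10) && f1.mightContain(20) && f1.mightContain(30)); // true
  }
}
```

Because OR is associative and commutative, the merge result does not depend on the order in which partial filters arrive, which is what makes incremental aggregation on the receiving side possible.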





[jira] [Comment Edited] (DRILL-6727) JPPD does not eliminate rows using the bloom filter if a HashJoin is involved

2018-09-06 Thread Kunal Khatua (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605318#comment-16605318
 ] 

Kunal Khatua edited comment on DRILL-6727 at 9/6/18 6:15 AM:
-

I am not sure the build side can be slower overall than the probe side, 
because I think there would be back pressure, with up to three batches on the 
probe side waiting for the build-side hash table to be completed. That said, I 
think the fastest approach would be to have the runtime filter apply the bloom 
filter to an already flowing data pipe.

The download is fairly big (132GB), so it can get corrupted if you are 
downloading slowly. I ran this against a 10-node setup.
One option is to download a TPC-H dataset of ~10GB (Scale Factor 10) and run it 
against a handful of nodes with reduced parallelization.
However, since the test involved only 2 tables, you can try generating the 
data with TPC-H's dbgen utility, which generates PSV-format data [link: 
https://github.com/electrum/tpch-dbgen]. 
You can use Drill to convert the data to Parquet, though your JPPD is not 
specific to Parquet.


was (Author: kkhatua):
I am not sure the build side can be slower overall than the probe side, 
because I think there would be back pressure, with up to three batches on the 
probe side waiting for the build-side hash table to be completed. That said, I 
think the fastest approach would be to have the runtime filter apply the bloom 
filter to an already flowing data pipe.

The download is fairly big (132GB), so it can get corrupted if you are 
downloading slowly.
One option is to download a TPC-H dataset of ~10GB (Scale Factor 10). However, 
since the test involved only 2 tables, you can try generating the data with 
TPC-H's dbgen utility, which generates PSV-format data [link: 
https://github.com/electrum/tpch-dbgen]. 
You can use Drill to convert the data to Parquet, though your JPPD is not 
specific to Parquet.


[jira] [Commented] (DRILL-6727) JPPD does not eliminate rows using the bloom filter if a HashJoin is involved

2018-09-06 Thread Kunal Khatua (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605318#comment-16605318
 ] 

Kunal Khatua commented on DRILL-6727:
-

I am not sure the build side can be slower overall than the probe side, 
because I think there would be back pressure, with up to three batches on the 
probe side waiting for the build-side hash table to be completed. That said, I 
think the fastest approach would be to have the runtime filter apply the bloom 
filter to an already flowing data pipe.

The download is fairly big (132GB), so it can get corrupted if you are 
downloading slowly.
One option is to download a TPC-H dataset of ~10GB (Scale Factor 10). However, 
since the test involved only 2 tables, you can try generating the data with 
TPC-H's dbgen utility, which generates PSV-format data [link: 
https://github.com/electrum/tpch-dbgen]. 
You can use Drill to convert the data to Parquet, though your JPPD is not 
specific to Parquet.

> JPPD does not eliminate rows using the bloom filter if a HashJoin is involved
> -
>
> Key: DRILL-6727
> URL: https://issues.apache.org/jira/browse/DRILL-6727
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.15.0
>Reporter: Kunal Khatua
>Assignee: weijie.tong
>Priority: Critical
> Attachments: 
> bcastJoin-JPPD_2477fb99-36cb-9bc2-b7fb-c81a52b256d2.json, 
> bcastJoin-default_2477fa68-a31e-3b97-5469-373845c2b763.json, 
> hashJoin-JPPD_2477f6f7-14e0-ca23-d9f7-6b0273c20964.json, 
> hashJoin-default_2477f5e8-fff2-fc83-d251-d8be8f92820b.json
>
>
> When testing a simple join between 2 tables, it appears that the 
> Bloom-filter-based predicate pushdown works only for broadcast joins, not for 
> hash-based joins.
> Since the purpose of the filter is to reduce the number of records being 
> hashed across the fragments, the runtime of hash joins does not improve.
> Join Query (TPCH dataset):
> {code:sql}
> select
> l.l_orderkey
> , sum(l.l_extendedprice * (1 - l.l_discount)) as revenue
> , o.o_orderdate
> , o.o_shippriority
> from
> orders o
> , lineitem l
> where
> l.l_orderkey = o.o_orderkey
> and o.o_orderdate = date '1994-08-26'
> and MOD(o.o_custkey,10) = 1
> group by
> l.l_orderkey
> , o.o_orderdate
> , o.o_shippriority
> order by
> revenue desc
> , o.o_orderdate limit 10;
> {code}
> This generates an output of about 6K rows from the build side, with the 
> expectation of 10M rows being joined from the probe side.
> The results of this query are:
> || Join Mode || Profile || Runtime || Status ||
> |BCastJoin w/o JPPD | [^bcastJoin-default_2477fa68-a31e-3b97-5469-373845c2b763.json] | 3.148 sec | As expected. 600M rows are scanned and probed against the locally available hash table. |
> |BCastJoin w/ JPPD | [^bcastJoin-JPPD_2477fb99-36cb-9bc2-b7fb-c81a52b256d2.json] | 3.570 sec | 04-xx-06 shows a reduction in rows. 600M rows are scanned, but only 10M rows are probed against the locally available hash table. |
> |HashJoin w/o JPPD | [^hashJoin-default_2477f5e8-fff2-fc83-d251-d8be8f92820b.json] | 5.861 sec | As expected. 600M rows are scanned and probed against the hash table. |
> |HashJoin w/ JPPD | [^hashJoin-JPPD_2477f6f7-14e0-ca23-d9f7-6b0273c20964.json] | 8.376 sec | 03-xx-07 is not seeing a reduction in rows. All 600M rows are scanned and probed against the hash table. |
> There are a few possible reasons why the RuntimeFilter does not eliminate any 
> rows when a HashJoin is involved:
> 1. The RuntimeFilter operator does not have a bloom-filter.
> 2. The RuntimeFilter receives the bloom-filter after the scan completes, 
> because the foreman has not finished building and distributing the global 
> bloom-filter.
> 3. The RuntimeFilter receives the bloom-filter during the scan, but does not 
> apply it.
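Possibility 2, and the suggestion to apply the filter to an already flowing data pipe, can be illustrated with a toy model. In this hypothetical Java sketch, `RuntimeFilterSim.survivingRows` passes probe-side batches through unchanged until the filter becomes available; a `LongPredicate` stands in for the bloom-filter lookup, and the batch counts and keys are made up for illustration rather than taken from the attached profiles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical model (not Drill code): a runtime filter on the probe-side scan
// can only prune batches that flow past it after the filter has arrived.
class RuntimeFilterSim {

  // Counts probe rows that survive. filterArrivesAtBatch is the batch index at
  // which the filter becomes available; >= batches.size() means "after the scan".
  static long survivingRows(List<long[]> batches, LongPredicate filter, int filterArrivesAtBatch) {
    long survivors = 0;
    for (int b = 0; b < batches.size(); b++) {
      boolean filterReady = b >= filterArrivesAtBatch;
      for (long key : batches.get(b)) {
        if (!filterReady || filter.test(key)) survivors++;
      }
    }
    return survivors;
  }

  public static void main(String[] args) {
    // 10 batches of 100 keys each (keys 0..999); the build side holds only keys < 50.
    List<long[]> batches = new ArrayList<>();
    for (int b = 0; b < 10; b++) {
      long[] batch = new long[100];
      for (int i = 0; i < 100; i++) batch[i] = b * 100L + i;
      batches.add(batch);
    }
    LongPredicate buildKeys = key -> key < 50; // stand-in for bloomFilter.mightContain
    System.out.println(survivingRows(batches, buildKeys, 0));  // 50: filter ready before the scan
    System.out.println(survivingRows(batches, buildKeys, 5));  // 500: first 5 batches pass unfiltered
    System.out.println(survivingRows(batches, buildKeys, 10)); // 1000: filter arrived too late
  }
}
```

In this model, a filter that only arrives after the last probe batch eliminates nothing, which would be consistent with the unchanged 600M-row count in the HashJoin w/ JPPD run.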


