[GitHub] drill pull request #997: DRILL-5582: C++ Client: [Threat Modeling] Drillbit ...
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/997#discussion_r145319769 --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp --- @@ -518,6 +518,22 @@ bool DrillClientImpl::clientNeedsEncryption(const DrillUserProperties* userPrope return needsEncryption; } +/* + * Checks if the client has explicitly expressed interest in authenticated connections only. + * If the USERPROP_PASSWORD or USERPROP_AUTH_MECHANISM connection string properties are set, + * then it is implied that the client wants authentication. + */ +bool DrillClientImpl::clientNeedsAuthentication(const DrillUserProperties* userProperties) { +bool needsAuthentication = false; +if(!userProperties) { +return false; +} +needsAuthentication = userProperties->isPropSet(USERPROP_PASSWORD) || +userProperties->isPropSet(USERPROP_AUTH_MECHANISM); --- End diff -- I think we should also check that the `password & auth parameter` values are not empty strings. ---
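The strengthened check suggested above could look like the following standalone Java sketch (it uses `java.util.Properties` and hypothetical property names rather than the actual DrillUserProperties API), treating a property set to an empty string the same as one that is absent:

```java
import java.util.Properties;

public class AuthCheck {
    // Hypothetical helper: a property only counts as "set" if it is
    // present AND non-empty, per the review suggestion above.
    static boolean isPropSetAndNonEmpty(Properties props, String key) {
        String value = props.getProperty(key);
        return value != null && !value.isEmpty();
    }

    static boolean clientNeedsAuthentication(Properties props) {
        if (props == null) {
            return false;
        }
        // "password" and "auth" are placeholder names for the
        // USERPROP_PASSWORD / USERPROP_AUTH_MECHANISM properties.
        return isPropSetAndNonEmpty(props, "password")
                || isPropSetAndNonEmpty(props, "auth");
    }
}
```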
[GitHub] drill pull request #997: DRILL-5582: C++ Client: [Threat Modeling] Drillbit ...
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/997#discussion_r145319760 --- Diff: contrib/native/client/src/clientlib/saslAuthenticatorImpl.cpp --- @@ -145,6 +145,8 @@ int SaslAuthenticatorImpl::init(const std::vector& mechanisms, exec authMechanismToUse = value; } } +// clientNeedsAuth cannot be false if the code above picks an authMechanism --- End diff -- clientNeedsAuth --> clientNeedsAuthentication ---
[GitHub] drill issue #999: DRILL-5881: Java Client: [Threat Modeling] Drillbit may be ...
Github user sohami commented on the issue: https://github.com/apache/drill/pull/999 @parthchandra - Please help review this PR. I have added new unit tests for the change and made sure all the existing tests are also passing. ---
[GitHub] drill pull request #997: DRILL-5582: C++ Client: [Threat Modeling] Drillbit ...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/997#discussion_r145317288 --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp --- @@ -595,6 +611,12 @@ connectionStatus_t DrillClientImpl::validateHandshake(DrillUserProperties* prope switch(this->m_handshakeStatus) { case exec::user::SUCCESS: +// Check if client needs auth/encryption and server is not requiring it +if(clientNeedsAuthentication(properties) || clientNeedsEncryption(properties)) { --- End diff -- Generally, all error messages come from errmsgs.cpp so we can localize them when we need to. ---
[GitHub] drill pull request #997: DRILL-5582: C++ Client: [Threat Modeling] Drillbit ...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/997#discussion_r145317403 --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp --- @@ -595,6 +611,12 @@ connectionStatus_t DrillClientImpl::validateHandshake(DrillUserProperties* prope switch(this->m_handshakeStatus) { case exec::user::SUCCESS: +// Check if client needs auth/encryption and server is not requiring it --- End diff -- Not too clear about the SASL flow, but I assume that the server returning SUCCESS is sufficient to conclude that no auth is required by the server. ---
[GitHub] drill pull request #999: DRILL-5881: Java Client: [Threat Modeling] Drillbit ...
GitHub user sohami opened a pull request: https://github.com/apache/drill/pull/999 DRILL-5881: Java Client: [Threat Modeling] Drillbit may be spoofed by … …an attacker and this may lead to data being written to the attacker's target instead of Drillbit You can merge this pull request into a Git repository by running: $ git pull https://github.com/sohami/drill DRILL-5881 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/999.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #999 ---
[GitHub] drill pull request #998: DRILL-5887: Display process user/groups info in Dri...
GitHub user prasadns14 opened a pull request: https://github.com/apache/drill/pull/998 DRILL-5887: Display process user/groups info in Drill UI Display process user and process user groups in Drill UI @paul-rogers please review You can merge this pull request into a Git repository by running: $ git pull https://github.com/prasadns14/drill DRILL-5887 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/998.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #998 commit 9f0719423980dfb5d825e4f03a2a450c915ada7c Author: Prasad Nagaraj Subramanya Date: 2017-10-18T00:49:11Z DRILL-5887: Display process user/groups info in Drill UI ---
[GitHub] drill issue #991: DRILL-5876: Remove netty-tcnative dependency from java-exe...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/991 @vrozov per your other tests, this is still broken for Eclipse. So it seems that the best bet is to comment out the dependency and the os extension. Developers needing to debug will need to uncomment the dependency. I will remove the additional commit. ---
[GitHub] drill pull request #991: DRILL-5876: Remove netty-tcnative dependency from j...
Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/991#discussion_r145291048 --- Diff: exec/java-exec/pom.xml --- @@ -22,7 +22,10 @@ 1.8-rev1 + --- End diff -- Please uncomment (should be harmless) or move to the openssl profile. ---
[GitHub] drill pull request #991: DRILL-5876: Remove netty-tcnative dependency from j...
Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/991#discussion_r145289493 --- Diff: exec/java-exec/pom.xml --- @@ -693,6 +699,19 @@ + + openssl + + + io.netty + netty-tcnative + 2.0.1.Final --- End diff -- Please add provided scope. ---
[GitHub] drill pull request #991: DRILL-5876: Remove netty-tcnative dependency from j...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/991#discussion_r145288182 --- Diff: exec/java-exec/pom.xml --- @@ -701,18 +707,21 @@ - -kr.motd.maven -os-maven-plugin -1.5.0.Final - - + --- End diff -- Updated the PR with the latest recommendations. Using a different profile seems to work well. ---
[jira] [Created] (DRILL-5887) Display process user/ groups in Drill UI
Prasad Nagaraj Subramanya created DRILL-5887: Summary: Display process user/groups in Drill UI Key: DRILL-5887 URL: https://issues.apache.org/jira/browse/DRILL-5887 Project: Apache Drill Issue Type: Bug Components: Client - HTTP Affects Versions: 1.11.0 Reporter: Prasad Nagaraj Subramanya Assignee: Prasad Nagaraj Subramanya Priority: Minor Fix For: 1.12.0 Drill UI only lists admin users/groups specified as options. We should also display the process user/groups, who have admin privilege. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] drill pull request #997: DRILL-5582: C++ Client: [Threat Modeling] Drillbit ...
GitHub user bitblender opened a pull request: https://github.com/apache/drill/pull/997 DRILL-5582: C++ Client: [Threat Modeling] Drillbit may be spoofed by … …an attacker and this may lead to data being written to the attacker's target instead of Drillbit You can merge this pull request into a Git repository by running: $ git pull https://github.com/bitblender/drill KM-DRILL-5582 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/997.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #997 commit 488ebefd4a2d096c9f02cbcdfd8c6984901b3444 Author: karthik Date: 2017-10-17T23:18:45Z DRILL-5582: C++ Client: [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to data being written to the attacker's target instead of Drillbit ---
[jira] [Resolved] (DRILL-5804) External Sort times out, may be infinite loop
[ https://issues.apache.org/jira/browse/DRILL-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou resolved DRILL-5804. --- Resolution: Fixed > External Sort times out, may be infinite loop > - > > Key: DRILL-5804 > URL: https://issues.apache.org/jira/browse/DRILL-5804 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.11.0 >Reporter: Robert Hou >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: drillbit.log > > > Query is: > {noformat} > ALTER SESSION SET `exec.sort.disable_managed` = false; > select count(*) from ( > select * from ( > select s1.type type, flatten(s1.rms.rptd) rptds, s1.rms, s1.uid > from ( > select d.type type, d.uid uid, flatten(d.map.rm) rms from > dfs.`/drill/testdata/resource-manager/nested_large` d order by d.uid > ) s1 > ) s2 > order by s2.rms.mapid, s2.rptds.a, s2.rptds.do_not_exist > ); > {noformat} > Plan is: > {noformat} > | 00-00Screen > 00-01 Project(EXPR$0=[$0]) > 00-02StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) > 00-03 UnionExchange > 01-01StreamAgg(group=[{}], EXPR$0=[COUNT()]) > 01-02 Project($f0=[0]) > 01-03SingleMergeExchange(sort0=[4 ASC], sort1=[5 ASC], > sort2=[6 ASC]) > 02-01 SelectionVectorRemover > 02-02Sort(sort0=[$4], sort1=[$5], sort2=[$6], dir0=[ASC], > dir1=[ASC], dir2=[ASC]) > 02-03 Project(type=[$0], rptds=[$1], rms=[$2], uid=[$3], > EXPR$4=[$4], EXPR$5=[$5], EXPR$6=[$6]) > 02-04HashToRandomExchange(dist0=[[$4]], dist1=[[$5]], > dist2=[[$6]]) > 03-01 UnorderedMuxExchange > 04-01Project(type=[$0], rptds=[$1], rms=[$2], > uid=[$3], EXPR$4=[$4], EXPR$5=[$5], EXPR$6=[$6], > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($6, hash32AsDouble($5, > hash32AsDouble($4, 1301011)))]) > 04-02 Project(type=[$0], rptds=[$1], rms=[$2], > uid=[$3], EXPR$4=[ITEM($2, 'mapid')], EXPR$5=[ITEM($1, 'a')], > EXPR$6=[ITEM($1, 'do_not_exist')]) > 04-03Flatten(flattenField=[$1]) > 04-04 Project(type=[$0], rptds=[ITEM($2, > 'rptd')], rms=[$2], uid=[$1]) > 
04-05SingleMergeExchange(sort0=[1 ASC]) > 05-01 SelectionVectorRemover > 05-02Sort(sort0=[$1], dir0=[ASC]) > 05-03 Project(type=[$0], uid=[$1], > rms=[$2]) > 05-04 > HashToRandomExchange(dist0=[[$1]]) > 06-01 UnorderedMuxExchange > 07-01Project(type=[$0], > uid=[$1], rms=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1, 1301011)]) > 07-02 > Flatten(flattenField=[$2]) > 07-03Project(type=[$0], > uid=[$1], rms=[ITEM($2, 'rm')]) > 07-04 > Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=maprfs:///drill/testdata/resource-manager/nested_large]], > selectionRoot=maprfs:/drill/testdata/resource-manager/nested_large, > numFiles=1, usedMetadataFile=false, columns=[`type`, `uid`, `map`.`rm`]]]) > {noformat} > Here is a segment of the drillbit.log, starting at line 55890: > {noformat} > 2017-09-19 04:22:56,258 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:2] DEBUG > o.a.d.e.t.g.SingleBatchSorterGen44 - Took 142 us to sort 1023 records > 2017-09-19 04:22:56,265 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:4] DEBUG > o.a.d.e.t.g.SingleBatchSorterGen44 - Took 105 us to sort 1023 records > 2017-09-19 04:22:56,268 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:3:0] DEBUG > o.a.d.e.p.i.p.PartitionSenderRootExec - Partitioner.next(): got next record > batch with status OK > 2017-09-19 04:22:56,275 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:7] DEBUG > o.a.d.e.t.g.SingleBatchSorterGen44 - Took 145 us to sort 1023 records > 2017-09-19 04:22:56,354 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:3:0] DEBUG > o.a.d.e.p.i.p.PartitionSenderRootExec - Partitioner.next(): got next record > batch with status OK > 2017-09-19 04:22:56,357 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:2] DEBUG > o.a.d.e.t.g.SingleBatchSorterGen44 - Took 143 us to sort 1023 records > 2017-09-19 04:22:56,361 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:0] DEBUG > o.a.d.exec.compile.ClassTransformer - Compiled and merged > PriorityQueueCopierGen50: bytecode size = 11.0 KiB, time = 124 ms. > 2017-09-19
[jira] [Created] (DRILL-5886) Operators should create batch sizes that the next operator can consume to avoid OOM
Robert Hou created DRILL-5886: - Summary: Operators should create batch sizes that the next operator can consume to avoid OOM Key: DRILL-5886 URL: https://issues.apache.org/jira/browse/DRILL-5886 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.11.0 Reporter: Robert Hou Attachments: 26478262-f0a7-8fc1-1887-4f27071b9c0f.sys.drill, drillbit.log.exchange Query is: {noformat} ALTER SESSION SET `exec.sort.disable_managed` = false alter session set `planner.memory.max_query_memory_per_node` = 482344960 alter session set `planner.width.max_per_node` = 1 alter session set `planner.width.max_per_query` = 1 alter session set `planner.disable_exchanges` = true select count(*) from (select * from dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50], columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[],columns[30],columns[2420],columns[1520], columns[1410], columns[1110],columns[1290],columns[2380],columns[705],columns[45],columns[1054],columns[2430],columns[420],columns[404],columns[3350], columns[],columns[153],columns[356],columns[84],columns[745],columns[1450],columns[103],columns[2065],columns[343],columns[3420],columns[530], columns[3210] ) d where d.col433 = 'sjka skjf'; {noformat} This is the error from drillbit.log: 2017-09-12 17:36:53,155 [26478262-f0a7-8fc1-1887-4f27071b9c0f:frag:0:0] ERROR o.a.d.e.p.i.x.m.ExternalSortBatch - Insufficient memory to merge two batches. 
Incoming batch size: 409305088, available memory: 482344960 Here is the plan: {noformat} | 00-00Screen 00-01 Project(EXPR$0=[$0]) 00-02StreamAgg(group=[{}], EXPR$0=[COUNT()]) 00-03 Project($f0=[0]) 00-04SelectionVectorRemover 00-05 Filter(condition=[=(ITEM($0, 'col433'), 'sjka skjf')]) 00-06Project(T8¦¦*=[$0]) 00-07 SelectionVectorRemover 00-08Sort(sort0=[$1], sort1=[$2], sort2=[$3], sort3=[$4], sort4=[$5], sort5=[$6], sort6=[$7], sort7=[$8], sort8=[$9], sort9=[$10], sort10=[$11], sort11=[$12], sort12=[$9], sort13=[$13], sort14=[$14], sort15=[$15], sort16=[$16], sort17=[$17], sort18=[$18], sort19=[$19], sort20=[$20], sort21=[$21], sort22=[$12], sort23=[$22], sort24=[$23], sort25=[$24], sort26=[$25], sort27=[$26], sort28=[$27], sort29=[$28], sort30=[$29], sort31=[$30], sort32=[$31], sort33=[$32], sort34=[$33], sort35=[$34], sort36=[$35], sort37=[$36], sort38=[$37], sort39=[$38], sort40=[$39], sort41=[$40], sort42=[$41], sort43=[$42], sort44=[$43], sort45=[$44], sort46=[$45], sort47=[$46], dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC], dir4=[ASC], dir5=[ASC], dir6=[ASC], dir7=[ASC], dir8=[ASC], dir9=[ASC], dir10=[ASC], dir11=[ASC], dir12=[ASC], dir13=[ASC], dir14=[ASC], dir15=[ASC], dir16=[ASC], dir17=[ASC], dir18=[ASC], dir19=[ASC], dir20=[ASC], dir21=[ASC], dir22=[ASC], dir23=[ASC], dir24=[ASC], dir25=[ASC], dir26=[ASC], dir27=[ASC], dir28=[ASC], dir29=[ASC], dir30=[ASC], dir31=[ASC], dir32=[ASC], dir33=[ASC], dir34=[ASC], dir35=[ASC], dir36=[ASC], dir37=[ASC], dir38=[ASC], dir39=[ASC], dir40=[ASC], dir41=[ASC], dir42=[ASC], dir43=[ASC], dir44=[ASC], dir45=[ASC], dir46=[ASC], dir47=[ASC]) 00-09 Project(T8¦¦*=[$0], EXPR$1=[ITEM($1, 450)], EXPR$2=[ITEM($1, 330)], EXPR$3=[ITEM($1, 230)], EXPR$4=[ITEM($1, 220)], EXPR$5=[ITEM($1, 110)], EXPR$6=[ITEM($1, 90)], EXPR$7=[ITEM($1, 80)], EXPR$8=[ITEM($1, 70)], EXPR$9=[ITEM($1, 40)], EXPR$10=[ITEM($1, 10)], EXPR$11=[ITEM($1, 20)], EXPR$12=[ITEM($1, 30)], EXPR$13=[ITEM($1, 50)], EXPR$14=[ITEM($1, 454)], EXPR$15=[ITEM($1, 
413)], EXPR$16=[ITEM($1, 940)], EXPR$17=[ITEM($1, 834)], EXPR$18=[ITEM($1, 73)], EXPR$19=[ITEM($1, 140)], EXPR$20=[ITEM($1, 104)], EXPR$21=[ITEM($1, )], EXPR$22=[ITEM($1, 2420)], EXPR$23=[ITEM($1, 1520)], EXPR$24=[ITEM($1, 1410)], EXPR$25=[ITEM($1, 1110)], EXPR$26=[ITEM($1, 1290)], EXPR$27=[ITEM($1, 2380)], EXPR$28=[ITEM($1, 705)], EXPR$29=[ITEM($1, 45)], EXPR$30=[ITEM($1, 1054)], EXPR$31=[ITEM($1, 2430)], EXPR$32=[ITEM($1, 420)], EXPR$33=[ITEM($1, 404)], EXPR$34=[ITEM($1, 3350)], EXPR$35=[ITEM($1, )], EXPR$36=[ITEM($1, 153)], EXPR$37=[ITEM($1, 356)], EXPR$38=[ITEM($1, 84)], EXPR$39=[ITEM($1, 745)], EXPR$40=[ITEM($1, 1450)], EXPR$41=[ITEM($1, 103)], EXPR$42=[ITEM($1, 2065)], EXPR$43=[ITEM($1, 343)], EXPR$44=[ITEM($1, 3420)], EXPR$45=[ITEM($1, 530)], EXPR$46=[ITEM($1, 3210)]) 00-10Project(T8¦¦*=[$0], columns=[$1]) 00-11 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/resource-manager/3500cols.tbl, numFiles=1, columns=[`*`],
[jira] [Created] (DRILL-5885) Drill consumes 2x memory when sorting and reading a spilled batch from disk.
Robert Hou created DRILL-5885: - Summary: Drill consumes 2x memory when sorting and reading a spilled batch from disk. Key: DRILL-5885 URL: https://issues.apache.org/jira/browse/DRILL-5885 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.11.0 Reporter: Robert Hou The query is: {noformat} select count(*) from (select * from dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50], columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[],columns[30],columns[2420],columns[1520], columns[1410], columns[1110],columns[1290],columns[2380],columns[705],columns[45],columns[1054],columns[2430],columns[420],columns[404],columns[3350], columns[],columns[153],columns[356],columns[84],columns[745],columns[1450],columns[103],columns[2065],columns[343],columns[3420],columns[530], columns[3210] ) d where d.col433 = 'sjka skjf'; {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] drill pull request #996: DRILL-5878: TableNotFound exception is being report...
Github user HanumathRao commented on a diff in the pull request: https://github.com/apache/drill/pull/996#discussion_r145268123 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java --- @@ -481,6 +485,19 @@ public RelOptTableImpl getTable(final List names) { .message("Temporary tables usage is disallowed. Used temporary table name: %s.", names) .build(logger); } + + // Check the schema and throw a valid SchemaNotFound exception instead of TableNotFound exception. --- End diff -- Thank you for the review. I agree that this should ideally be handled at the Calcite layer. I also think that even after Calcite provides this functionality, some customization may still be needed, since we understand the context better than Calcite does. Once Calcite fixes this issue, we can change the code accordingly. ---
Drill Questions (Developer)
Hi, I'm not exactly sure which mailing list I'm supposed to ask these developer questions on, so sorry for any inconvenience. I'm developing a web application that relies on Drill as its main search/querying functionality. I've gone through the documentation, but there are a couple of things that are still unclear to me when using Drill. If anyone on the core/developer team could address any of these questions, I would appreciate it. 1. From a terminal session I'm able to start Drill and execute queries on the CLI. One task I can do from the terminal is CREATE TEMPORARY TABLE name AS query; execute that, and right after the execution I'm able to query the temporary table as long as I keep the terminal session open. I would like to be able to do this from a REST client, so I was wondering if there is any way to chain SQL queries when making a request to POST http://localhost:8047/query.json? When I submit a query via the web console or the REST API, the temporary table gets created, but when I issue another request against the temporary table I just created, I'm not able to, because by that point the table has already been dropped. Is there a way to chain two queries using the REST API so that they execute one after another and return the last query's results? 2. I have streams of data being written to separate folders (folderA, folderB, folderC) in Parquet format. Each stream has common columns shared across all streams, but each also has unique columns that only apply to that particular stream. I know I'm able to query all streams by issuing a wildcard for the pattern of the directories, and the results will return with an extra column titled dir0 referencing the directory each record came from.
I'm wondering if there's a way to sort among the results that are returned, because in my trials I have not been able to sort when querying across different stream schemas; only when I query one schema at a time am I able to sort the results. Is there a way to construct my query that could help with this? 3. Do you have examples of constructing a histogram-like query against sample data by date? Thank you for your time. Best regards, -- Max Orelus +1 (202) 361-9946 maxore...@fastmail.com
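On question 1: a single submission to the REST endpoint named above is a POST of a small JSON body to http://localhost:8047/query.json. The sketch below only builds that body (the helper name is hypothetical), assuming, per the behavior described in the question, that each REST request runs independently on the server side:

```java
public class DrillRestQuery {
    // Builds the JSON body for POST http://localhost:8047/query.json.
    // Each REST request is handled independently, so session-scoped state
    // such as a temporary table created by one POST is not visible to the
    // next one -- which matches the behavior described in the question.
    static String queryPayload(String sql) {
        // Minimal JSON escaping: backslashes first, then double quotes.
        String escaped = sql.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"queryType\": \"SQL\", \"query\": \"" + escaped + "\"}";
    }
}
```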
[GitHub] drill pull request #996: DRILL-5878: TableNotFound exception is being report...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/996#discussion_r145231215 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java --- @@ -481,6 +485,19 @@ public RelOptTableImpl getTable(final List names) { .message("Temporary tables usage is disallowed. Used temporary table name: %s.", names) .build(logger); } + + // Check the schema and throw a valid SchemaNotFound exception instead of TableNotFound exception. --- End diff -- Does it mean that Calcite, instead of returning a schema-not-found exception, returns a table-not-found exception? Per my understanding this PR customizes Drill, but what if we go a different path and enhance Calcite (or maybe this is already done in newer Calcite versions)? ---
RE: log flooded by "date values definitively CORRECT"
Ouch! Looks like a logger was left behind in DEBUG mode. Can you manually turn that off? More memory would help in this case, because it seems that the foreman node is the one running out of heap space as it goes through the metadata for all the files. Is there a reason you are generating so many files to query? There is most likely a lower threshold for a parquet file size, below which you might be better off just using something like a CSV format. -Original Message- From: François Méthot [mailto:fmetho...@gmail.com] Sent: Tuesday, October 17, 2017 10:35 AM To: dev@drill.apache.org Subject: log flooded by "date values definitively CORRECT" Hi again, I am running into an issue on a query done on 760 000 parquet files stored in HDFS. We are using Drill 1.10, 8GB heap, 20GB direct mem. Drill runs with debug log enabled all the time. The query is standard select on 8 fields from hdfs.`/path` where this = that For about an hour I see this message on the foreman: [pool-9-thread-##] DEBUG o.a.d.exec.store.parquet.Metadata - It is determined from metadata that the date values are definitely CORRECT Then [some UUID:foreman] INFO o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata : Executed 761659 out of 761659 using 16 threads. Time : 3022416ms Then : Java.lang.OutOfMemoryError: Java Heap Space at java.util.Arrays.copyOf ... at java.io.PrintWriter.println(PrintWriter.java:757) at org.apache.calcite.rel.externalize.RelWriterImplt.explain (RelWriterImpl.java:118) at org.apachje.calcite.rel.externalize.RelWriterImpl.done (RelWriterImpl.java:160) ... at org.apache.calcite.plan.RelOptUtil.toString (RelOptUtil.java:1927) at org.apache.drill.exec.planner.sql.handlers.DefaultSQLHandler.log(DefaultSQLHandler.java:138) ... at org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.getPlan(CreateTableHandler:102) at org.apache.drill.exec.planner.DrillSqlWorker.getQueryPlan(DrillSqlWorker:131) ... 
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1050) at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281) I think it might be caused by having too much files to query, chunking our select into smaller piece actually helped. Also suspect that the DEBUG logging is taxing the poor node a bit much. Do you think adding more memory would address the issue (I can't try this right now) or you would think it is caused by a bug? Thank in advance for any advises, Francois
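To manually turn that logger off, as suggested above, an override in Drill's logback configuration would look roughly like this (a sketch; the logger name is taken from the log lines quoted above, and conf/logback.xml as the location is an assumption about the installation):

```xml
<!-- In conf/logback.xml: raise the parquet metadata logger above DEBUG
     so the "date values ... definitely CORRECT" messages are suppressed -->
<logger name="org.apache.drill.exec.store.parquet.Metadata" level="INFO"/>
```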
[GitHub] drill issue #970: DRILL-5832: Migrate OperatorFixture to use SystemOptionMan...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/970 There is a funny thing about the way Drill works. I can review your changes and commit them as soon as I provide a +1. My changes must wait for another committer to find time in their very busy schedule to consider this work. So, we'll likely commit yours first, I'll rebase mine on top of it, then wait for another committer to find time to consider it. The one exception would be if a non-committer can give this PR a +1 and a committer agrees to do a bulk commit this week. ---
log flooded by "date values definitively CORRECT"
Hi again, I am running into an issue on a query done on 760,000 Parquet files stored in HDFS. We are using Drill 1.10, 8GB heap, 20GB direct memory. Drill runs with debug logging enabled all the time. The query is a standard select on 8 fields from hdfs.`/path` where this = that. For about an hour I see this message on the foreman: [pool-9-thread-##] DEBUG o.a.d.exec.store.parquet.Metadata - It is determined from metadata that the date values are definitely CORRECT Then [some UUID:foreman] INFO o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata : Executed 761659 out of 761659 using 16 threads. Time : 3022416ms Then : java.lang.OutOfMemoryError: Java Heap Space at java.util.Arrays.copyOf ... at java.io.PrintWriter.println(PrintWriter.java:757) at org.apache.calcite.rel.externalize.RelWriterImpl.explain (RelWriterImpl.java:118) at org.apache.calcite.rel.externalize.RelWriterImpl.done (RelWriterImpl.java:160) ... at org.apache.calcite.plan.RelOptUtil.toString (RelOptUtil.java:1927) at org.apache.drill.exec.planner.sql.handlers.DefaultSQLHandler.log(DefaultSQLHandler.java:138) ... at org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.getPlan(CreateTableHandler:102) at org.apache.drill.exec.planner.DrillSqlWorker.getQueryPlan(DrillSqlWorker:131) ... at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1050) at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281) I think it might be caused by having too many files to query; chunking our select into smaller pieces actually helped. I also suspect that the DEBUG logging is taxing the poor node a bit much. Do you think adding more memory would address the issue (I can't try this right now), or would you say it is caused by a bug? Thanks in advance for any advice, Francois
[GitHub] drill issue #970: DRILL-5832: Migrate OperatorFixture to use SystemOptionMan...
Github user ilooner commented on the issue: https://github.com/apache/drill/pull/970 @paul-rogers Some of the changes I am making on top of https://github.com/apache/drill/pull/978/ as part of DRILL-5730 will likely conflict with this change. When do you think this could make it in? It would be helpful to have it merged sooner to avoid more conflicts down the line :) . ---
[GitHub] drill issue #936: DRILL-5772: Add unit tests to indicate how utf-8 support c...
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/936 @paul-rogers agree with you that charsets used in saffron properties should be defaulted in Drill to `UTF-8`, since Drill can read UTF-8 data and it's strange that it would fail by default when Calcite attempts to parse a string into a literal used in a query. I have looked into the Calcite code and there is no option to hard-code charset values for Calcite, but the charset can be changed using properties. There are two options for setting saffron properties: 1. as system properties; 2. using a `saffron.properties` file. I don't really like passing them as `-D` when starting the drillbit (since there are at least two), so I am more inclined to use the `saffron.properties` file. Unfortunately, in the Calcite code the `saffron.properties` location is expected to be the working folder [1], i.e. the place where the java process was started. I have created a Jira and pull request in Calcite to allow `saffron.properties` to be present in the classpath since it's more convenient [2]. I'll keep you updated on Calcite community feedback. [1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/util/SaffronProperties.java#L113 [2] https://issues.apache.org/jira/browse/CALCITE-2014 ---
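For reference, the file-based option might look like the sketch below (key names as defined in Calcite's SaffronProperties, linked as [1] above; the exact values Drill should adopt are still under discussion):

```properties
# Hypothetical saffron.properties, placed in the working folder of the
# Drillbit process (the location Calcite currently reads it from)
saffron.default.charset=UTF-8
saffron.default.nationalcharset=UTF-8
saffron.default.collation.name=UTF-8$en_US
```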
[GitHub] drill pull request #971: Drill-5834 Add Networking Functions
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/971#discussion_r145080845 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/NetworkFunctions.java --- @@ -0,0 +1,619 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.drill.exec.expr.fn.impl; + +import io.netty.buffer.DrillBuf; +import org.apache.drill.exec.expr.DrillSimpleFunc; +import org.apache.drill.exec.expr.annotations.FunctionTemplate; +import org.apache.drill.exec.expr.annotations.Output; +import org.apache.drill.exec.expr.annotations.Param; +import org.apache.drill.exec.expr.holders.BigIntHolder; +import org.apache.drill.exec.expr.holders.BitHolder; +import org.apache.drill.exec.expr.holders.VarCharHolder; + +import javax.inject.Inject; + +public class NetworkFunctions { + static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(NetworkFunctions.class); + + private NetworkFunctions() {} + + /** + * This function takes two arguments, an input IPv4 and a CIDR, and returns true if the IP is in the given CIDR block + * + */ + @FunctionTemplate( +name = "in_network", +scope = FunctionTemplate.FunctionScope.SIMPLE, +nulls = FunctionTemplate.NullHandling.NULL_IF_NULL + ) + public static class InNetworkFunction implements DrillSimpleFunc { + +@Param +VarCharHolder inputIP; + +@Param +VarCharHolder inputCIDR; + +@Output +BitHolder out; + +@Inject +DrillBuf buffer; + +public void setup() { +} + + +public void eval() { + + String ipString = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(inputIP.start, inputIP.end, inputIP.buffer); + String cidrString = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(inputCIDR.start, inputCIDR.end, inputCIDR.buffer); + + int result = 0; + org.apache.commons.net.util.SubnetUtils utils = new org.apache.commons.net.util.SubnetUtils(cidrString); + + if(utils.getInfo().isInRange(ipString) ){ +result = 1; + } + + out.value = result; +} + } + + + /** + * This function returns the number of IP addresses in the input CIDR block. 
+ */ + @FunctionTemplate( +name = "address_count", +scope = FunctionTemplate.FunctionScope.SIMPLE, +nulls = FunctionTemplate.NullHandling.NULL_IF_NULL + ) + public static class AddressCountFunction implements DrillSimpleFunc { + +@Param +VarCharHolder inputCIDR; + +@Output +BigIntHolder out; + +@Inject +DrillBuf buffer; + +public void setup() { +} + +public void eval() { + + String cidrString = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(inputCIDR.start, inputCIDR.end, inputCIDR.buffer); + org.apache.commons.net.util.SubnetUtils utils = new org.apache.commons.net.util.SubnetUtils(cidrString); + + out.value = utils.getInfo().getAddressCount(); + +} + + } + + /** + * This function returns the broadcast address of a given CIDR block. + */ + @FunctionTemplate( +name = "broadcast_address", +scope = FunctionTemplate.FunctionScope.SIMPLE, +nulls = FunctionTemplate.NullHandling.NULL_IF_NULL + ) + public static class BroadcastAddressFunction implements DrillSimpleFunc { + +@Param +VarCharHolder inputCIDR; + +@Output +VarCharHolder out; + +@Inject +DrillBuf buffer; + +public void setup() { +} + +public void eval() { + + String cidrString = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(inputCIDR.start, inputCIDR.end,
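The in_network UDF above delegates the membership check to commons-net's SubnetUtils. As a dependency-free illustration of what that check computes, here is a stdlib-only sketch of IPv4 CIDR membership; note that SubnetUtils.isInRange, with its default settings, additionally excludes the network and broadcast addresses themselves, which this sketch does not. The class and method names are illustrative, not Drill's.

```java
public class CidrDemo {
    // Parse a dotted-quad IPv4 address into a 32-bit integer.
    static int toInt(String ip) {
        String[] o = ip.split("\\.");
        return (Integer.parseInt(o[0]) << 24) | (Integer.parseInt(o[1]) << 16)
             | (Integer.parseInt(o[2]) << 8)  |  Integer.parseInt(o[3]);
    }

    // True if ip falls inside the CIDR block (network/broadcast included):
    // mask off the host bits of both addresses and compare the network parts.
    static boolean inNetwork(String ip, String cidr) {
        String[] parts = cidr.split("/");
        int prefixLen = Integer.parseInt(parts[1]);
        int mask = prefixLen == 0 ? 0 : -1 << (32 - prefixLen);
        return (toInt(ip) & mask) == (toInt(parts[0]) & mask);
    }

    public static void main(String[] args) {
        System.out.println(inNetwork("192.168.1.42", "192.168.1.0/24")); // true
        System.out.println(inNetwork("192.168.2.1", "192.168.1.0/24"));  // false
    }
}
```

In the UDF, this whole computation would run once per row, which is why a reviewer below suggests caching the parsed SubnetUtils in a @Workspace field.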
[GitHub] drill pull request #971: Drill-5834 Add Networking Functions
Github user arina-ielchiieva commented on a diff in the pull request:
https://github.com/apache/drill/pull/971#discussion_r145078505

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/NetworkFunctions.java ---
@@ -0,0 +1,668 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.expr.fn.impl;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.commons.net.util.SubnetUtils;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.annotations.Workspace;
+import org.apache.drill.exec.expr.holders.BigIntHolder;
+import org.apache.drill.exec.expr.holders.BitHolder;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import javax.inject.Inject;
+
+public class NetworkFunctions{
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(NetworkFunctions.class);
+
+  private NetworkFunctions() {}
+
+  /**
+   * This function takes two arguments, an input IPv4 and a CIDR, and returns true if the IP is in the given CIDR block.
+   */
+  @FunctionTemplate(
+      name = "in_network",
+      scope = FunctionTemplate.FunctionScope.SIMPLE,
+      nulls = FunctionTemplate.NullHandling.NULL_IF_NULL
+  )
+  public static class InNetworkFunction implements DrillSimpleFunc {
+
+    @Param
+    VarCharHolder inputIP;
+
+    @Param
+    VarCharHolder inputCIDR;
+
+    @Output
+    BitHolder out;
+
+    @Inject
+    DrillBuf buffer;
+
+    @Workspace
+    SubnetUtils utils;
+
+    public void setup() {
+    }
+
+    public void eval() {
+      String ipString = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(inputIP.start, inputIP.end, inputIP.buffer);
+      String cidrString = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(inputCIDR.start, inputCIDR.end, inputCIDR.buffer);
+
+      int result = 0;
+      utils = new org.apache.commons.net.util.SubnetUtils(cidrString);
+
+      if( utils.getInfo().isInRange( ipString ) ){
+        result = 1;
+      }
+      else{
+        result = 0;
+      }
+      out.value = result;
+    }
+  }
+
+  /**
+   * This function returns the number of IP addresses in the input CIDR block.
+   */
+  @FunctionTemplate(
+      name = "getAddressCount",
+      scope = FunctionTemplate.FunctionScope.SIMPLE,
+      nulls = FunctionTemplate.NullHandling.NULL_IF_NULL
+  )
+  public static class getAddressCountFunction implements DrillSimpleFunc {
+
+    @Param
+    VarCharHolder inputCIDR;
+
+    @Output
+    BigIntHolder out;
+
+    @Inject
+    DrillBuf buffer;
+
+    @Workspace
+    SubnetUtils utils;
+
+    public void setup() {
+    }
+
+    public void eval() {
+      String cidrString = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(inputCIDR.start, inputCIDR.end, inputCIDR.buffer);
+      utils = new org.apache.commons.net.util.SubnetUtils(cidrString);
+
+      out.value = utils.getInfo().getAddressCount();
+    }
+  }
+
+  /**
+   * This function returns the broadcast address of a given CIDR block.
+   */
+  @FunctionTemplate(
+      name = "getBroadcastAddress",
+      scope = FunctionTemplate.FunctionScope.SIMPLE,
+      nulls = FunctionTemplate.NullHandling.NULL_IF_NULL
+  )
+  public static class getBroadcastAddressFunction implements DrillSimpleFunc {
+
+    @Param
+    VarCharHolder inputCIDR;
+
+    @Output
+    VarCharHolder out;
+
+    @Inject
+    DrillBuf buffer;
+
+    @Workspace
[GitHub] drill pull request #971: Drill-5834 Add Networking Functions
Github user arina-ielchiieva commented on a diff in the pull request:
https://github.com/apache/drill/pull/971#discussion_r145078865

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/NetworkFunctions.java ---
@@ -0,0 +1,619 @@
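The address_count and broadcast_address UDFs in this PR likewise wrap SubnetUtils.getInfo(). The arithmetic behind those two calls can be sketched with the standard library alone; this assumes SubnetUtils' default behavior of excluding the network and broadcast addresses from the count, and the class and method names are illustrative, not Drill's.

```java
public class CidrInfoDemo {
    // Parse a dotted-quad IPv4 address into a 32-bit integer.
    static int toInt(String ip) {
        String[] o = ip.split("\\.");
        return (Integer.parseInt(o[0]) << 24) | (Integer.parseInt(o[1]) << 16)
             | (Integer.parseInt(o[2]) << 8)  |  Integer.parseInt(o[3]);
    }

    // Usable host addresses in the block: 2^(32 - prefix) minus the network
    // and broadcast addresses, floored at 0 for /31 and /32.
    static long addressCount(String cidr) {
        int hostBits = 32 - Integer.parseInt(cidr.split("/")[1]);
        return Math.max(0, (1L << hostBits) - 2);
    }

    // Broadcast address: keep the network bits, set all host bits to one.
    static String broadcastAddress(String cidr) {
        String[] parts = cidr.split("/");
        int prefixLen = Integer.parseInt(parts[1]);
        int mask = prefixLen == 0 ? 0 : -1 << (32 - prefixLen);
        int b = (toInt(parts[0]) & mask) | ~mask;
        return ((b >>> 24) & 0xFF) + "." + ((b >>> 16) & 0xFF) + "."
             + ((b >>> 8) & 0xFF)  + "." + (b & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(addressCount("192.168.1.0/24"));     // 254
        System.out.println(broadcastAddress("192.168.1.0/24")); // 192.168.1.255
    }
}
```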
[GitHub] drill issue #976: DRILL-5797: Choose parquet reader from read columns
Github user dprofeta commented on the issue:
https://github.com/apache/drill/pull/976

I updated the javadoc with Paul's remarks.

---
[jira] [Created] (DRILL-5884) Encode dot characters and other special characters in identifiers
second88 created DRILL-5884:
---

Summary: Encode dot characters and other special characters in identifiers
Key: DRILL-5884
URL: https://issues.apache.org/jira/browse/DRILL-5884
Project: Apache Drill
Issue Type: Wish
Components: Client - JDBC, Client - ODBC, Metadata, SQL Parser, Storage - JDBC
Affects Versions: 1.10.0
Environment: OS: Windows 7 32-bit
Reporting tools: Crystal Reports 2008, Crystal Reports 2016
Reporter: second88

Crystal Reports 2008 and 2016 do not work with generic JDBC / ODBC drivers (including Drill) if there are dot characters in identifiers such as schema names. For example, given that there exists a view called `dfs.tmp`.`A`, it is not listed under schema `dfs.tmp` in the report creation wizard of Crystal Reports 2008 / 2016. This is because Crystal Reports chops the schema name from "dfs.tmp" to "tmp" due to the dot character and then tries to retrieve the table names under the non-existent schema "tmp" using the metadata API of JDBC / ODBC.

I suggest adding an optional parameter called "url_encodes_id" to the connection string, with a default value of false. When url_encodes_id=true, the JDBC / ODBC driver or the SQL parser on the server side provides URL-encoded metadata information such as schema names and table names, and URL-decodes the identifiers before it actually executes the metadata API or SQL statements.

For example, the following methods of DatabaseMetaData would take URL-encoded IDs / patterns and return URL-encoded IDs:

getSchemas()
getSchemas(String catalog, String schemaPattern)
getTables(String catalog, String schemaPattern, String tableNamePattern, String types[])

And the following select statement, in which the schema name is URL-encoded, could then be executed by the JDBC / ODBC driver:

{code:sql}
SELECT `A`.`ID` FROM `dfs%2etmp`.`A` `A`
{code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
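The round trip the reporter proposes can be sketched in a few lines. One caveat: java.net.URLEncoder leaves '.' untouched (it is an unreserved character), so a driver implementing url_encodes_id would have to escape the dot explicitly. The class and method names below are illustrative, not part of any Drill API.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class IdEncodingDemo {
    // Escape dots so metadata consumers that split identifiers on '.'
    // (like Crystal Reports) see a single opaque token.
    static String encodeId(String id) {
        return id.replace(".", "%2e");
    }

    // Standard percent-decoding restores the original identifier before the
    // driver executes the metadata call or SQL statement.
    static String decodeId(String id) {
        return URLDecoder.decode(id, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encodeId("dfs.tmp"));   // dfs%2etmp
        System.out.println(decodeId("dfs%2etmp")); // dfs.tmp
    }
}
```

Note that URLDecoder also maps '+' to a space, so identifiers containing '+' would need the same kind of explicit handling as the dot.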