[GitHub] drill issue #805: Drill-4139: Exception while trying to prune partition. jav...

2017-09-20 Thread sachouche
Github user sachouche commented on the issue:

https://github.com/apache/drill/pull/805
  
+1


---


[GitHub] drill pull request #951: DRILL-5727: Update release profile to generate SHA-...

2017-09-20 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/951#discussion_r140129986
  
--- Diff: pom.xml ---
@@ -977,6 +977,7 @@
   
 MD5
 SHA-1
+SHA-512
--- End diff --

Maybe we can remove SHA-1 usage?


---


[GitHub] drill pull request #950: DRILL-5431: SSL Support

2017-09-20 Thread superbstreak
Github user superbstreak commented on a diff in the pull request:

https://github.com/apache/drill/pull/950#discussion_r140129825
  
--- Diff: contrib/native/client/src/include/drill/common.hpp ---
@@ -163,9 +170,13 @@ typedef enum{
 #define USERPROP_USERNAME "userName"
 #define USERPROP_PASSWORD "password"
 #define USERPROP_SCHEMA   "schema"
-#define USERPROP_USESSL   "useSSL"// Not implemented yet
-#define USERPROP_FILEPATH "pemLocation"   // Not implemented yet
-#define USERPROP_FILENAME "pemFile"   // Not implemented yet
+#define USERPROP_USESSL   "enableTLS"
+#define USERPROP_TLSPROTOCOL "TLSProtocol" //TLS version
+#define USERPROP_CERTFILEPATH "certFilePath" // pem file path and name
+#define USERPROP_CERTPASSWORD "certPassword" // Password for certificate file
--- End diff --

I think we can remove this to avoid confusion :)


---


RE: Drill 2.0 (design) hackathon

2017-09-20 Thread Kunal Khatua
I think that's a good idea. 
We could put this up in a list (in the google doc) of items to discuss on the 
hangout. That way, if we have no pressing topics to discuss, we can certainly 
pick something from the list.

-Original Message-
From: Aman Sinha [mailto:amansi...@apache.org] 
Sent: Wednesday, September 20, 2017 8:13 AM
To: dev@drill.apache.org
Subject: Re: Drill 2.0 (design) hackathon

Thanks to all the folks who attended the hackathon - both local and remote.
  For the remote attendees, you missed out on a good dinner :)

We had a day of excellent discussion on several topics:  Resource management, 
operator level performance improvements, TPC-DS coverage, metadata management, 
concurrency, usability and error handling, storage
plugins + rest APIs.   It will take a couple of days to compile all the
notes and we will post them.

Since the focus was more in-depth discussion rather than breadth, and 1 day is 
clearly not adequate, some topics were left out.  We can continue those 
discussions on the dev list / hangout  or if it can wait, possibly do it in a 
future hackathon.

-Aman

On Fri, Sep 15, 2017 at 2:54 PM, Charles Givre  wrote:

> Hi Pritesh,
> What time do you think you’d want me to present?  Also, should I make 
> some slides?
> Best,
> — C
>
> > On Sep 15, 2017, at 13:23, Pritesh Maker  wrote:
> >
> > Hi All
> >
> > We are looking forward to hosting the hackathon on Monday. Just a 
> > few
> updates on the logistics and agenda
> >
> > • We are expecting over 25 people attending the event – you can see 
> > the
> attendee list at the Eventbrite site -  https://www.eventbrite.com/e/
> drill-developer-day-sept-2017-registration-7478463285
> >
> > • Breakfast will be served starting at 8:30AM – we would like to 
> > begin
> promptly at 9AM
> >
> > • The agenda has been updated to reflect the speakers (see the 
> > update in
> the sheet - https://docs.google.com/spreadsheets/d/
> 1PEpgmBNAaPcu9UhWmZ8yPYtXbUGqOAYwH87alWkpCic/edit#gid=0 )
> > o Key Note & Introduction – Ted Dunning, Parth Chandra and Aman 
> > Sinha o Community Contributions – Anil Kumar, John Omernik, Charles 
> > Givre and
> Ted Dunning
> > o Two tracks for technical design discussions – some topics have 
> > initial
> thoughts for the topics and some will have open brainstorming 
> discussions
> > o Once the discussions are concluded, we will have summaries 
> > presented
> and notes shared with the community
> >
> > • We will have a WebEx for the first two sessions. For the two 
> > tracks,
> we will either continue the WebEx or have Hangout links (will publish 
> them to the google sheet)
> > "JOIN WEBEX MEETING
> > https://mapr.webex.com/mapr/j.php?MTID=m9d39036e3953cce59ea81250c70c
> > 6c76 Meeting number (access code): 806 111 950 Meeting password: 
> > ApacheDrill"
> >
> > • For the attendees in person, we have made bookings for a dinner in 
> > the
> evening - https://www.yelp.com/biz/chili-garden-restaurant-milpitas
> >
> > Looking forward to a fantastic day for the Apache Drill! community!
> >
> > Thanks,
> > Pritesh
> >
> >
> >
> > On 9/5/17, 10:47 PM, "Aman Sinha"  wrote:
> >
> >Here is the Eventbrite event for registration:
> >
> >https://www.eventbrite.com/e/drill-developer-day-sept-2017-
> registration-7478463285
> >
> >Please register so we can plan for food and drinks appropriately.
> >
> >The link also contains a google doc link for the preliminary 
> > agenda
> and a
> >'Topics' tab with volunteer sign-up column.  Please add your name 
> > to
> the
> >area(s) of interest.
> >
> >Thanks and look forward to seeing you all !
> >
> >-Aman
> >
> >On Wed, Aug 30, 2017 at 9:44 AM, Paul Rogers 
> wrote:
> >
> >> A partial list of Drill’s public APIs:
> >>
> >> IMHO, highest priority for Drill 2.0.
> >>
> >>
> >>  *   JDBC/ODBC drivers
> >>  *   Client (for JDBC/ODBC) + ODBC & JDBC
> >>  *   Client (for full Drill async, columnar)
> >>  *   Storage plugin
> >>  *   Format plugin
> >>  *   System/session options
> >>  *   Queueing (e.g. ZK-based queues)
> >>  *   Rest API
> >>  *   Resource Planning (e.g. max query memory per node)
> >>  *   Metadata access, storage (e.g. file system locations vs. a
> metastore)
> >>  *   Metadata files formats (Parquet, views, etc.)
> >>
> >> Lower priority for future releases:
> >>
> >>
> >>  *   Query Planning (e.g. Calcite rules)
> >>  *   Config options
> >>  *   SQL syntax, especially Drill extensions
> >>  *   UDF
> >>  *   Management (e.g. JMX, Rest API calls, etc.)
> >>  *   Drill File System (HDFS)
> >>  *   Web UI
> >>  *   Shell scripts
> >>
> >> There are certainly more. Please suggest those that are missing. 
> >> I’ve taken a rough cut at which APIs need forward/backward 
> >> compatibility
> first,
> >> in part based on those that are the “most public” and most likely 
> >> to change. Others are important, but we can’t do them all at once.
> >>
> >> 

Added "spinner" code to allow debugging of failure cause

2017-09-20 Thread Boaz Ben-Zvi
  FYI and for feedback:

  As part of Pull Request #938 I added “spinner” code in the build() method 
of the UserException class, so that when this method is called (i.e., just 
before a failure is reported to the user), the code can go into a spin loop 
instead of continuing to termination.

This can be useful when investigating the original failure: it allows you to 
attach a debugger, use jstack to see the stacks at that point of execution, or 
check some external state (like the condition of the spill files at that point), 
etc.

To turn this feature ON, create an (empty) flag file named /tmp/drill/spin on 
every node where the spinning should take place (e.g., use “clush -a touch 
/tmp/drill/spin” to set it all across the cluster).  Once a thread hits this 
code, it checks for the existence of the spin file and, if it exists, creates a 
temp file named something like /tmp/drill/spin4148663301172491613.tmp, which 
contains its process ID (e.g., to allow jstack) and the error message, like:

~ 5 > cat /tmp/drill/spin5273075865809469794.tmp
Spinning process: 16966@BBenZvi-E754-MBP13.local
Error cause: SYSTEM ERROR: CannotPlanException: Node 
[rel#232:Subset#10.PHYSICAL.SINGLETON([]).[]] could not be implemented; planner 
state:

Root: rel#232:Subset#10.PHYSICAL.SINGLETON([]).[]
. . . . . . .

~ 6 > jstack 16966
Picked up JAVA_TOOL_OPTIONS: -ea
2017-09-20 17:15:21
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode):

"Attach Listener" #91 daemon prio=9 os_prio=31 tid=0x7fdd8830b000 
nid=0x4f07 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

"263cfbd5-329d-b9fb-d96e-392e4fe0be4d:foreman" #53 daemon prio=10 os_prio=31 
tid=0x7fdd8823a000 nid=0x7203 waiting on condition [0x72224000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
 at java.lang.Thread.sleep(Native Method)
 at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:570)
. . . . . . . .

The spinning thread then loops: it sleeps for a second and then rechecks the 
flag file. To turn this feature OFF and release the spinning threads, delete 
the empty spin file (e.g., use “clush -a rm /tmp/drill/spin”). This will also 
clean up the relevant temp files.
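
For reference, here is a condensed sketch of the spin loop described above 
(file names as above; the actual implementation is in UserException.build() 
in PR #938):

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.lang.management.ManagementFactory;

class SpinnerSketch {
  static void spinIfRequested(String errorMessage) throws IOException {
    File spinFile = new File("/tmp/drill/spin");
    if (!spinFile.exists()) {
      return;                        // feature is OFF - no flag file on this node
    }
    // Record this process's ID (for jstack) and the error message in a temp file.
    File outErr = File.createTempFile("spin", ".tmp", new File("/tmp/drill"));
    try (BufferedWriter bw = new BufferedWriter(new FileWriter(outErr))) {
      bw.write("Spinning process: " + ManagementFactory.getRuntimeMXBean().getName());
      bw.write("\nError cause: " + errorMessage);
    }
    // Loop: sleep a second, then recheck - freed by "clush -a rm /tmp/drill/spin".
    while (spinFile.exists()) {
      try { Thread.sleep(1_000); } catch (InterruptedException ignored) { }
    }
  }
}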

   Hope this is useful; any feedback or suggestions are welcome.

  Boaz



[GitHub] drill pull request #951: DRILL-5727: Update release profile to generate SHA-...

2017-09-20 Thread parthchandra
GitHub user parthchandra opened a pull request:

https://github.com/apache/drill/pull/951

DRILL-5727: Update release profile to generate SHA-512 checksum.

New Apache release guidelines require a SHA-512 checksum.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/parthchandra/drill DRILL-5727

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/951.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #951


commit ed1d5508dbe70a7b58bbf36628325462644ed19e
Author: Parth Chandra 
Date:   2017-09-20T20:42:54Z

DRILL-5727: Update release profile to generate SHA-512 checksum.




---


[jira] [Resolved] (DRILL-5715) Performance of refactored HashAgg operator regressed

2017-09-20 Thread Boaz Ben-Zvi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi resolved DRILL-5715.
-
Resolution: Fixed
  Reviewer: Paul Rogers

 The commit for DRILL-5694 (PR #938) also solves this performance bug (it 
basically removed the calls to Setup before every hash computation, plus a few 
small changes like replacing setSafe with set).
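
For illustration, a sketch of the setSafe-vs-set distinction on a value 
vector's Mutator (not the PR's actual code; the exact call sites are in PR #938):

{code:java}
// setSafe() checks the vector's capacity and reallocates if needed - safe but slower.
vector.getMutator().setSafe(index, value);
// set() skips that check - valid only when capacity is known to be sufficient,
// which makes it the cheaper choice inside the per-row hash-computation path.
vector.getMutator().set(index, value);
{code}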

> Performance of refactored HashAgg operator regressed
> 
>
> Key: DRILL-5715
> URL: https://issues.apache.org/jira/browse/DRILL-5715
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.11.0
> Environment: 10-node RHEL 6.4 (32 Core, 256GB RAM)
>Reporter: Kunal Khatua
>Assignee: Boaz Ben-Zvi
>  Labels: performance, regression
> Fix For: 1.12.0
>
> Attachments: 26736242-d084-6604-aac9-927e729da755.sys.drill, 
> 26736615-9e86-dac9-ad77-b022fd791f67.sys.drill, 
> 2675cc73-9481-16e0-7d21-5f1338611e5f.sys.drill, 
> 2675de42-3789-47b8-29e8-c5077af136db.sys.drill, drill-1.10.0_callTree.png, 
> drill-1.10.0_hotspot.png, drill-1.11.0_callTree.png, drill-1.11.0_hotspot.png
>
>
> When running the following simple HashAgg-based query on a TPCH-table - 
> Lineitem with 6Billion rows on a 10 node setup (with a single partition to 
> disable any possible spilling to disk)
> {code:sql}
> select count(*) 
> from (
>   select l_quantity
> , count(l_orderkey) 
>   from lineitem 
>   group by l_quantity 
> )  {code}
> the runtime increased from {{7.378 sec}} to {{11.323 sec}} [reported by the 
> JDBC client].
> To disable spill-to-disk in Drill-1.11.0, the {{drill-override.conf}} was 
> modified to 
> {code}drill.exec.hashagg.num_partitions : 1{code}
> Attached are two profiles
> Drill 1.10.0 : [^2675cc73-9481-16e0-7d21-5f1338611e5f.sys.drill] 
> Drill 1.11.0 : [^2675de42-3789-47b8-29e8-c5077af136db.sys.drill]
> A separate run was done for both scenarios with the 
> {{planner.width.max_per_node=10}} and profiled with YourKit.
> Image snippets are attached, indicating the hotspots in both builds:
> *Drill 1.10.0* : 
>  Profile: [^26736242-d084-6604-aac9-927e729da755.sys.drill]
>  CallTree: [^drill-1.10.0_callTree.png]
>  HotSpot: [^drill-1.10.0_hotspot.png]
> !drill-1.10.0_hotspot.png|drill-1.10.0_hotspot!
> *Drill 1.11.0* : 
>  Profile: [^26736615-9e86-dac9-ad77-b022fd791f67.sys.drill]
>  CallTree: [^drill-1.11.0_callTree.png]
>  HotSpot: [^drill-1.11.0_hotspot.png] 
> !drill-1.11.0_hotspot.png|drill-1.11.0_hotspot!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (DRILL-5740) hash agg fail to read spill file

2017-09-20 Thread Boaz Ben-Zvi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi resolved DRILL-5740.
-
   Resolution: Fixed
Fix Version/s: 1.12.0

The commit for DRILL-5694 (PR #938) also solves this bug (it basically removed 
an unneeded closing of the SpillSet).


> hash agg fail to read spill file
> 
>
> Key: DRILL-5740
> URL: https://issues.apache.org/jira/browse/DRILL-5740
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.12.0
>Reporter: Chun Chang
>Assignee: Boaz Ben-Zvi
>Priority: Blocker
> Fix For: 1.12.0
>
>
> -Build: | 1.12.0-SNAPSHOT  | 11008d029bafa36279e3045c4ed1a64366080620
> -Multi-node drill cluster
> Running a query causing hash agg spill fails with the following error. And 
> this seems to be a regression.
> {noformat}
> Execution Failures:
> /root/drill-test-framework/framework/resources/Advanced/hash-agg/spill/hagg5.q
> Query:
> select gby_date, gby_int32_rand, sum(int32_field), avg(float_field), 
> min(boolean_field), count(double_rand) from 
> dfs.`/drill/testdata/hagg/PARQUET-500M.parquet` group by gby_date, 
> gby_int32_rand order by gby_date, gby_int32_rand limit 30
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: FileNotFoundException: File 
> /tmp/drill/spill/10.10.30.168-31010/265f91f9-78d2-78a6-68ad-4709674efe0a_HashAgg_1-4-34/spill3
>  does not exist
> Fragment 1:34
> [Error Id: 291a79f8-9b7a-485d-9404-e7b7fe1d8f1e on 10.10.30.168:31010]
>   (java.lang.RuntimeException) java.io.FileNotFoundException: File 
> /tmp/drill/spill/10.10.30.168-31010/265f91f9-78d2-78a6-68ad-4709674efe0a_HashAgg_1-4-34/spill3
>  does not exist
> 
> org.apache.drill.exec.physical.impl.aggregate.SpilledRecordbatch.<init>():67
> 
> org.apache.drill.exec.test.generated.HashAggregatorGen1891.outputCurrentBatch():980
> org.apache.drill.exec.test.generated.HashAggregatorGen1891.doWork():617
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.physical.impl.TopN.TopNBatch.innerNext():191
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> 
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():415
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-20 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r140098512
  
--- Diff: 
common/src/main/java/org/apache/drill/common/exceptions/UserException.java ---
@@ -536,6 +542,33 @@ public Builder pushContext(final String name, final double value) {
  * @return user exception
  */
 public UserException build(final Logger logger) {
+
+  // To allow for debugging:
+  // A spinner code to make the execution stop here while the file '/tmp/drillspin' exists
--- End diff --

Done


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-20 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r140098546
  
--- Diff: 
common/src/main/java/org/apache/drill/common/exceptions/UserException.java ---
@@ -536,6 +542,33 @@ public Builder pushContext(final String name, final double value) {
  * @return user exception
  */
 public UserException build(final Logger logger) {
+
+  // To allow for debugging:
+  // A spinner code to make the execution stop here while the file '/tmp/drillspin' exists
+  // Can be used to attach a debugger, use jstack, etc
+  // The processID of the spinning thread should be in a file like /tmp/spin4148663301172491613.tmp
--- End diff --

Done 


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-20 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r140093627
  
--- Diff: 
common/src/main/java/org/apache/drill/common/exceptions/UserException.java ---
@@ -536,6 +542,33 @@ public Builder pushContext(final String name, final double value) {
  * @return user exception
  */
 public UserException build(final Logger logger) {
+
+  // To allow for debugging:
+  // A spinner code to make the execution stop here while the file '/tmp/drillspin' exists
+  // Can be used to attach a debugger, use jstack, etc
+  // The processID of the spinning thread should be in a file like /tmp/spin4148663301172491613.tmp
+  // along with the error message.
+  File spinFile = new File("/tmp/drillspin");
+  if ( spinFile.exists() ) {
+File tmpDir = new File("/tmp");
+File outErr = null;
+try {
+  outErr = File.createTempFile("spin", ".tmp", tmpDir);
+  BufferedWriter bw = new BufferedWriter(new FileWriter(outErr));
+  bw.write("Spinning process: " + ManagementFactory.getRuntimeMXBean().getName()
+  /* After upgrading to JDK 9 - replace with: ProcessHandle.current().getPid() */);
+  bw.write("\nError cause: " +
+(errorType == DrillPBError.ErrorType.SYSTEM ? ("SYSTEM ERROR: " + ErrorHelper.getRootMessage(cause)) : message));
+  bw.close();
+} catch (Exception ex) {
+  logger.warn("Failed creating a spinner tmp message file: {}", ex);
+}
+while (spinFile.exists()) {
+  try { sleep(1_000); } catch (Exception ex) { /* ignore interruptions */ }
--- End diff --

Yes - if some non-blocked part tries to kill the query, the spinning parts 
would still be blocked. That may be by design, as debugging is still going on 
(until a user issues "clush -a rm /tmp/drill/spin").



---


[GitHub] drill issue #949: DRILL-5795: Parquet Filter push down at rowgroup level

2017-09-20 Thread dprofeta
Github user dprofeta commented on the issue:

https://github.com/apache/drill/pull/949
  
I will add a unit test that checks the number of row groups scanned by the 
group scan, to verify that the filter can actually prune row groups.
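
A rough sketch of what such a test might look like (assuming PlanTestBase's 
pattern helper; the "numRowGroups" plan attribute and file name here are 
hypothetical, for illustration only):

@Test
public void testRowGroupPruningViaFilter() throws Exception {
  String query = "select * from dfs.`multi_rowgroup.parquet` where a = 1";
  // Expect only one row group to survive pruning in the physical plan.
  testPlanMatchingPatterns(query, new String[] {"numRowGroups=1"}, new String[] {});
}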


---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-20 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r140062933
  
--- Diff: 
common/src/main/java/org/apache/drill/common/exceptions/UserException.java ---
@@ -536,6 +542,33 @@ public Builder pushContext(final String name, final double value) {
  * @return user exception
  */
 public UserException build(final Logger logger) {
+
+  // To allow for debugging:
+  // A spinner code to make the execution stop here while the file '/tmp/drillspin' exists
+  // Can be used to attach a debugger, use jstack, etc
+  // The processID of the spinning thread should be in a file like /tmp/spin4148663301172491613.tmp
+  // along with the error message.
+  File spinFile = new File("/tmp/drillspin");
+  if ( spinFile.exists() ) {
+File tmpDir = new File("/tmp");
+File outErr = null;
+try {
+  outErr = File.createTempFile("spin", ".tmp", tmpDir);
+  BufferedWriter bw = new BufferedWriter(new FileWriter(outErr));
+  bw.write("Spinning process: " + ManagementFactory.getRuntimeMXBean().getName()
+  /* After upgrading to JDK 9 - replace with: ProcessHandle.current().getPid() */);
+  bw.write("\nError cause: " +
+(errorType == DrillPBError.ErrorType.SYSTEM ? ("SYSTEM ERROR: " + ErrorHelper.getRootMessage(cause)) : message));
+  bw.close();
+} catch (Exception ex) {
+  logger.warn("Failed creating a spinner tmp message file: {}", ex);
+}
+while (spinFile.exists()) {
+  try { sleep(1_000); } catch (Exception ex) { /* ignore interruptions */ }
--- End diff --

 Does query killing cause a user exception?



---


[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-20 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/938#discussion_r140062742
  
--- Diff: 
common/src/main/java/org/apache/drill/common/exceptions/UserException.java ---
@@ -536,6 +542,33 @@ public Builder pushContext(final String name, final double value) {
  * @return user exception
  */
 public UserException build(final Logger logger) {
+
+  // To allow for debugging:
+  // A spinner code to make the execution stop here while the file '/tmp/drillspin' exists
+  // Can be used to attach a debugger, use jstack, etc
+  // The processID of the spinning thread should be in a file like /tmp/spin4148663301172491613.tmp
+  // along with the error message.
+  File spinFile = new File("/tmp/drillspin");
--- End diff --

 Using a "flag file" instead of a config setting gives more flexibility; 
like no need to restart in order to turn this feature on/off, or can select to 
catch errors only in few nodes, and last -- can free the looping thread by 
deleting this "flag file". 
  I also plan on posting an announcement on the dev list about this new 
"feature", and see if there's any feedback. 



---


[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...

2017-09-20 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/805#discussion_r140057495
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -1054,8 +1057,36 @@ public void setMax(Object max) {
   return nulls;
 }
 
-@Override public boolean hasSingleValue() {
-  return (max != null && min != null && max.equals(min));
+/**
+ * Checks that the column chunk has a single value.
+ * Returns {@code true} if {@code min} and {@code max} are the same but not null
+ * and nulls count is 0 or equal to the rows count.
+ *
+ * Returns {@code true} if {@code min} and {@code max} are null and the number of null values
+ * in the column chunk is equal to the rows count.
+ *
+ * Comparison of nulls and rows count is needed for the cases:
+ *
+ * column with primitive type has single value and null values
+ *
+ * column with primitive type has only null values, min/max couldn't be null,
+ * but column has single value
+ *
+ * @param rowCount rows count in column chunk
+ * @return true if column has single value
+ */
+@Override
+public boolean hasSingleValue(long rowCount) {
+  if (nulls != null) {
+if (min != null) {
+  // Objects.deepEquals() is used here, since min and max may be byte arrays
+  return Objects.deepEquals(min, max) && (nulls == 0 || nulls == rowCount);
--- End diff --

Statistics [1] for most parquet types use Java primitive types to store the min 
and max values, so min/max cannot be null even if the table has null values.

[1] 
https://github.com/apache/parquet-mr/tree/e54ca615f213f5db6d34d9163c97eec98920d7a7/parquet-column/src/main/java/org/apache/parquet/column/statistics
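
As a quick illustration of why Objects.deepEquals() matters here (plain Java 
semantics, not code from this PR):

byte[] min = {1, 2};
byte[] max = {1, 2};
System.out.println(min.equals(max));                        // false - arrays compare by reference
System.out.println(java.util.Arrays.equals(min, max));      // true  - element-wise comparison
System.out.println(java.util.Objects.deepEquals(min, max)); // true  - deepEquals also handles arrays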


---


[GitHub] drill issue #905: DRILL-1162: Fix OOM for hash join operator when the right ...

2017-09-20 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/905
  
@amansinha100, can you give this one a review? 


---


[GitHub] drill issue #944: DRILL-5425: Support HTTP Kerberos auth using SPNEGO

2017-09-20 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/944
  
@sohami, can you review this one? 


---


[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...

2017-09-20 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/805#discussion_r140055857
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
@@ -1054,8 +1057,36 @@ public void setMax(Object max) {
   return nulls;
 }
 
-@Override public boolean hasSingleValue() {
-  return (max != null && min != null && max.equals(min));
+/**
+ * Checks that the column chunk has a single value.
+ * Returns {@code true} if {@code min} and {@code max} are the same but not null
+ * and nulls count is 0 or equal to the rows count.
+ *
+ * Returns {@code true} if {@code min} and {@code max} are null and the number of null values
+ * in the column chunk is equal to the rows count.
+ *
+ * Comparison of nulls and rows count is needed for the cases:
+ *
+ * column with primitive type has single value and null values
+ *
+ * column with primitive type has only null values, min/max couldn't be null,
+ * but column has single value
+ *
+ * @param rowCount rows count in column chunk
+ * @return true if column has single value
+ */
+@Override
+public boolean hasSingleValue(long rowCount) {
+  if (nulls != null) {
+if (min != null) {
+  // Objects.deepEquals() is used here, since min and max may be byte arrays
+  return Objects.deepEquals(min, max) && (nulls == 0 || nulls == rowCount);
--- End diff --

- if (min != null), then nulls cannot be equal to rowCount
- In this case, only nulls == 0 should be checked 


---


[GitHub] drill issue #946: DRILL-5799: native-client: Support alternative build direc...

2017-09-20 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/946
  
Build issue has been corrected via another PR.


---


[GitHub] drill issue #919: DRILL-5721: Query with only root fragment and no non-root ...

2017-09-20 Thread sohami
Github user sohami commented on the issue:

https://github.com/apache/drill/pull/919
  
Rebased on latest master and squashed the initial 3 commits. I have kept the 
conflict-resolution commit separate, as it contains some changes w.r.t. 
DRILL-3449 behavior and adds some new unit tests.


---


[GitHub] drill pull request #942: DRILL-5781: Fix unit test failures to use tests con...

2017-09-20 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/942#discussion_r140048587
  
--- Diff: contrib/storage-hbase/src/test/resources/hbase-site.xml ---
@@ -66,15 +66,13 @@
 Default is 10.
 
   

[GitHub] drill pull request #942: DRILL-5781: Fix unit test failures to use tests con...

2017-09-20 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/942#discussion_r139247294
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/ExecTest.java 
---
@@ -100,6 +101,14 @@ public void run() {
 return dir.getAbsolutePath() + File.separator + dirName;
   }
 
+  /**
+   * Sets zookeeper server and client SASL test config properties.
+   */
+  public static void setZookeeperSaslTestConfigProps() {
+System.setProperty(ZooKeeperSaslServer.LOGIN_CONTEXT_NAME_KEY, "Test_server");
--- End diff --

Maybe something like `DrillTestServerForUnitTests`, 
`DrillTestClientForUnitTests`.


---


[GitHub] drill pull request #942: DRILL-5781: Fix unit test failures to use tests con...

2017-09-20 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/942#discussion_r140048784
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/PathUtils.java ---
@@ -70,4 +72,14 @@ public static final String normalize(final String path) {
 return builder.toString();
   }
 
+  /**
+   * Creates and returns path with the protocol at the beginning from specified {@code url}.
+   */
--- End diff --

Can you please add java doc with @param and @return?
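
For example, something along these lines (illustrative wording only):

/**
 * Creates and returns a path with the protocol at the beginning, taken from
 * the specified {@code url}.
 *
 * @param url url providing the protocol and path to combine
 * @return path prefixed with the protocol
 */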


---


[GitHub] drill pull request #942: DRILL-5781: Fix unit test failures to use tests con...

2017-09-20 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/942#discussion_r139273842
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/ExecTest.java 
---
@@ -100,6 +101,14 @@ public void run() {
 return dir.getAbsolutePath() + File.separator + dirName;
   }
 
+  /**
+   * Sets zookeeper server and client SASL test config properties.
+   */
+  public static void setZookeeperSaslTestConfigProps() {
--- End diff --

Maybe it's possible to create a separate test ZK util class with this method 
and also the setup for the JAAS property (so the JAAS config is not repeated 
twice in the code), and keep it in the same package where we test ZK?


---


[GitHub] drill pull request #950: Drill 5431: SSL Support

2017-09-20 Thread parthchandra
GitHub user parthchandra opened a pull request:

https://github.com/apache/drill/pull/950

Drill 5431: SSL Support

Add support for SSL between Java/C++ clients and Drillbits. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/parthchandra/drill DRILL-5431-0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/950.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #950


commit dd22a5a6630ebd87ecf35fb61fc44fcea830a4fa
Author: Sudheesh Katkam 
Date:   2017-05-16T21:48:57Z

DRILL-5431: Upgrade Netty to 4.0.47

commit a34ca452e391d88f64213fccc69e42f1fca91633
Author: Parth Chandra 
Date:   2017-06-20T21:13:53Z

DRILL-5431: SSL Support (Java) - Update DrillConfig to merge properties 
passed in from the client command line

commit 13f32d581fa01bc53d7580092ef3d1bbb500f4df
Author: Parth Chandra 
Date:   2017-07-25T16:21:02Z

DRILL-5431: SSL Support (Java) - Add test certificates, keys, keystore, and 
truststore.

commit f073001bfbbcf3bec20aae93636c139b7d98f6ec
Author: Parth Chandra 
Date:   2017-08-28T17:08:15Z

DRILL-5698: Revert unnecessary changes to C++ client

commit 759b5b201a9725f4b377590f48db30e0d5d58856
Author: Parth Chandra 
Date:   2017-06-16T23:49:45Z

DRILL-5431: Update POM to upgrade to Netty 4.0.48 and add exclusions to all 
modules that included older versions of Netty

commit 2f3b504e56fa0df704d8153b9c104da18e81d41d
Author: Parth Chandra 
Date:   2017-06-07T18:09:10Z

DRILL-5431: SSL Support (C++) - Refactoring of C++ client.

Move classes out of drillclient to their own files
Fix build on MacOS to suppress warnings from boost code
Refactoring of user properties to use a map

commit 999da4d9c063157aec8d5bd3583d4776652960c3
Author: Parth Chandra 
Date:   2017-06-10T05:03:59Z

DRILL-5431: SSL Support (Java) - Java client server SSL implementation

commit 9329306abed5b351226b0f25bf8a7f2ce5304679
Author: Parth Chandra 
Date:   2017-08-29T19:04:57Z

DRILL-5431: SSL Support (Java) - Enable OpenSSL support

commit ee75133198167c685e00183d3d34eca65fa43b09
Author: Parth Chandra 
Date:   2017-07-11T00:19:12Z

DRILL-5431: SSL Support (C++) - Add boost example code for ssl (small 
change to the code to pick up the certificate and key files from the test dir). 
Useful to test the ssl environment.

commit 95f609aa33e30d621108b8594360b9538374694e
Author: Parth Chandra 
Date:   2017-07-24T19:55:02Z

DRILL-5431: SSL Support (C++) - Update DrillClientImpl to use Channel 
implementation

commit 6d38f2dc0b4607727a77f491373d93ca9706724e
Author: Parth Chandra 
Date:   2017-07-25T16:22:23Z

DRILL-5431: SSL Support (C++) - Add (Netty like) socket abstraction that 
encapsulates a TCP socket or a SSL Stream on TCP.

The testSSL program tests the client connection against a drillbit by 
sending a drill handshake.

commit 23aac62331a9eb900fb5e6ca5e62ca62438ed9ec
Author: Parth Chandra 
Date:   2017-07-31T20:28:24Z

DRILL-5431: SSL Support (C++) - Fix Sasl on Windows to build from source 
(instead of install) directory




---


[GitHub] drill issue #948: DRILL-5745: Corrected 'location' information in Drill web ...

2017-09-20 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/948
  
+1, LGTM.


---


[GitHub] drill pull request #949: DRILL-5795: Parquet Filter push down at rowgroup le...

2017-09-20 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/949#discussion_r140036046
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java
 ---
@@ -1095,7 +1104,7 @@ public GroupScan applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtili
 
 final Set schemaPathsInExpr = filterExpr.accept(new ParquetRGFilterEvaluator.FieldReferenceFinder(), null);
 
-final List qualifiedRGs = new ArrayList<>(parquetTableMetadata.getFiles().size());
+final List qualifiedRGs = new ArrayList<>(rowGroupInfos.size());
--- End diff --

Never mind the previous comment. It's probably better to use RowGroupInfos 
throughout the code. 


---


[GitHub] drill pull request #949: DRILL-5795: Parquet Filter push down at rowgroup le...

2017-09-20 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/949#discussion_r140033471
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java
 ---
@@ -819,63 +827,64 @@ private void init() throws IOException {
   }
 }
 rowGroupInfo.setEndpointByteMap(endpointByteMap);
+rowGroupInfo.setColumns(rg.getColumns());
 rgIndex++;
 rowGroupInfos.add(rowGroupInfo);
   }
 }
 
this.endpointAffinities = AffinityCreator.getAffinityMap(rowGroupInfos);
+updatePartitionColTypeMap();
+  }
 
+  private void updatePartitionColTypeMap() {
 columnValueCounts = Maps.newHashMap();
 this.rowCount = 0;
 boolean first = true;
-for (ParquetFileMetadata file : parquetTableMetadata.getFiles()) {
-  for (RowGroupMetadata rowGroup : file.getRowGroups()) {
-long rowCount = rowGroup.getRowCount();
-for (ColumnMetadata column : rowGroup.getColumns()) {
-  SchemaPath schemaPath = SchemaPath.getCompoundPath(column.getName());
-  Long previousCount = columnValueCounts.get(schemaPath);
-  if (previousCount != null) {
-if (previousCount != GroupScan.NO_COLUMN_STATS) {
-  if (column.getNulls() != null) {
-Long newCount = rowCount - column.getNulls();
-columnValueCounts.put(schemaPath, columnValueCounts.get(schemaPath) + newCount);
-  }
-}
-  } else {
+for (RowGroupInfo rowGroup : this.rowGroupInfos) {
--- End diff --

Isn't this doing the same thing as the original code? RowGroupInfos is 
built from the RowGroupMetadata in the files?


---


[jira] [Created] (DRILL-5808) Reduce memory allocator strictness for "managed" operators

2017-09-20 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5808:
--

 Summary: Reduce memory allocator strictness for "managed" operators
 Key: DRILL-5808
 URL: https://issues.apache.org/jira/browse/DRILL-5808
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.11.0
Reporter: Paul Rogers
Assignee: Paul Rogers
 Fix For: 1.12.0


Drill 1.11 and 1.12 introduce new "managed" versions of the sort and hash agg 
that enforce memory limits, spilling to disk when necessary.

Drill's internal memory system is very "lumpy" and unpredictable. The operators 
have no control over the incoming batch size; an overly large batch can cause 
the operator to exceed its memory limit before it has a chance to do any work.

Vector allocations grow in power-of-two sizes. Adding a single record can 
double the memory allocated to a vector.

Drill has no metadata, so operators cannot predict the size of VarChar columns 
nor the cardinality of arrays. The "Record Batch Sizer" tries to extract this 
information on each batch, but it works with averages, and specific column 
patterns can still throw off the memory calculations. (For example, having a 
series of very wide columns for A-M and very narrow columns for N-Z will cause 
a moderate average. But, once sorted, the A-M rows, and batches, will be much 
larger than expected, causing out-of-memory errors.)

At present, if an operator is wrong in its memory usage by a single byte, the 
entire query is killed. That is, the user pays the death penalty (of queries) 
for poor design decisions within Drill. This leads to a less-than-optimal user 
experience.

The proposal here is to make the memory allocator less strict for "managed" 
operators.

First, we recognize that the managed operators do attempt to control memory 
and, if designed well, will, on average, hit their targets.

Second, we recognize that, due to the lumpiness issues above, any single 
operator may exceed, or be under, the configured maximum memory.

Given this, the proposal here is:

1. An operator identifies itself as managed to the memory allocator.
2. In managed mode, the allocator has soft limits. It emits a warning to the 
log when the limit is exceeded.
3. For safety, in managed mode, the allocator enforces a hard limit larger than 
the configured limit.

The enforcement limit might be:

* For memory sizes < 100MB, up to 2x the configured limit.
* For larger memory sizes, no more than 100MB over the configured limit.

The exact numbers can be made configurable.
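
A sketch of that rule with the example numbers (illustrative only; the method 
name is hypothetical, not the allocator's API):

{code:java}
static long enforcementLimit(long configuredLimit) {
  final long HUNDRED_MB = 100L * 1024 * 1024;
  if (configuredLimit < HUNDRED_MB) {
    return 2 * configuredLimit;          // small budgets: enforce up to 2x the configured limit
  }
  return configuredLimit + HUNDRED_MB;   // large budgets: enforce at most 100 MB over
}
{code}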

Now, during testing, scripts should look for over-memory warnings. Each should 
be fixed as we fix OOM issues today. But, during production, user queries are 
far less likely to fail due to any remaining corner cases that throw off the 
memory calculations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Using Tableau to connect to DB engines using Calcite's JDBC driver

2017-09-20 Thread Laurent Goujon
AFAIK Tableau uses the Drill ODBC driver, not JDBC (although Tableau hinted at
some JDBC support at some point: https://community.tableau.com/ideas/4633).

It's technically feasible BUT the Drill protocol is very low level, so the
adapter would have to use the Drill RPC protocol and represent its own data
as DrillBuf (which is column oriented, not row oriented). Also, Tableau
seems to optimize things a bit for a specific DB: it doesn't query the
driver for what is supported, and lots of things are hardcoded, which means
either the DB would have to replicate Drill's dialect of SQL, or the
adaptation layer would have to translate.

Laurent

On Tue, Sep 19, 2017 at 3:17 PM, Muhammad Gelbana 
wrote:

> Tableau supports Apache Drill JDBC driver, so you basically can use Drill
> as a data provider for Tableau.
>
> I'm asking if anyone implemented a Calcite adapter for some data engine and
> tested if Tableau would be able to connect to it as if it was Apache Drill
> ?
>
> It's like you connect to that adapter by configuring an Apache Drill
> connection to it, through Tableau.
>
> Because otherwise, that data engine will need to have an ODBC driver, which
> is clearly a pain in the neck if you Google enough. That's actually what
> I'm trying to do. I need to implement a Calcite adapter to support a data
> engine but supporting Tableau is essential to our customers and I'd be very
> happy if I can avoid going through the Calcite ODBC driver path.
>
> I apologize if this sounds like a Calcite question but I believe Drill
> developers who worked on the JDBC driver can give a good insight.
>
> If you ask me, I believe Drill is all about Calcite in distributed mode :D,
> this may very well be so sketchy point of view but I'm not experienced with
> Drill or Calcite myself.
>
> Hopefully I explained my self clearly.
>
> Thanks,
> Gelbana
>


Re: Propose about join push down

2017-09-20 Thread weijie tong
Hi Boaz:

   Sorry for the wrong example. "select t2.a, t2.s, t3.d from (select a, sum(b)
as s from t1 where c='1' group by a) t2 join t3 on t2.a = t3.a" is the SQL that
would make sense.

The prerequisite for join push-down is that the storage plugin supports filter
push-down. The corresponding rule should use this information to decide whether
to do the join push-down (storage plugins like Elasticsearch will benefit from
this).

I think there is little change to the current hashjoin logic except for the
data push-down work: first, the build side constructs the bloom filter; second,
the hashjoin batch pushes the bloom filter down; third, the rest behaves the
same as the current implementation, doing the join work between the filtered
probe data and the build side.

One thing to be explicit about is implementing the next() call with data
parameters. I will think about this.
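
As a minimal sketch of the three steps above (using Guava's BloomFilter purely
for illustration; the actual pushed-down format - bloom filter bytes or a list
of values - is still open):

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class JoinPushDownSketch {
  public static void main(String[] args) {
    // 1. Build side constructs the bloom filter from its join keys (t2.a).
    List<String> buildKeys = Arrays.asList("1", "7", "42");
    BloomFilter<String> filter = BloomFilter.create(
        Funnels.stringFunnel(StandardCharsets.UTF_8), buildKeys.size(), 0.01);
    buildKeys.forEach(filter::put);

    // 2. This filter is what would be pushed down to t3's scan.
    // 3. The scan pre-filters probe rows; false positives are possible, so the
    //    join above still produces the exact result (no false negatives).
    for (String probeKey : Arrays.asList("1", "2", "42", "99")) {
      if (filter.mightContain(probeKey)) {
        System.out.println("survives push-down filter: " + probeKey);
      }
    }
  }
}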

On Wed, 20 Sep 2017 at 5:25 AM Boaz Ben-Zvi  wrote:

> Hi Weijie,
>
> Are there some typos in the sample query ?  Looks like the projection
> should be t2.a,t2.s,t3.d (i.e., t2 instead of t1). Also the predicate “
> where a='1' ” makes the inner query return only a single row, which is
> pretty trivial.
>
> Assuming these changes are made, then there could be many t2 “a”
> values to be equi-joined to t3’s “a” values.
>
> With Bloom filters, the rows from t3 would only be “mostly filtered”;
> there still needs to be a join above to produce the final result.
>
> If wanting to push the “whole join” down, then _either_ need to have some
> index mechanism on “t3.a” – which would work as a nested loop join (NLJ),
> _or_ need to perform another type of join down below (with all related
> issues, like memory control, spill etc).  For the NLJ, indeed the current
> Drill does not support “down flow” of data (and most storage does not have
> indexes), and it’ll take some work to implement (e.g., all operators would
> need to accept a next() call with some “data” parameter).
>
>  Boaz
> 
>
> On 9/19/17, 8:45 AM, "weijie tong"  wrote:
>
> All:
>This is a propose about join query tuning by pushing down the join
> condition. Welcome suggestion ,discussion,objection .
>
>Suppose we have a join query "select t1.a,t1.s,t3.d (select a,
> sum(b) as
> s from t1 where a='1' group by a ) t2 join t3 on t2.a = t3.a"  .  This
> query will be transferred to a hashjoin or boradcast hashjoin (if
> metadata
> is accurate ). But the t3's rows will all be pulled out from the
> storage.
> If the t3 is a large table,the performance will be unacceptable.
> If we can first get the 'a' result set of the inner query,then we
> pushed
> down the result set to the right table t3's scan node. The right
> table's
> scan will be quickly.
>
>  possible solutions :
>  1. A new physical operator or  broadcast join ,hash join
> enhancements
> , which need to first query the left table's data, then push down the
> filtered left join condition column set to the right table stream, once
> confirmed the pushed down , works as normal join query logic.
>  2. The pushed down join condition set maybe two possible formats
> bloom
> filters bytes  or list of strings.
>  3. RecordBatch needs to support to push down 2's data down stream.
>  4. SubScan needs to hold the 2's data,and wait for next real call
> to
> push down to the storage level query.
>  5. Storage level should have an interface to indicate whether it
> supports to solve the pushed down bloom filter or list of strings.
>
>  Since this violates drill's data flow direction,it seems a lot of
> work
> to do ,to change to implement this feature.
>
>
>


Re: Propose about join push down

2017-09-20 Thread weijie tong
Hi Boaz:

Sorry for the wrong example; it should be
"select t2.a, t2.s, t3.d from (select a, sum(b) as s from t1 where c='1' group
by a) t2 join t3 on t2.a = t3.a", which would make sense.

The prerequisite for pushing down the join is that the storage plugin supports
filter push-down. The storage plugin should add an interface to indicate that
it supports join push-down. The corresponding rule will take care of this.

I think this strategy also applies to hashjoin. The build side table's join
keys construct the bloom filter first. Then it pushes the bloom filter down (a
next() call with data parameters). All other things follow the same process
logic as the current hashjoin implementation.


On Wed, 20 Sep 2017 at 5:25 AM Boaz Ben-Zvi  wrote:

> Hi Weijie,
>
> Are there some typos in the sample query ?  Looks like the projection
> should be t2.a,t2.s,t3.d (i.e., t2 instead of t1). Also the predicate “
> where a='1' ” makes the inner query return only a single row, which is
> pretty trivial.
>
> Assuming these changes are made, then there could be many t2 “a”
> values to be equi-joined to t3’s “a” values.
>
> With Bloom filters, the rows from t3 would only be “mostly filtered”;
> there still needs to be a join above to produce the final result.
>
> If wanting to push the “whole join” down, then _either_ need to have some
> index mechanism on “t3.a” – which would work as a nested loop join (NLJ),
> _or_ need to perform another type of join down below (with all related
> issues, like memory control, spill etc).  For the NLJ, indeed the current
> Drill does not support “down flow” of data (and most storage does not have
> indexes), and it’ll take some work to implement (e.g., all operators would
> need to accept a next() call with some “data” parameter).
>
>  Boaz
> 
>
> On 9/19/17, 8:45 AM, "weijie tong"  wrote:
>
> All:
>This is a propose about join query tuning by pushing down the join
> condition. Welcome suggestion ,discussion,objection .
>
>Suppose we have a join query "select t1.a,t1.s,t3.d (select a,
> sum(b) as
> s from t1 where a='1' group by a ) t2 join t3 on t2.a = t3.a"  .  This
> query will be transferred to a hashjoin or boradcast hashjoin (if
> metadata
> is accurate ). But the t3's rows will all be pulled out from the
> storage.
> If the t3 is a large table,the performance will be unacceptable.
> If we can first get the 'a' result set of the inner query,then we
> pushed
> down the result set to the right table t3's scan node. The right
> table's
> scan will be quickly.
>
>  possible solutions :
>  1. A new physical operator or  broadcast join ,hash join
> enhancements
> , which need to first query the left table's data, then push down the
> filtered left join condition column set to the right table stream, once
> confirmed the pushed down , works as normal join query logic.
>  2. The pushed down join condition set maybe two possible formats
> bloom
> filters bytes  or list of strings.
>  3. RecordBatch needs to support to push down 2's data down stream.
>  4. SubScan needs to hold the 2's data,and wait for next real call
> to
> push down to the storage level query.
>  5. Storage level should have an interface to indicate whether it
> supports to solve the pushed down bloom filter or list of strings.
>
>  Since this violates drill's data flow direction,it seems a lot of
> work
> to do ,to change to implement this feature.
>
>
>


Re: Drill 2.0 (design) hackathon

2017-09-20 Thread AnilKumar B
Thanks all, this is really helpful.

On Wed, Sep 20, 2017 at 8:13 AM Charles Givre  wrote:

> Thank you Aman for organizing and to MapR for hosting!
>
> On Wed, Sep 20, 2017 at 11:12 AM, Aman Sinha  wrote:
>
> > Thanks to all the folks who attended the hackathon - both local and
> remote.
> >   For the remote attendees, you missed out on a good dinner :)
> >
> > We had a day of excellent discussion on several topics:  Resource
> > management, operator level performance improvements, TPC-DS coverage,
> > metadata management, concurrency, usability and error handling, storage
> > plugins + rest APIs.   It will take a couple of days to compile all the
> > notes and we will post them.
> >
> > Since the focus was more in-depth discussion rather than breadth, and 1
> day
> > is clearly not adequate, some topics were left out.  We can continue
> those
> > discussions on the dev list / hangout  or if it can wait, possibly do it
> in
> > a future hackathon.
> >
> > -Aman
> >
> > On Fri, Sep 15, 2017 at 2:54 PM, Charles Givre  wrote:
> >
> > > Hi Pritesh,
> > > What time do you think you’d want me to present?  Also, should I make
> > some
> > > slides?
> > > Best,
> > > — C
> > >
> > > > On Sep 15, 2017, at 13:23, Pritesh Maker  wrote:
> > > >
> > > > Hi All
> > > >
> > > > We are looking forward to hosting the hackathon on Monday. Just a few
> > > updates on the logistics and agenda
> > > >
> > > > • We are expecting over 25 people attending the event – you can see
> the
> > > attendee list at the Eventbrite site -  https://www.eventbrite.com/e/
> > > drill-developer-day-sept-2017-registration-7478463285
> > > >
> > > > • Breakfast will be served starting at 8:30AM – we would like to
> begin
> > > promptly at 9AM
> > > >
> > > > • The agenda has been updated to reflect the speakers (see the update
> > in
> > > the sheet - https://docs.google.com/spreadsheets/d/
> > > 1PEpgmBNAaPcu9UhWmZ8yPYtXbUGqOAYwH87alWkpCic/edit#gid=0 )
> > > > o Key Note & Introduction – Ted Dunning, Parth Chandra and Aman Sinha
> > > > o Community Contributions – Anil Kumar, John Omernik, Charles Givre
> and
> > > Ted Dunning
> > > > o Two tracks for technical design discussions – some topics have
> > initial
> > > thoughts for the topics and some will have open brainstorming
> discussions
> > > > o Once the discussions are concluded, we will have summaries
> presented
> > > and notes shared with the community
> > > >
> > > > • We will have a WebEx for the first two sessions. For the two
> tracks,
> > > we will either continue the WebEx or have Hangout links (will publish
> > them
> > > to the google sheet)
> > > > "JOIN WEBEX MEETING
> > > >
> https://mapr.webex.com/mapr/j.php?MTID=m9d39036e3953cce59ea81250c70c6
> > c76
> > > > Meeting number (access code): 806 111 950
> > > > Meeting password: ApacheDrill"
> > > >
> > > > • For the attendees in person, we have made bookings for a dinner in
> > the
> > > evening - https://www.yelp.com/biz/chili-garden-restaurant-milpitas
> > > >
> > > > Looking forward to a fantastic day for the Apache Drill! community!
> > > >
> > > > Thanks,
> > > > Pritesh
> > > >
> > > >
> > > >
> > > > On 9/5/17, 10:47 PM, "Aman Sinha"  wrote:
> > > >
> > > >Here is the Eventbrite event for registration:
> > > >
> > > >https://www.eventbrite.com/e/drill-developer-day-sept-2017-
> > > registration-7478463285
> > > >
> > > >Please register so we can plan for food and drinks appropriately.
> > > >
> > > >The link also contains a google doc link for the preliminary
> agenda
> > > and a
> > > >'Topics' tab with volunteer sign-up column.  Please add your name
> to
> > > the
> > > >area(s) of interest.
> > > >
> > > >Thanks and look forward to seeing you all !
> > > >
> > > >-Aman
> > > >
> > > >On Wed, Aug 30, 2017 at 9:44 AM, Paul Rogers 
> > > wrote:
> > > >
> > > >> A partial list of Drill’s public APIs:
> > > >>
> > > >> IMHO, highest priority for Drill 2.0.
> > > >>
> > > >>
> > > >>  *   JDBC/ODBC drivers
> > > >>  *   Client (for JDBC/ODBC) + ODBC & JDBC
> > > >>  *   Client (for full Drill async, columnar)
> > > >>  *   Storage plugin
> > > >>  *   Format plugin
> > > >>  *   System/session options
> > > >>  *   Queueing (e.g. ZK-based queues)
> > > >>  *   Rest API
> > > >>  *   Resource Planning (e.g. max query memory per node)
> > > >>  *   Metadata access, storage (e.g. file system locations vs. a
> > > metastore)
> > > >>  *   Metadata files formats (Parquet, views, etc.)
> > > >>
> > > >> Lower priority for future releases:
> > > >>
> > > >>
> > > >>  *   Query Planning (e.g. Calcite rules)
> > > >>  *   Config options
> > > >>  *   SQL syntax, especially Drill extensions
> > > >>  *   UDF
> > > >>  *   Management (e.g. JMX, Rest API calls, etc.)
> > > >>  *   Drill File System (HDFS)
> > > >>  *   Web UI
> > > >>  *   Shell scripts
> > > >>
> > > >> There are 

Re: Drill 2.0 (design) hackathon

2017-09-20 Thread Charles Givre
Thank you Aman for organizing and to MapR for hosting!

On Wed, Sep 20, 2017 at 11:12 AM, Aman Sinha  wrote:

> Thanks to all the folks who attended the hackathon - both local and remote.
>   For the remote attendees, you missed out on a good dinner :)
>
> We had a day of excellent discussion on several topics:  Resource
> management, operator level performance improvements, TPC-DS coverage,
> metadata management, concurrency, usability and error handling, storage
> plugins + rest APIs.   It will take a couple of days to compile all the
> notes and we will post them.
>
> Since the focus was more in-depth discussion rather than breadth, and 1 day
> is clearly not adequate, some topics were left out.  We can continue those
> discussions on the dev list / hangout  or if it can wait, possibly do it in
> a future hackathon.
>
> -Aman

Re: Drill 2.0 (design) hackathon

2017-09-20 Thread Aman Sinha
Thanks to all the folks who attended the hackathon - both local and remote.
  For the remote attendees, you missed out on a good dinner :)

We had a day of excellent discussion on several topics:  Resource
management, operator level performance improvements, TPC-DS coverage,
metadata management, concurrency, usability and error handling, storage
plugins + rest APIs.   It will take a couple of days to compile all the
notes and we will post them.

Since the focus was more in-depth discussion rather than breadth, and 1 day
is clearly not adequate, some topics were left out.  We can continue those
discussions on the dev list / hangout  or if it can wait, possibly do it in
a future hackathon.

-Aman

On Fri, Sep 15, 2017 at 2:54 PM, Charles Givre  wrote:

> Hi Pritesh,
> What time do you think you’d want me to present?  Also, should I make some
> slides?
> Best,
> — C
>
> > On Sep 15, 2017, at 13:23, Pritesh Maker  wrote:
> >
> > Hi All
> >
> > We are looking forward to hosting the hackathon on Monday. Just a few
> updates on the logistics and agenda
> >
> > • We are expecting over 25 people attending the event – you can see the
> attendee list at the Eventbrite site -  https://www.eventbrite.com/e/
> drill-developer-day-sept-2017-registration-7478463285
> >
> > • Breakfast will be served starting at 8:30AM – we would like to begin
> promptly at 9AM
> >
> > • The agenda has been updated to reflect the speakers (see the update in
> the sheet - https://docs.google.com/spreadsheets/d/
> 1PEpgmBNAaPcu9UhWmZ8yPYtXbUGqOAYwH87alWkpCic/edit#gid=0 )
> > o Key Note & Introduction – Ted Dunning, Parth Chandra and Aman Sinha
> > o Community Contributions – Anil Kumar, John Omernik, Charles Givre and
> Ted Dunning
> > o Two tracks for technical design discussions – some topics have initial
> thoughts for the topics and some will have open brainstorming discussions
> > o Once the discussions are concluded, we will have summaries presented
> and notes shared with the community
> >
> > • We will have a WebEx for the first two sessions. For the two tracks,
> we will either continue the WebEx or have Hangout links (will publish them
> to the google sheet)
> > "JOIN WEBEX MEETING
> > https://mapr.webex.com/mapr/j.php?MTID=m9d39036e3953cce59ea81250c70c6c76
> > Meeting number (access code): 806 111 950
> > Meeting password: ApacheDrill"
> >
> > • For the attendees in person, we have made bookings for a dinner in the
> evening - https://www.yelp.com/biz/chili-garden-restaurant-milpitas
> >
> > Looking forward to a fantastic day for the Apache Drill community!
> >
> > Thanks,
> > Pritesh
> >
> >
> >
> > On 9/5/17, 10:47 PM, "Aman Sinha"  wrote:
> >
> >Here is the Eventbrite event for registration:
> >
> >https://www.eventbrite.com/e/drill-developer-day-sept-2017-
> registration-7478463285
> >
> >Please register so we can plan for food and drinks appropriately.
> >
> >The link also contains a google doc link for the preliminary agenda
> and a
> >'Topics' tab with volunteer sign-up column.  Please add your name to
> the
> >area(s) of interest.
> >
> >Thanks and look forward to seeing you all !
> >
> >-Aman
> >
> >On Wed, Aug 30, 2017 at 9:44 AM, Paul Rogers 
> wrote:
> >
> >> A partial list of Drill’s public APIs:
> >>
> >> IMHO, highest priority for Drill 2.0.
> >>
> >>
> >>  *   JDBC/ODBC drivers
> >>  *   Client (for JDBC/ODBC) + ODBC & JDBC
> >>  *   Client (for full Drill async, columnar)
> >>  *   Storage plugin
> >>  *   Format plugin
> >>  *   System/session options
> >>  *   Queueing (e.g. ZK-based queues)
> >>  *   Rest API
> >>  *   Resource Planning (e.g. max query memory per node)
> >>  *   Metadata access, storage (e.g. file system locations vs. a
> metastore)
> >>  *   Metadata files formats (Parquet, views, etc.)
> >>
> >> Lower priority for future releases:
> >>
> >>
> >>  *   Query Planning (e.g. Calcite rules)
> >>  *   Config options
> >>  *   SQL syntax, especially Drill extensions
> >>  *   UDF
> >>  *   Management (e.g. JMX, Rest API calls, etc.)
> >>  *   Drill File System (HDFS)
> >>  *   Web UI
> >>  *   Shell scripts
> >>
> >> There are certainly more. Please suggest those that are missing. I’ve
> >> taken a rough cut at which APIs need forward/backward compatibility
> first,
> >> in part based on those that are the “most public” and most likely to
> >> change. Others are important, but we can’t do them all at once.
> >>
> >> Thanks,
> >>
> >> - Paul
> >>
> >> On Aug 29, 2017, at 6:00 PM, Aman Sinha <amansi...@apache.org> wrote:
> >>
> >> Hi Paul,
> >> certainly makes sense to have the API compatibility discussions during
> this
> >> hackathon.  The 2.0 release may be a good checkpoint to introduce
> breaking
> >> changes necessitating changes to the ODBC/JDBC drivers and other
> external
> >> applications. As part of this exercise (not during the hackathon but as
> a
> >> 

[jira] [Created] (DRILL-5807) ambiguous error

2017-09-20 Thread XiaHang (JIRA)
XiaHang created DRILL-5807:
--

 Summary: ambiguous error
 Key: DRILL-5807
 URL: https://issues.apache.org/jira/browse/DRILL-5807
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Affects Versions: 1.11.0
 Environment: Linux
Reporter: XiaHang
Priority: Critical


The bug occurs when the final plan is like the one below, with a JdbcFilter below 
one JdbcJoin and above another JdbcJoin.
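
For illustration, here is a minimal sketch of that plan shape in plain SQL, with 
hypothetical table and column names (not the reporter's schema): the filter can only 
sit between the two left joins when the lower join is nested in a subquery.

SELECT t1.order_id, t2.slr_id
FROM (
  SELECT o.order_id, o.seller_id
  FROM orders o
  LEFT JOIN items i ON o.item_id = i.item_id     -- lower JdbcJoin
  WHERE i.category <> 'xyz'                      -- JdbcFilter between the joins
) AS t1
LEFT JOIN sellers t2 ON t1.seller_id = t2.slr_id -- upper JdbcJoin

The plan reported is: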

JdbcProject(order_id=[$0], mord_id=[$6], item_id=[$2], div_pay_amt=[$5], 
item_quantity=[$4], slr_id=[$11]): rowcount = 5625.0, cumulative cost = 
{12540.0 rows, 29763.0 cpu, 0.0 io}, id = 327
JdbcJoin(condition=[=($3, $11)], joinType=[left]): rowcount = 5625.0, 
cumulative cost = {8040.0 rows, 2763.0 cpu, 0.0 io}, id = 325
  JdbcFilter(condition=[OR(AND(OR(IS NOT NULL($7), >($5, 0)), =($1, 2), 
OR(AND(=($10, '箱包皮具/热销女包/男包'), >(/($5, $4), 1000)), AND(OR(=($10, '家装主材'), 
=($10, '大家电')), >(/($5, $4), 1000)), AND(OR(=($10, '珠宝/钻石/翡翠/黄金'), =($10, 
'饰品/流行首饰/时尚饰品新')), >(/($5, $4), 2000)), AND(>(/($5, $4), 500), <>($10, 
'箱包皮具/热销女包/男包'), <>($10, '家装主材'), <>($10, '大家电'), <>($10, '珠宝/钻石/翡翠/黄金'), 
<>($10, '饰品/流行首饰/时尚饰品新'))), <>($10, '成人用品/情趣用品'), <>($10, '鲜花速递/花卉仿真/绿植园艺'), 
<>($10, '水产肉类/新鲜蔬果/熟食')), AND(<=(-(EXTRACT(FLAG(EPOCH), CURRENT_TIMESTAMP), 
EXTRACT(FLAG(EPOCH), CAST($8):TIMESTAMP(0))), *(*(*(14, 24), 60), 60)), 
OR(AND(OR(=($10, '箱包皮具/热销女包/男包'), =($10, '家装主材'), =($10, '大家电'), =($10, 
'珠宝/钻石/翡翠/黄金'), =($10, '饰品/流行首饰/时尚饰品新')), >(/($5, $4), 2000)), AND(OR(=($10, 
'男装'), =($10, '女装/女士精品'), =($10, '办公设备/耗材/相关服务')), >(/($5, $4), 1000)), 
AND(OR(=($10, '流行男鞋'), =($10, '女鞋')), >(/($5, $4), 1500))), IS NOT NULL($8)), 
AND(>=(-(EXTRACT(FLAG(EPOCH), CURRENT_TIMESTAMP), EXTRACT(FLAG(EPOCH), 
CAST($8):TIMESTAMP(0))), *(*(*(15, 24), 60), 60)), <=(-(EXTRACT(FLAG(EPOCH), 
CURRENT_TIMESTAMP), EXTRACT(FLAG(EPOCH), CAST($8):TIMESTAMP(0))), *(*(*(60, 
24), 60), 60)), OR(AND(OR(=($10, '箱包皮具/热销女包/男包'), =($10, '珠宝/钻石/翡翠/黄金'), =($10, 
'饰品/流行首饰/时尚饰品新')), >(/($5, $4), 5000)), AND(OR(=($10, '男装'), =($10, 
'女装/女士精品')), >(/($5, $4), 3000)), AND(OR(=($10, '流行男鞋'), =($10, '女鞋')), >(/($5, 
$4), 2500)), AND(=($10, '办公设备/耗材/相关服务'), >(/($5, $4), 2000))), IS NOT 
NULL($8)))]): rowcount = 375.0, cumulative cost = {2235.0 rows, 2582.0 cpu, 0.0 
io}, id = 320
JdbcJoin(condition=[=($2, $9)], joinType=[left]): rowcount = 1500.0, 
cumulative cost = {1860.0 rows, 1082.0 cpu, 0.0 io}, id = 318
  JdbcProject(order_id=[$0], pay_status=[$2], item_id=[$3], 
seller_id=[$5], item_quantity=[$7], div_pay_amt=[$20], mord_id=[$1], 
pay_time=[$19], succ_time=[$52]): rowcount = 100.0, cumulative cost = {180.0 
rows, 821.0 cpu, 0.0 io}, id = 313
JdbcTableScan(table=[[public, dws_tb_crm_u2_ord_base_df]]): 
rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 29
  JdbcProject(item_id=[$0], cate_level1_name=[$47]): rowcount = 100.0, 
cumulative cost = {180.0 rows, 261.0 cpu, 0.0 io}, id = 316
JdbcTableScan(table=[[public, dws_tb_crm_u2_itm_base_df]]): 
rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 46
  JdbcProject(slr_id=[$3]): rowcount = 100.0, cumulative cost = {180.0 
rows, 181.0 cpu, 0.0 io}, id = 323
JdbcTableScan(table=[[public, dws_tb_crm_u2_slr_base]]): rowcount = 
100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 68

The SQL is converted to:
SELECT "t1"."order_id", "t1"."mord_id", "t1"."item_id", "t1"."div_pay_amt", 
"t1"."item_quantity", "t2"."slr_id"
FROM (SELECT *
FROM (SELECT "order_id", "pay_status", "item_id", "seller_id", "item_quantity", 
"div_pay_amt", "mord_id", "pay_time", "succ_time"
FROM "dws_tb_crm_u2_ord_base_df") AS "t"
LEFT JOIN (SELECT "item_id", "cate_level1_name"
FROM "dws_tb_crm_u2_itm_base_df") AS "t0" ON "t"."item_id" = "t0"."item_id"
WHERE ("t"."pay_time" IS NOT NULL OR "t"."div_pay_amt" > 0) AND 
"t"."pay_status" = 2 AND ("t0"."cate_level1_name" = '箱包皮具/热销女包/男包' AND 
"t"."div_pay_amt" / "t"."item_quantity" > 1000 OR ("t0"."cate_level1_name" = 
'家装主材' OR "t0"."cate_level1_name" = '大家电') AND "t"."div_pay_amt" / 
"t"."item_quantity" > 1000 OR ("t0"."cate_level1_name" = '珠宝/钻石/翡翠/黄金' OR 
"t0"."cate_level1_name" = '饰品/流行首饰/时尚饰品新') AND "t"."div_pay_amt" / 
"t"."item_quantity" > 2000 OR "t"."div_pay_amt" / "t"."item_quantity" > 500 AND 
"t0"."cate_level1_name" <> '箱包皮具/热销女包/男包' AND "t0"."cate_level1_name" <> '家装主材' 
AND "t0"."cate_level1_name" <> '大家电' AND "t0"."cate_level1_name" <> 
'珠宝/钻石/翡翠/黄金' AND "t0"."cate_level1_name" <> '饰品/流行首饰/时尚饰品新') AND 
"t0"."cate_level1_name" <> '成人用品/情趣用品' AND "t0"."cate_level1_name" <> 
'鲜花速递/花卉仿真/绿植园艺' AND "t0"."cate_level1_name" <> '水产肉类/新鲜蔬果/熟食' OR EXTRACT(EPOCH 
FROM CURRENT_TIMESTAMP) - EXTRACT(EPOCH FROM CAST("t"."succ_time" AS 
TIMESTAMP(0))) <= 14 * 24 * 60 * 60 AND (("t0"."cate_level1_name" = 
'箱包皮具/热销女包/男包' OR "t0"."cate_level1_name" = '家装主材' OR