[GitHub] drill issue #805: Drill-4139: Exception while trying to prune partition. jav...
Github user sachouche commented on the issue: https://github.com/apache/drill/pull/805 +1 ---
[GitHub] drill pull request #951: DRILL-5727: Update release profile to generate SHA-...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/951#discussion_r140129986 --- Diff: pom.xml --- @@ -977,6 +977,7 @@ MD5 SHA-1 +SHA-512 --- End diff -- Maybe we can remove SHA-1 usage? ---
[GitHub] drill pull request #950: DRILL-5431: SSL Support
Github user superbstreak commented on a diff in the pull request: https://github.com/apache/drill/pull/950#discussion_r140129825 --- Diff: contrib/native/client/src/include/drill/common.hpp --- @@ -163,9 +170,13 @@ typedef enum{ #define USERPROP_USERNAME "userName" #define USERPROP_PASSWORD "password" #define USERPROP_SCHEMA "schema" -#define USERPROP_USESSL "useSSL"// Not implemented yet -#define USERPROP_FILEPATH "pemLocation" // Not implemented yet -#define USERPROP_FILENAME "pemFile" // Not implemented yet +#define USERPROP_USESSL "enableTLS" +#define USERPROP_TLSPROTOCOL "TLSProtocol" //TLS version +#define USERPROP_CERTFILEPATH "certFilePath" // pem file path and name +#define USERPROP_CERTPASSWORD "certPassword" // Password for certificate file --- End diff -- I think we can remove this to avoid confusion :) ---
RE: Drill 2.0 (design) hackathon
I think that's a good idea. We could put this up in a list (in the google doc) of items to discuss on the hangout. That way, if we have no pressing topics to discuss, we can certainly pick something from the list.

-----Original Message-----
From: Aman Sinha [mailto:amansi...@apache.org]
Sent: Wednesday, September 20, 2017 8:13 AM
To: dev@drill.apache.org
Subject: Re: Drill 2.0 (design) hackathon

Thanks to all the folks who attended the hackathon - both local and remote. For the remote attendees, you missed out on a good dinner :) We had a day of excellent discussion on several topics: resource management, operator-level performance improvements, TPC-DS coverage, metadata management, concurrency, usability and error handling, storage plugins + REST APIs. It will take a couple of days to compile all the notes and we will post them. Since the focus was more in-depth discussion rather than breadth, and 1 day is clearly not adequate, some topics were left out. We can continue those discussions on the dev list / hangout or, if it can wait, possibly do it in a future hackathon.
-Aman

On Fri, Sep 15, 2017 at 2:54 PM, Charles Givre wrote:
> Hi Pritesh,
> What time do you think you’d want me to present? Also, should I make some slides?
> Best,
> — C
>
>> On Sep 15, 2017, at 13:23, Pritesh Maker wrote:
>>
>> Hi All
>>
>> We are looking forward to hosting the hackathon on Monday. Just a few updates on the logistics and agenda:
>>
>> • We are expecting over 25 people attending the event – you can see the attendee list at the Eventbrite site - https://www.eventbrite.com/e/drill-developer-day-sept-2017-registration-7478463285
>>
>> • Breakfast will be served starting at 8:30AM – we would like to begin promptly at 9AM
>>
>> • The agenda has been updated to reflect the speakers (see the update in the sheet - https://docs.google.com/spreadsheets/d/1PEpgmBNAaPcu9UhWmZ8yPYtXbUGqOAYwH87alWkpCic/edit#gid=0 )
>> o Key Note & Introduction – Ted Dunning, Parth Chandra and Aman Sinha
>> o Community Contributions – Anil Kumar, John Omernik, Charles Givre and Ted Dunning
>> o Two tracks for technical design discussions – some topics have initial thoughts and some will have open brainstorming discussions
>> o Once the discussions are concluded, we will have summaries presented and notes shared with the community
>>
>> • We will have a WebEx for the first two sessions. For the two tracks, we will either continue the WebEx or have Hangout links (will publish them to the google sheet)
>> "JOIN WEBEX MEETING
>> https://mapr.webex.com/mapr/j.php?MTID=m9d39036e3953cce59ea81250c706c76
>> Meeting number (access code): 806 111 950 Meeting password: ApacheDrill"
>>
>> • For the attendees in person, we have made bookings for a dinner in the evening - https://www.yelp.com/biz/chili-garden-restaurant-milpitas
>>
>> Looking forward to a fantastic day for the Apache Drill community!
>>
>> Thanks,
>> Pritesh
>>
>> On 9/5/17, 10:47 PM, "Aman Sinha" wrote:
>>
>>    Here is the Eventbrite event for registration:
>>    https://www.eventbrite.com/e/drill-developer-day-sept-2017-registration-7478463285
>>
>>    Please register so we can plan for food and drinks appropriately.
>>    The link also contains a google doc link for the preliminary agenda and a 'Topics' tab with a volunteer sign-up column. Please add your name to the area(s) of interest.
>>
>>    Thanks and look forward to seeing you all!
>>
>>    -Aman
>>
>>    On Wed, Aug 30, 2017 at 9:44 AM, Paul Rogers wrote:
>>
>>> A partial list of Drill’s public APIs:
>>>
>>> IMHO, highest priority for Drill 2.0.
>>>
>>> * JDBC/ODBC drivers
>>> * Client (for JDBC/ODBC) + ODBC & JDBC
>>> * Client (for full Drill async, columnar)
>>> * Storage plugin
>>> * Format plugin
>>> * System/session options
>>> * Queueing (e.g. ZK-based queues)
>>> * Rest API
>>> * Resource Planning (e.g. max query memory per node)
>>> * Metadata access, storage (e.g. file system locations vs. a metastore)
>>> * Metadata files formats (Parquet, views, etc.)
>>>
>>> Lower priority for future releases:
>>>
>>> * Query Planning (e.g. Calcite rules)
>>> * Config options
>>> * SQL syntax, especially Drill extensions
>>> * UDF
>>> * Management (e.g. JMX, Rest API calls, etc.)
>>> * Drill File System (HDFS)
>>> * Web UI
>>> * Shell scripts
>>>
>>> There are certainly more. Please suggest those that are missing. I’ve taken a rough cut at which APIs need forward/backward compatibility first, in part based on those that are the “most public” and most likely to change. Others are important, but we can’t do them all at once.
Added "spinner" code to allow debugging of failure cause
FYI and for feedback: As part of Pull Request #938 I added “spinner” code in the build() method of the UserException class, so that when this method is called (i.e., before a failure is reported to the user), the code can go into a spin loop instead of continuing to termination. This can be useful when investigating the original failure: it allows you to attach a debugger, use jstack to see the stacks at this point of execution, check some external state (like the condition of the spill files at that point), etc. To turn this feature ON, create an empty flag file named /tmp/drill/spin on every node where the spinning should take place (e.g., use “clush -a touch /tmp/drill/spin” to set it across the whole cluster). Once a thread hits this code, it checks for the existence of the spin file; if it exists, the thread creates a temp file named something like /tmp/drill/spin4148663301172491613.tmp, which contains its process ID (e.g., to allow jstack) and the error message, like: ~ 5 > cat /tmp/drill/spin5273075865809469794.tmp Spinning process: 16966@BBenZvi-E754-MBP13.local Error cause: SYSTEM ERROR: CannotPlanException: Node [rel#232:Subset#10.PHYSICAL.SINGLETON([]).[]] could not be implemented; planner state: Root: rel#232:Subset#10.PHYSICAL.SINGLETON([]).[] . . . . . . . ~ 6 > jstack 16966 Picked up JAVA_TOOL_OPTIONS: -ea 2017-09-20 17:15:21 Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode): "Attach Listener" #91 daemon prio=9 os_prio=31 tid=0x7fdd8830b000 nid=0x4f07 waiting on condition [0x] java.lang.Thread.State: RUNNABLE "263cfbd5-329d-b9fb-d96e-392e4fe0be4d:foreman" #53 daemon prio=10 os_prio=31 tid=0x7fdd8823a000 nid=0x7203 waiting on condition [0x72224000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:570) . . . . . . . . 
The spinning thread then loops: it sleeps for a second and then rechecks the flag file. To turn this feature OFF and release the spinning threads, delete the empty spin file (e.g., use “clush -a rm /tmp/drill/spin”). This also cleans up the relevant temp files. Hope this is useful; any feedback or suggestions are welcome. Boaz
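The mechanism above can be sketched roughly as follows. This is a simplified illustration, not the actual code from PR #938: the class and method names (`SpinnerSketch`, `reportSpin`, `spinWhileExists`) are made up here, and the directory is parameterized for portability where Drill hardwires /tmp/drill.

```java
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hedged sketch of the "spinner" debugging aid described above.
class SpinnerSketch {

    // Write a temp file holding this JVM's pid@host plus the error message,
    // so an operator can find the process and run jstack against it.
    static File reportSpin(File dir, String errorMessage) throws IOException {
        File out = File.createTempFile("spin", ".tmp", dir);
        String pidAtHost = ManagementFactory.getRuntimeMXBean().getName(); // e.g. "16966@host"
        Files.write(out.toPath(),
            ("Spinning process: " + pidAtHost + "\nError cause: " + errorMessage)
                .getBytes(StandardCharsets.UTF_8));
        return out;
    }

    // Loop while the flag file exists; deleting it (e.g. clush -a rm ...)
    // releases the thread, which then reports the failure normally.
    static void spinWhileExists(File spinFlag) throws InterruptedException {
        while (spinFlag.exists()) {
            Thread.sleep(1_000);
        }
    }
}
```

Because the loop only polls for file existence, turning the feature off is a plain file delete and needs no JVM restart, which matches the design rationale given in the review thread.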
[GitHub] drill pull request #951: DRILL-5727: Update release profile to generate SHA-...
GitHub user parthchandra opened a pull request: https://github.com/apache/drill/pull/951 DRILL-5727: Update release profile to generate SHA-512 checksum. New Apache release guidelines require a SHA-512 checksum. You can merge this pull request into a Git repository by running: $ git pull https://github.com/parthchandra/drill DRILL-5727 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/951.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #951 commit ed1d5508dbe70a7b58bbf36628325462644ed19e Author: Parth Chandra Date: 2017-09-20T20:42:54Z DRILL-5727: Update release profile to generate SHA-512 checksum. ---
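As context for reviewers, the checksum format the updated release profile emits can be reproduced with the JDK's built-in MessageDigest. This is an illustration only: the class and method names are mine, and the actual checksum generation in the release is done by the Maven build, not by code like this.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustration of the SHA-512 hex digest format written to .sha512 files.
class Sha512Demo {

    // Returns the lowercase hex SHA-512 digest of the given bytes.
    static String sha512Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is a mandatory JDK algorithm, so this should not happen.
            throw new AssertionError(e);
        }
    }
}
```

A SHA-512 digest is 64 bytes, so the hex string is 128 characters long, noticeably longer than the MD5 (32) and SHA-1 (40) checksums the profile previously generated.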
[jira] [Resolved] (DRILL-5715) Performance of refactored HashAgg operator regressed
[ https://issues.apache.org/jira/browse/DRILL-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boaz Ben-Zvi resolved DRILL-5715. - Resolution: Fixed Reviewer: Paul Rogers The commit for DRILL-5694 (PR #938) also solves this performance bug (it basically removed calls to Setup before every hash computation, plus a few small changes like replacing setSafe with set). > Performance of refactored HashAgg operator regressed > > > Key: DRILL-5715 > URL: https://issues.apache.org/jira/browse/DRILL-5715 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Codegen >Affects Versions: 1.11.0 > Environment: 10-node RHEL 6.4 (32 Core, 256GB RAM) >Reporter: Kunal Khatua >Assignee: Boaz Ben-Zvi > Labels: performance, regression > Fix For: 1.12.0 > > Attachments: 26736242-d084-6604-aac9-927e729da755.sys.drill, > 26736615-9e86-dac9-ad77-b022fd791f67.sys.drill, > 2675cc73-9481-16e0-7d21-5f1338611e5f.sys.drill, > 2675de42-3789-47b8-29e8-c5077af136db.sys.drill, drill-1.10.0_callTree.png, > drill-1.10.0_hotspot.png, drill-1.11.0_callTree.png, drill-1.11.0_hotspot.png > > > When running the following simple HashAgg-based query on a TPCH-table - > Lineitem with 6 billion rows on a 10 node setup (with a single partition to > disable any possible spilling to disk) > {code:sql} > select count(*) > from ( > select l_quantity > , count(l_orderkey) > from lineitem > group by l_quantity > ) {code} > the runtime increased from {{7.378 sec}} to {{11.323 sec}} [reported by the > JDBC client]. > To disable spill-to-disk in Drill-1.11.0, the {{drill-override.conf}} was > modified to > {code}drill.exec.hashagg.num_partitions : 1{code} > Attached are two profiles > Drill 1.10.0 : [^2675cc73-9481-16e0-7d21-5f1338611e5f.sys.drill] > Drill 1.11.0 : [^2675de42-3789-47b8-29e8-c5077af136db.sys.drill] > A separate run was done for both scenarios with the > {{planner.width.max_per_node=10}} and profiled with YourKit. 
> Image snippets are attached, indicating the hotspots in both builds: > *Drill 1.10.0* : > Profile: [^26736242-d084-6604-aac9-927e729da755.sys.drill] > CallTree: [^drill-1.10.0_callTree.png] > HotSpot: [^drill-1.10.0_hotspot.png] > !drill-1.10.0_hotspot.png|drill-1.10.0_hotspot! > *Drill 1.11.0* : > Profile: [^26736615-9e86-dac9-ad77-b022fd791f67.sys.drill] > CallTree: [^drill-1.11.0_callTree.png] > HotSpot: [^drill-1.11.0_hotspot.png] > !drill-1.11.0_hotspot.png|drill-1.11.0_hotspot! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (DRILL-5740) hash agg fail to read spill file
[ https://issues.apache.org/jira/browse/DRILL-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boaz Ben-Zvi resolved DRILL-5740. - Resolution: Fixed Fix Version/s: 1.12.0 The commit for DRILL-5694 (PR #938) also solves this bug (basically removed an unneeded closing of the SpillSet). > hash agg fail to read spill file > > > Key: DRILL-5740 > URL: https://issues.apache.org/jira/browse/DRILL-5740 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.12.0 >Reporter: Chun Chang >Assignee: Boaz Ben-Zvi >Priority: Blocker > Fix For: 1.12.0 > > > -Build: | 1.12.0-SNAPSHOT | 11008d029bafa36279e3045c4ed1a64366080620 > -Multi-node drill cluster > Running a query causing hash agg spill fails with the following error. And > this seems to be a regression. > {noformat} > Execution Failures: > /root/drill-test-framework/framework/resources/Advanced/hash-agg/spill/hagg5.q > Query: > select gby_date, gby_int32_rand, sum(int32_field), avg(float_field), > min(boolean_field), count(double_rand) from > dfs.`/drill/testdata/hagg/PARQUET-500M.parquet` group by gby_date, > gby_int32_rand order by gby_date, gby_int32_rand limit 30 > Failed with exception > java.sql.SQLException: SYSTEM ERROR: FileNotFoundException: File > /tmp/drill/spill/10.10.30.168-31010/265f91f9-78d2-78a6-68ad-4709674efe0a_HashAgg_1-4-34/spill3 > does not exist > Fragment 1:34 > [Error Id: 291a79f8-9b7a-485d-9404-e7b7fe1d8f1e on 10.10.30.168:31010] > (java.lang.RuntimeException) java.io.FileNotFoundException: File > /tmp/drill/spill/10.10.30.168-31010/265f91f9-78d2-78a6-68ad-4709674efe0a_HashAgg_1-4-34/spill3 > does not exist > > org.apache.drill.exec.physical.impl.aggregate.SpilledRecordbatch.():67 > > org.apache.drill.exec.test.generated.HashAggregatorGen1891.outputCurrentBatch():980 > org.apache.drill.exec.test.generated.HashAggregatorGen1891.doWork():617 > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168 > 
org.apache.drill.exec.record.AbstractRecordBatch.next():164 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133 > org.apache.drill.exec.record.AbstractRecordBatch.next():164 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.physical.impl.TopN.TopNBatch.innerNext():191 > org.apache.drill.exec.record.AbstractRecordBatch.next():164 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93 > org.apache.drill.exec.record.AbstractRecordBatch.next():164 > org.apache.drill.exec.physical.impl.BaseRootExec.next():105 > > org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92 > org.apache.drill.exec.physical.impl.BaseRootExec.next():95 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():415 > org.apache.hadoop.security.UserGroupInformation.doAs():1595 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():227 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1145 > java.util.concurrent.ThreadPoolExecutor$Worker.run():615 > java.lang.Thread.run():745 > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user Ben-Zvi commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r140098512 --- Diff: common/src/main/java/org/apache/drill/common/exceptions/UserException.java --- @@ -536,6 +542,33 @@ public Builder pushContext(final String name, final double value) { * @return user exception */ public UserException build(final Logger logger) { + + // To allow for debugging: + // A spinner code to make the execution stop here while the file '/tmp/drillspin' exists --- End diff -- Done ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user Ben-Zvi commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r140098546 --- Diff: common/src/main/java/org/apache/drill/common/exceptions/UserException.java --- @@ -536,6 +542,33 @@ public Builder pushContext(final String name, final double value) { * @return user exception */ public UserException build(final Logger logger) { + + // To allow for debugging: + // A spinner code to make the execution stop here while the file '/tmp/drillspin' exists + // Can be used to attach a debugger, use jstack, etc + // The processID of the spinning thread should be in a file like /tmp/spin4148663301172491613.tmp --- End diff -- Done ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user Ben-Zvi commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r140093627 --- Diff: common/src/main/java/org/apache/drill/common/exceptions/UserException.java --- @@ -536,6 +542,33 @@ public Builder pushContext(final String name, final double value) { * @return user exception */ public UserException build(final Logger logger) { + + // To allow for debugging: + // A spinner code to make the execution stop here while the file '/tmp/drillspin' exists + // Can be used to attach a debugger, use jstack, etc + // The processID of the spinning thread should be in a file like /tmp/spin4148663301172491613.tmp + // along with the error message. + File spinFile = new File("/tmp/drillspin"); + if ( spinFile.exists() ) { +File tmpDir = new File("/tmp"); +File outErr = null; +try { + outErr = File.createTempFile("spin", ".tmp", tmpDir); + BufferedWriter bw = new BufferedWriter(new FileWriter(outErr)); + bw.write("Spinning process: " + ManagementFactory.getRuntimeMXBean().getName() + /* After upgrading to JDK 9 - replace with: ProcessHandle.current().getPid() */); + bw.write("\nError cause: " + +(errorType == DrillPBError.ErrorType.SYSTEM ? ("SYSTEM ERROR: " + ErrorHelper.getRootMessage(cause)) : message)); + bw.close(); +} catch (Exception ex) { + logger.warn("Failed creating a spinner tmp message file: {}", ex); +} +while (spinFile.exists()) { + try { sleep(1_000); } catch (Exception ex) { /* ignore interruptions */ } --- End diff -- Yes - if some non-blocked part tries to kill the query, the spinning parts would still be blocked - that may be by design, as debugging still goes on (until a user issues "clush -a rm /tmp/drill/spin" ) ---
[GitHub] drill issue #949: DRILL-5795: Parquet Filter push down at rowgroup level
Github user dprofeta commented on the issue: https://github.com/apache/drill/pull/949 I will add a unit test that checks the number of row groups scanned by the group scan, to verify that the filter is able to prune row groups. ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user Ben-Zvi commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r140062933 --- Diff: common/src/main/java/org/apache/drill/common/exceptions/UserException.java --- @@ -536,6 +542,33 @@ public Builder pushContext(final String name, final double value) { * @return user exception */ public UserException build(final Logger logger) { + + // To allow for debugging: + // A spinner code to make the execution stop here while the file '/tmp/drillspin' exists + // Can be used to attach a debugger, use jstack, etc + // The processID of the spinning thread should be in a file like /tmp/spin4148663301172491613.tmp + // along with the error message. + File spinFile = new File("/tmp/drillspin"); + if ( spinFile.exists() ) { +File tmpDir = new File("/tmp"); +File outErr = null; +try { + outErr = File.createTempFile("spin", ".tmp", tmpDir); + BufferedWriter bw = new BufferedWriter(new FileWriter(outErr)); + bw.write("Spinning process: " + ManagementFactory.getRuntimeMXBean().getName() + /* After upgrading to JDK 9 - replace with: ProcessHandle.current().getPid() */); + bw.write("\nError cause: " + +(errorType == DrillPBError.ErrorType.SYSTEM ? ("SYSTEM ERROR: " + ErrorHelper.getRootMessage(cause)) : message)); + bw.close(); +} catch (Exception ex) { + logger.warn("Failed creating a spinner tmp message file: {}", ex); +} +while (spinFile.exists()) { + try { sleep(1_000); } catch (Exception ex) { /* ignore interruptions */ } --- End diff -- Does query killing cause a user exception ? ---
[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Github user Ben-Zvi commented on a diff in the pull request: https://github.com/apache/drill/pull/938#discussion_r140062742 --- Diff: common/src/main/java/org/apache/drill/common/exceptions/UserException.java --- @@ -536,6 +542,33 @@ public Builder pushContext(final String name, final double value) { * @return user exception */ public UserException build(final Logger logger) { + + // To allow for debugging: + // A spinner code to make the execution stop here while the file '/tmp/drillspin' exists + // Can be used to attach a debugger, use jstack, etc + // The processID of the spinning thread should be in a file like /tmp/spin4148663301172491613.tmp + // along with the error message. + File spinFile = new File("/tmp/drillspin"); --- End diff -- Using a "flag file" instead of a config setting gives more flexibility: there is no need to restart in order to turn this feature on/off, errors can be caught on only a few selected nodes, and the looping thread can be freed simply by deleting the flag file. I also plan on posting an announcement on the dev list about this new "feature" to see if there's any feedback. ---
[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...
Github user vvysotskyi commented on a diff in the pull request: https://github.com/apache/drill/pull/805#discussion_r140057495 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java --- @@ -1054,8 +1057,36 @@ public void setMax(Object max) { return nulls; } -@Override public boolean hasSingleValue() { - return (max != null && min != null && max.equals(min)); +/** + * Checks that the column chunk has a single value. + * Returns {@code true} if {@code min} and {@code max} are the same but not null + * and nulls count is 0 or equal to the rows count. + * + * Returns {@code true} if {@code min} and {@code max} are null and the number of null values + * in the column chunk is equal to the rows count. + * + * Comparison of nulls and rows count is needed for the cases: + * + * column with primitive type has single value and null values + * + * column with primitive type has only null values, min/max couldn't be null, + * but column has single value + * + * + * @param rowCount rows count in column chunk + * @return true if column has single value + */ +@Override +public boolean hasSingleValue(long rowCount) { + if (nulls != null) { +if (min != null) { + // Objects.deepEquals() is used here, since min and max may be byte arrays + return Objects.deepEquals(min, max) && (nulls == 0 || nulls == rowCount); --- End diff -- Statistics [1] for most parquet types use java primitive types to store min and max values, so min/max can not be null even if the table has null values. [1] https://github.com/apache/parquet-mr/tree/e54ca615f213f5db6d34d9163c97eec98920d7a7/parquet-column/src/main/java/org/apache/parquet/column/statistics ---
[GitHub] drill issue #905: DRILL-1162: Fix OOM for hash join operator when the right ...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/905 @amansinha100, can you give this one a review? ---
[GitHub] drill issue #944: DRILL-5425: Support HTTP Kerberos auth using SPNEGO
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/944 @sohami, can you review this one? ---
[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...
Github user sachouche commented on a diff in the pull request: https://github.com/apache/drill/pull/805#discussion_r140055857 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java --- @@ -1054,8 +1057,36 @@ public void setMax(Object max) { return nulls; } -@Override public boolean hasSingleValue() { - return (max != null && min != null && max.equals(min)); +/** + * Checks that the column chunk has a single value. + * Returns {@code true} if {@code min} and {@code max} are the same but not null + * and nulls count is 0 or equal to the rows count. + * + * Returns {@code true} if {@code min} and {@code max} are null and the number of null values + * in the column chunk is equal to the rows count. + * + * Comparison of nulls and rows count is needed for the cases: + * + * column with primitive type has single value and null values + * + * column with primitive type has only null values, min/max couldn't be null, + * but column has single value + * + * + * @param rowCount rows count in column chunk + * @return true if column has single value + */ +@Override +public boolean hasSingleValue(long rowCount) { + if (nulls != null) { +if (min != null) { + // Objects.deepEquals() is used here, since min and max may be byte arrays + return Objects.deepEquals(min, max) && (nulls == 0 || nulls == rowCount); --- End diff -- - if (min != null), then nulls cannot be equal to rowCount - In this case, only nulls == 0 should be checked ---
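For readers following this review, the `hasSingleValue(rowCount)` decision in the diff above can be sketched in isolation like this. It is an illustration with a made-up class name (`ColumnStats`) and bare fields; the real code lives in the Metadata.java inner class being reviewed.

```java
import java.util.Objects;

// Hedged sketch of the single-value check discussed in the diff above.
class ColumnStats {
    Object min, max;   // may be byte[] for binary columns
    Long nulls;        // null-value count, or null if unknown

    boolean hasSingleValue(long rowCount) {
        if (nulls != null) {
            if (min != null) {
                // Objects.deepEquals handles byte[] min/max correctly,
                // where Object.equals would compare array identity.
                return Objects.deepEquals(min, max) && (nulls == 0 || nulls == rowCount);
            }
            // min/max absent: single-valued only if every row is null.
            return nulls == rowCount;
        }
        // No null-count statistics: fall back to the old min == max check.
        return min != null && max != null && Objects.deepEquals(min, max);
    }
}
```

Note that, per the review comment above, when `min` is non-null the `nulls == rowCount` branch should be unreachable in practice, since a non-null min implies at least one non-null row; the sketch mirrors the diff as written.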
[GitHub] drill issue #946: DRILL-5799: native-client: Support alternative build direc...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/946 Build issue has been corrected via another PR. ---
[GitHub] drill issue #919: DRILL-5721: Query with only root fragment and no non-root ...
Github user sohami commented on the issue: https://github.com/apache/drill/pull/919 Rebased on latest master and squashed the initial 3 commits. But I have kept the conflict-resolution commit separate, as there are some changes made w.r.t. DRILL-3449 behavior, and I added some new unit tests. ---
[GitHub] drill pull request #942: DRILL-5781: Fix unit test failures to use tests con...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/942#discussion_r140048587 --- Diff: contrib/storage-hbase/src/test/resources/hbase-site.xml --- @@ -66,15 +66,13 @@ Default is 10.
[GitHub] drill pull request #942: DRILL-5781: Fix unit test failures to use tests con...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/942#discussion_r139247294 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/ExecTest.java --- @@ -100,6 +101,14 @@ public void run() { return dir.getAbsolutePath() + File.separator + dirName; } + /** + * Sets zookeeper server and client SASL test config properties. + */ + public static void setZookeeperSaslTestConfigProps() { +System.setProperty(ZooKeeperSaslServer.LOGIN_CONTEXT_NAME_KEY, "Test_server"); --- End diff -- Maybe something like `DrillTestServerForUnitTests`, `DrillTestClientForUnitTests`. ---
[GitHub] drill pull request #942: DRILL-5781: Fix unit test failures to use tests con...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/942#discussion_r140048784 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/PathUtils.java --- @@ -70,4 +72,14 @@ public static final String normalize(final String path) { return builder.toString(); } + /** + * Creates and returns path with the protocol at the beginning from specified {@code url}. + */ --- End diff -- Can you please add java doc with @param and @return? ---
[GitHub] drill pull request #942: DRILL-5781: Fix unit test failures to use tests con...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/942#discussion_r139273842 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/ExecTest.java --- @@ -100,6 +101,14 @@ public void run() { return dir.getAbsolutePath() + File.separator + dirName; } + /** + * Sets zookeeper server and client SASL test config properties. + */ + public static void setZookeeperSaslTestConfigProps() { --- End diff -- Maybe it's possible to create separate test zk util class with this method and also setup for jaas property (so jaas config is not repeated twice in the code) and keep it in the same package where we test zk? ---
[GitHub] drill pull request #950: Drill 5431: SSL Support
GitHub user parthchandra opened a pull request: https://github.com/apache/drill/pull/950 Drill 5431: SSL Support Add support for SSL between Java/C++ clients and Drillbits. You can merge this pull request into a Git repository by running: $ git pull https://github.com/parthchandra/drill DRILL-5431-0 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/950.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #950 commit dd22a5a6630ebd87ecf35fb61fc44fcea830a4fa Author: Sudheesh Katkam Date: 2017-05-16T21:48:57Z DRILL-5431: Upgrade Netty to 4.0.47 commit a34ca452e391d88f64213fccc69e42f1fca91633 Author: Parth Chandra Date: 2017-06-20T21:13:53Z DRILL-5431: SSL Support (Java) - Update DrillConfig to merge properties passed in from the client command line commit 13f32d581fa01bc53d7580092ef3d1bbb500f4df Author: Parth Chandra Date: 2017-07-25T16:21:02Z DRILL-5431: SSL Support (Java) - Add test certificates, keys, keystore, and truststore. commit f073001bfbbcf3bec20aae93636c139b7d98f6ec Author: Parth Chandra Date: 2017-08-28T17:08:15Z DRILL-5698: Revert unnecessary changes to C++ client commit 759b5b201a9725f4b377590f48db30e0d5d58856 Author: Parth Chandra Date: 2017-06-16T23:49:45Z DRILL-5431: Update POM to upgrade to Netty 4.0.48 and add exclusions to all modules that included older versions of Netty commit 2f3b504e56fa0df704d8153b9c104da18e81d41d Author: Parth Chandra Date: 2017-06-07T18:09:10Z DRILL-5431: SSL Support (C++) - Refactoring of C++ client. 
Move classes out of drillclient to their own files Fix build on MacOS to suppress warnings from boost code Refactoring of user properties to use a map commit 999da4d9c063157aec8d5bd3583d4776652960c3 Author: Parth Chandra Date: 2017-06-10T05:03:59Z DRILL-5431: SSL Support (Java) - Java client server SSL implementation commit 9329306abed5b351226b0f25bf8a7f2ce5304679 Author: Parth Chandra Date: 2017-08-29T19:04:57Z DRILL-5431: SSL Support (Java) - Enable OpenSSL support commit ee75133198167c685e00183d3d34eca65fa43b09 Author: Parth Chandra Date: 2017-07-11T00:19:12Z DRILL-5431: SSL Support (C++) - Add boost example code for ssl (small change to the code to pick up the certificate and key files from the test dir). Useful to test the ssl environment. commit 95f609aa33e30d621108b8594360b9538374694e Author: Parth Chandra Date: 2017-07-24T19:55:02Z DRILL-5431: SSL Support (C++) - Update DrillClientImpl to use Channel implementation commit 6d38f2dc0b4607727a77f491373d93ca9706724e Author: Parth Chandra Date: 2017-07-25T16:22:23Z DRILL-5431: SSL Support (C++) - Add (Netty like) socket abstraction that encapsulates a TCP socket or a SSL Stream on TCP. The testSSL program tests the client connection against a drillbit by sending a drill handshake. commit 23aac62331a9eb900fb5e6ca5e62ca62438ed9ec Author: Parth Chandra Date: 2017-07-31T20:28:24Z DRILL-5431: SSL Support (C++) - Fix Sasl on Windows to build from source (instead of install) directory ---
[GitHub] drill issue #948: DRILL-5745: Corrected 'location' information in Drill web ...
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/948 +1, LGTM. ---
[GitHub] drill pull request #949: DRILL-5795: Parquet Filter push down at rowgroup le...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/949#discussion_r140036046 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java --- @@ -1095,7 +1104,7 @@ public GroupScan applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtili final Set schemaPathsInExpr = filterExpr.accept(new ParquetRGFilterEvaluator.FieldReferenceFinder(), null); -final List qualifiedRGs = new ArrayList<>(parquetTableMetadata.getFiles().size()); +final List qualifiedRGs = new ArrayList<>(rowGroupInfos.size()); --- End diff -- Never mind the previous comment. It's probably better to use RowGroupInfos throughout the code. ---
[GitHub] drill pull request #949: DRILL-5795: Parquet Filter push down at rowgroup le...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/949#discussion_r140033471 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java --- @@ -819,63 +827,64 @@ private void init() throws IOException { } } rowGroupInfo.setEndpointByteMap(endpointByteMap); +rowGroupInfo.setColumns(rg.getColumns()); rgIndex++; rowGroupInfos.add(rowGroupInfo); } } this.endpointAffinities = AffinityCreator.getAffinityMap(rowGroupInfos); +updatePartitionColTypeMap(); + } + private void updatePartitionColTypeMap() { columnValueCounts = Maps.newHashMap(); this.rowCount = 0; boolean first = true; -for (ParquetFileMetadata file : parquetTableMetadata.getFiles()) { - for (RowGroupMetadata rowGroup : file.getRowGroups()) { -long rowCount = rowGroup.getRowCount(); -for (ColumnMetadata column : rowGroup.getColumns()) { - SchemaPath schemaPath = SchemaPath.getCompoundPath(column.getName()); - Long previousCount = columnValueCounts.get(schemaPath); - if (previousCount != null) { -if (previousCount != GroupScan.NO_COLUMN_STATS) { - if (column.getNulls() != null) { -Long newCount = rowCount - column.getNulls(); -columnValueCounts.put(schemaPath, columnValueCounts.get(schemaPath) + newCount); - } -} - } else { +for (RowGroupInfo rowGroup : this.rowGroupInfos) { --- End diff -- Isn't this doing the same thing as the original code? RowGroupInfos is built from the RowGroupMetadata in the files? ---
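The logic under review here (summing per-column non-null value counts across row groups, where a row group's non-null count is its row count minus the column's null count) can be sketched as follows. RowGroup and ColumnChunk are simplified stand-ins for illustration, not Drill's metadata classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of the columnValueCounts aggregation discussed above.
// A column with unknown null stats poisons its count to NO_COLUMN_STATS,
// mirroring the sentinel used in ParquetGroupScan.
public class ColumnValueCounts {
    static final long NO_COLUMN_STATS = -1;

    static class ColumnChunk {
        final String name;
        final Long nulls; // null means "no statistics for this chunk"
        ColumnChunk(String name, Long nulls) { this.name = name; this.nulls = nulls; }
    }

    static class RowGroup {
        final long rowCount;
        final List<ColumnChunk> columns;
        RowGroup(long rowCount, List<ColumnChunk> columns) {
            this.rowCount = rowCount;
            this.columns = columns;
        }
    }

    static Map<String, Long> aggregate(List<RowGroup> rowGroups) {
        Map<String, Long> counts = new HashMap<>();
        for (RowGroup rg : rowGroups) {
            for (ColumnChunk col : rg.columns) {
                Long prev = counts.get(col.name);
                if (prev != null && prev == NO_COLUMN_STATS) {
                    continue; // already marked unknown; stays unknown
                }
                if (col.nulls == null) {
                    counts.put(col.name, NO_COLUMN_STATS); // stats unavailable
                } else {
                    counts.merge(col.name, rg.rowCount - col.nulls, Long::sum);
                }
            }
        }
        return counts;
    }
}
```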
[jira] [Created] (DRILL-5808) Reduce memory allocator strictness for "managed" operators
Paul Rogers created DRILL-5808: -- Summary: Reduce memory allocator strictness for "managed" operators Key: DRILL-5808 URL: https://issues.apache.org/jira/browse/DRILL-5808 Project: Apache Drill Issue Type: Improvement Affects Versions: 1.11.0 Reporter: Paul Rogers Assignee: Paul Rogers Fix For: 1.12.0

Drill 1.11 and 1.12 introduce new "managed" versions of the sort and hash agg that enforce memory limits, spilling to disk when necessary. Drill's internal memory system is very "lumpy" and unpredictable. The operators have no control over the incoming batch size; an overly large batch can cause the operator to exceed its memory limit before it has a chance to do any work. Vector allocations grow in power-of-two sizes: adding a single record can double the memory allocated to a vector. Drill has no metadata, so operators can predict neither the size of VarChar columns nor the cardinality of arrays. The "Record Batch Sizer" tries to extract this information on each batch, but it works with averages, and specific column patterns can still throw off the memory calculations. (For example, having a series of very wide columns for A-M and very narrow columns for N-Z will cause a moderate average. But, once sorted, the A-M rows, and batches, will be much larger than expected, causing out-of-memory errors.)

At present, if an operator is wrong in its memory usage by a single byte, the entire query is killed. That is, the user pays the death penalty (of queries) for poor design decisions within Drill. This leads to a less-than-optimal user experience.

The proposal here is to make the memory allocator less strict for "managed" operators. First, we recognize that the managed operators do attempt to control memory and, if designed well, will on average hit their targets. Second, we recognize that, due to the lumpiness issues above, any single operator may exceed, or be under, the configured maximum memory. Given this, the proposal is:

1. An operator identifies itself as managed to the memory allocator.
2. In managed mode, the allocator has soft limits. It emits a warning to the log when the limit is exceeded.
3. For safety, in managed mode, the allocator enforces a hard limit larger than the configured limit. The enforcement limit might be:
* For memory sizes < 100 MB, up to 2x the configured limit.
* For larger memory sizes, no more than 100 MB over the configured limit.

The exact numbers can be made configurable. During testing, scripts should look for over-memory warnings, and each should be fixed just as we fix OOM issues today. But in production, user queries are far less likely to fail due to any remaining corner cases that throw off the memory calculations.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
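A minimal sketch of the proposed enforcement rule, under the numbers given in the issue; the class and method names (ManagedAllocatorLimits, enforcementLimit, warns) are illustrative and do not exist in Drill.

```java
// Hypothetical sketch of the soft/hard limit split proposed for "managed"
// operators in DRILL-5808. All names here are illustrative, not Drill APIs.
public class ManagedAllocatorLimits {
    static final long HUNDRED_MB = 100L * 1024 * 1024;

    // Hard limit actually enforced in managed mode, above the configured soft limit.
    static long enforcementLimit(long configuredLimit) {
        if (configuredLimit < HUNDRED_MB) {
            return 2 * configuredLimit;         // small budgets: allow up to 2x
        }
        return configuredLimit + HUNDRED_MB;    // large budgets: at most 100 MB headroom
    }

    // Exceeding the configured limit only logs a warning; the allocation fails
    // only past the enforcement limit.
    static boolean warns(long allocated, long configuredLimit) {
        return allocated > configuredLimit
            && allocated <= enforcementLimit(configuredLimit);
    }

    public static void main(String[] args) {
        System.out.println(enforcementLimit(50L * 1024 * 1024));   // 2x for small budgets
        System.out.println(enforcementLimit(200L * 1024 * 1024));  // +100 MB for large budgets
    }
}
```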
Re: Using Tableau to connect to DB engines using Calcite's JDBC driver
AFAIK Tableau uses the Drill ODBC driver, not JDBC (although Tableau hinted at some JDBC support at some point: https://community.tableau.com/ideas/4633). It's technically feasible, BUT the Drill protocol is very low level, so the adapter would have to use the Drill RPC protocol and represent its own data as DrillBuf (which is column oriented, not row oriented). Also, Tableau seems to optimize things a bit for each specific DB: it doesn't query the driver for what is supported, and lots of things are hardcoded, which means either the DB would have to replicate Drill's dialect of SQL, or the adaptation layer would have to translate. Laurent On Tue, Sep 19, 2017 at 3:17 PM, Muhammad Gelbana wrote: > Tableau supports the Apache Drill JDBC driver, so you can basically use Drill > as a data provider for Tableau. > > I'm asking if anyone has implemented a Calcite adapter for some data engine and > tested whether Tableau would be able to connect to it as if it were Apache Drill. > > It's like you connect to that adapter by configuring an Apache Drill > connection to it, through Tableau. > > Otherwise, that data engine would need to have an ODBC driver, which > is clearly a pain in the neck if you Google enough. That's actually what > I'm trying to do: I need to implement a Calcite adapter to support a data > engine, but supporting Tableau is essential to our customers, and I'd be very > happy if I can avoid going down the Calcite ODBC driver path. > > I apologize if this sounds like a Calcite question, but I believe Drill > developers who worked on the JDBC driver can give good insight. > > If you ask me, I believe Drill is basically Calcite in distributed mode :D; this > may well be a sketchy point of view, as I'm not experienced with > Drill or Calcite myself. > > Hopefully I explained myself clearly. > > Thanks, > Gelbana >
Re: Propose about join push down
Hi Boaz: Sorry for the wrong example. "select t2.a, t2.s, t3.d from (select a, sum(b) as s from t1 where c='1' group by a) t2 join t3 on t2.a = t3.a" is the SQL that would make sense. The prerequisite for join push down is that the storage plugin supports filter push down. The corresponding rule should use this information to decide whether to do the join push down (storage plugins like Elasticsearch will benefit from this). I think there's little change to the current hashjoin process logic except the push-down work: 1. The build side table constructs the bloom filter. 2. The hashjoin batch pushes down the bloom filter. 3. The rest behaves the same as the current implementation, doing the join work between the filtered probe-side data and the build side. One explicit change is to implement the next() call with data parameters. I will think about this. On Wed, 20 Sep 2017 at 5:25 AM Boaz Ben-Zvi wrote: > Hi Weijie, > > Are there some typos in the sample query? Looks like the projection > should be t2.a,t2.s,t3.d (i.e., t2 instead of t1). Also the predicate “ > where a='1' ” makes the inner query return only a single row, which is > pretty trivial. > > Assuming these changes are made, then there could be many t2 “a” > values to be equi-joined to t3’s “a” values. > > With Bloom filters, the rows from t3 would only be “mostly filtered”; > there still needs to be a join above to produce the final result. > > If wanting to push the “whole join” down, then _either_ need to have some > index mechanism on “t3.a” – which would work as a nested loop join (NLJ), > _or_ need to perform another type of join down below (with all related > issues, like memory control, spill etc). For the NLJ, indeed the current > Drill does not support “down flow” of data (and most storage does not have > indexes), and it’ll take some work to implement (e.g., all operators would > need to accept a next() call with some “data” parameter).
> > Boaz > > > On 9/19/17, 8:45 AM, "weijie tong" wrote: > > All: > This is a proposal about join query tuning by pushing down the join > condition. Suggestions, discussion, and objections are welcome. > > Suppose we have a join query "select t1.a,t1.s,t3.d (select a, sum(b) as > s from t1 where a='1' group by a ) t2 join t3 on t2.a = t3.a". This > query will be transferred to a hashjoin or broadcast hashjoin (if the metadata > is accurate), but t3's rows will all be pulled out from storage. > If t3 is a large table, the performance will be unacceptable. > If we could first get the 'a' result set of the inner query and then push > down that result set to the right table t3's scan node, the right table's > scan would be quick. > > Possible solutions: > 1. A new physical operator, or broadcast join / hash join enhancements, > which first query the left table's data, then push down the > filtered left join condition column set to the right table stream; once > the push down is confirmed, it works as normal join query logic. > 2. The pushed-down join condition set may take two possible formats: > bloom filter bytes or a list of strings. > 3. RecordBatch needs to support pushing 2's data downstream. > 4. SubScan needs to hold 2's data and wait for the next real call to > push it down to the storage-level query. > 5. The storage level should have an interface to indicate whether it > can handle the pushed-down bloom filter or list of strings. > > Since this violates Drill's data flow direction, it seems a lot of > work to implement this feature. > > >
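The bloom-filter step in this proposal can be illustrated with a minimal (illustrative, non-Drill) Java class: the hash-join build side adds its join keys, the bits would then be pushed to the probe-side scan, and a false from mightContain means the row definitely cannot join. A true may still be a false positive, which is why, as Boaz notes, the join above must re-verify the "mostly filtered" rows.

```java
import java.util.BitSet;

// Minimal Bloom filter sketch for the proposed join push-down. The class and
// method names are illustrative; Drill has no such class in this thread's code.
public class JoinKeyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int numHashes;

    public JoinKeyBloomFilter(int size, int numHashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    // Derive the i-th bit index from the key's hash; 0x9E3779B9 is just a
    // mixing constant to decorrelate the hash functions.
    private int index(Object key, int i) {
        int h = key.hashCode() ^ (i * 0x9E3779B9);
        h ^= (h >>> 16);
        return Math.floorMod(h, size);
    }

    // Build side: insert each join key.
    public void add(Object key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    // Probe side: false means the key definitely cannot join (safe to skip);
    // true means it might join, so the join operator above still verifies it.
    public boolean mightContain(Object key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        JoinKeyBloomFilter filter = new JoinKeyBloomFilter(1 << 16, 3);
        filter.add("a1");
        System.out.println(filter.mightContain("a1")); // true: no false negatives
    }
}
```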
Re: Propose about join push down
Hi Boaz: Sorry for the wrong example; it should be "select t2.a, t2.s, t3.d from (select a, sum(b) as s from t1 where c='1' group by a) t2 join t3 on t2.a = t3.a", which would make sense. The prerequisite for pushing down a join is that the storage plugin supports filter push down. The storage plugin should add an interface to indicate that it supports join push down; the corresponding rule will take this into account. I think this strategy also applies to hashjoin: the build side table's join keys construct the bloom filter first, then the bloom filter is pushed down (a next() call with data parameters). Everything else follows the same process logic as the current hashjoin implementation. On Wed, 20 Sep 2017 at 5:25 AM Boaz Ben-Zvi wrote: > Hi Weijie, > > Are there some typos in the sample query? Looks like the projection > should be t2.a,t2.s,t3.d (i.e., t2 instead of t1). Also the predicate “ > where a='1' ” makes the inner query return only a single row, which is > pretty trivial. > > Assuming these changes are made, then there could be many t2 “a” > values to be equi-joined to t3’s “a” values. > > With Bloom filters, the rows from t3 would only be “mostly filtered”; > there still needs to be a join above to produce the final result. > > If wanting to push the “whole join” down, then _either_ need to have some > index mechanism on “t3.a” – which would work as a nested loop join (NLJ), > _or_ need to perform another type of join down below (with all related > issues, like memory control, spill etc). For the NLJ, indeed the current > Drill does not support “down flow” of data (and most storage does not have > indexes), and it’ll take some work to implement (e.g., all operators would > need to accept a next() call with some “data” parameter). > > Boaz > > > On 9/19/17, 8:45 AM, "weijie tong" wrote: > > All: > This is a proposal about join query tuning by pushing down the join > condition. Suggestions, discussion, and objections are welcome.
> > Suppose we have a join query "select t1.a,t1.s,t3.d (select a, sum(b) as > s from t1 where a='1' group by a ) t2 join t3 on t2.a = t3.a". This > query will be transferred to a hashjoin or broadcast hashjoin (if the metadata > is accurate), but t3's rows will all be pulled out from storage. > If t3 is a large table, the performance will be unacceptable. > If we could first get the 'a' result set of the inner query and then push > down that result set to the right table t3's scan node, the right table's > scan would be quick. > > Possible solutions: > 1. A new physical operator, or broadcast join / hash join enhancements, > which first query the left table's data, then push down the > filtered left join condition column set to the right table stream; once > the push down is confirmed, it works as normal join query logic. > 2. The pushed-down join condition set may take two possible formats: > bloom filter bytes or a list of strings. > 3. RecordBatch needs to support pushing 2's data downstream. > 4. SubScan needs to hold 2's data and wait for the next real call to > push it down to the storage-level query. > 5. The storage level should have an interface to indicate whether it > can handle the pushed-down bloom filter or list of strings. > > Since this violates Drill's data flow direction, it seems a lot of > work to implement this feature. > > >
Re: Drill 2.0 (design) hackathon
Thanks all, it is really helpful. On Wed, Sep 20, 2017 at 8:13 AM Charles Givre wrote: > Thank you Aman for organizing and to MapR for hosting! > > On Wed, Sep 20, 2017 at 11:12 AM, Aman Sinha wrote: > > > Thanks to all the folks who attended the hackathon - both local and > remote. > > For the remote attendees, you missed out on a good dinner :) > > > > We had a day of excellent discussion on several topics: Resource > > management, operator level performance improvements, TPC-DS coverage, > > metadata management, concurrency, usability and error handling, storage > > plugins + rest APIs. It will take a couple of days to compile all the > > notes and we will post them. > > > > Since the focus was more in-depth discussion rather than breadth, and 1 > day > > is clearly not adequate, some topics were left out. We can continue > those > > discussions on the dev list / hangout or if it can wait, possibly do it > in > > a future hackathon. > > > > -Aman > > > > On Fri, Sep 15, 2017 at 2:54 PM, Charles Givre wrote: > > > > > Hi Pritesh, > > > What time do you think you’d want me to present? Also, should I make > > some > > > slides? > > > Best, > > > — C > > > > > > > On Sep 15, 2017, at 13:23, Pritesh Maker wrote: > > > > > > > > Hi All > > > > > > > > We are looking forward to hosting the hackathon on Monday.
Just a few > > > updates on the logistics and agenda > > > > > > > > • We are expecting over 25 people attending the event – you can see > the > > > attendee list at the Eventbrite site - https://www.eventbrite.com/e/ > > > drill-developer-day-sept-2017-registration-7478463285 > > > > > > > > • Breakfast will be served starting at 8:30AM – we would like to > begin > > > promptly at 9AM > > > > > > > > • The agenda has been updated to reflect the speakers (see the update > > in > > > the sheet - https://docs.google.com/spreadsheets/d/ > > > 1PEpgmBNAaPcu9UhWmZ8yPYtXbUGqOAYwH87alWkpCic/edit#gid=0 ) > > > > o Key Note & Introduction – Ted Dunning, Parth Chandra and Aman Sinha > > > > o Community Contributions – Anil Kumar, John Omernik, Charles Givre > and > > > Ted Dunning > > > > o Two tracks for technical design discussions – some topics have > > initial > > > thoughts for the topics and some will have open brainstorming > discussions > > > > o Once the discussions are concluded, we will have summaries > presented > > > and notes shared with the community > > > > > > > > • We will have a WebEx for the first two sessions. For the two > tracks, > > > we will either continue the WebEx or have Hangout links (will publish > > them > > > to the google sheet) > > > > "JOIN WEBEX MEETING > > > > > https://mapr.webex.com/mapr/j.php?MTID=m9d39036e3953cce59ea81250c70c6 > > c76 > > > > Meeting number (access code): 806 111 950 > > > > Meeting password: ApacheDrill" > > > > > > > > • For the attendees in person, we have made bookings for a dinner in > > the > > > evening - https://www.yelp.com/biz/chili-garden-restaurant-milpitas > > > > > > > > Looking forward to a fantastic day for the Apache Drill! community! 
> > > > > > > > Thanks, > > > > Pritesh > > > > > > > > > > > > > > > > On 9/5/17, 10:47 PM, "Aman Sinha" wrote: > > > > > > > >Here is the Eventbrite event for registration: > > > > > > > >https://www.eventbrite.com/e/drill-developer-day-sept-2017- > > > registration-7478463285 > > > > > > > >Please register so we can plan for food and drinks appropriately. > > > > > > > >The link also contains a google doc link for the preliminary > agenda > > > and a > > > >'Topics' tab with volunteer sign-up column. Please add your name > to > > > the > > > >area(s) of interest. > > > > > > > >Thanks and look forward to seeing you all ! > > > > > > > >-Aman > > > > > > > >On Wed, Aug 30, 2017 at 9:44 AM, Paul Rogers > > > wrote: > > > > > > > >> A partial list of Drill’s public APIs: > > > >> > > > >> IMHO, highest priority for Drill 2.0. > > > >> > > > >> > > > >> * JDBC/ODBC drivers > > > >> * Client (for JDBC/ODBC) + ODBC & JDBC > > > >> * Client (for full Drill async, columnar) > > > >> * Storage plugin > > > >> * Format plugin > > > >> * System/session options > > > >> * Queueing (e.g. ZK-based queues) > > > >> * Rest API > > > >> * Resource Planning (e.g. max query memory per node) > > > >> * Metadata access, storage (e.g. file system locations vs. a > > > metastore) > > > >> * Metadata files formats (Parquet, views, etc.) > > > >> > > > >> Lower priority for future releases: > > > >> > > > >> > > > >> * Query Planning (e.g. Calcite rules) > > > >> * Config options > > > >> * SQL syntax, especially Drill extensions > > > >> * UDF > > > >> * Management (e.g. JMX, Rest API calls, etc.) > > > >> * Drill File System (HDFS) > > > >> * Web UI > > > >> * Shell scripts > > > >> > > > >> There are
Re: Drill 2.0 (design) hackathon
Thank you Aman for organizing and to MapR for hosting! On Wed, Sep 20, 2017 at 11:12 AM, Aman Sinha wrote: > Thanks to all the folks who attended the hackathon - both local and remote. > For the remote attendees, you missed out on a good dinner :) > > We had a day of excellent discussion on several topics: Resource > management, operator level performance improvements, TPC-DS coverage, > metadata management, concurrency, usability and error handling, storage > plugins + rest APIs. It will take a couple of days to compile all the > notes and we will post them. > > Since the focus was more in-depth discussion rather than breadth, and 1 day > is clearly not adequate, some topics were left out. We can continue those > discussions on the dev list / hangout or if it can wait, possibly do it in > a future hackathon. > > -Aman > > On Fri, Sep 15, 2017 at 2:54 PM, Charles Givre wrote: > > > Hi Pritesh, > > What time do you think you’d want me to present? Also, should I make > some > > slides? > > Best, > > — C > > > > > On Sep 15, 2017, at 13:23, Pritesh Maker wrote: > > > > > > Hi All > > > > > > We are looking forward to hosting the hackathon on Monday.
Just a few > > updates on the logistics and agenda > > > > > > • We are expecting over 25 people attending the event – you can see the > > attendee list at the Eventbrite site - https://www.eventbrite.com/e/ > > drill-developer-day-sept-2017-registration-7478463285 > > > > > > • Breakfast will be served starting at 8:30AM – we would like to begin > > promptly at 9AM > > > > > > • The agenda has been updated to reflect the speakers (see the update > in > > the sheet - https://docs.google.com/spreadsheets/d/ > > 1PEpgmBNAaPcu9UhWmZ8yPYtXbUGqOAYwH87alWkpCic/edit#gid=0 ) > > > o Key Note & Introduction – Ted Dunning, Parth Chandra and Aman Sinha > > > o Community Contributions – Anil Kumar, John Omernik, Charles Givre and > > Ted Dunning > > > o Two tracks for technical design discussions – some topics have > initial > > thoughts for the topics and some will have open brainstorming discussions > > > o Once the discussions are concluded, we will have summaries presented > > and notes shared with the community > > > > > > • We will have a WebEx for the first two sessions. For the two tracks, > > we will either continue the WebEx or have Hangout links (will publish > them > > to the google sheet) > > > "JOIN WEBEX MEETING > > > https://mapr.webex.com/mapr/j.php?MTID=m9d39036e3953cce59ea81250c70c6 > c76 > > > Meeting number (access code): 806 111 950 > > > Meeting password: ApacheDrill" > > > > > > • For the attendees in person, we have made bookings for a dinner in > the > > evening - https://www.yelp.com/biz/chili-garden-restaurant-milpitas > > > > > > Looking forward to a fantastic day for the Apache Drill! community! > > > > > > Thanks, > > > Pritesh > > > > > > > > > > > > On 9/5/17, 10:47 PM, "Aman Sinha" wrote: > > > > > >Here is the Eventbrite event for registration: > > > > > >https://www.eventbrite.com/e/drill-developer-day-sept-2017- > > registration-7478463285 > > > > > >Please register so we can plan for food and drinks appropriately. 
> > > > > >The link also contains a google doc link for the preliminary agenda > > and a > > >'Topics' tab with volunteer sign-up column. Please add your name to > > the > > >area(s) of interest. > > > > > >Thanks and look forward to seeing you all ! > > > > > >-Aman > > > > > >On Wed, Aug 30, 2017 at 9:44 AM, Paul Rogers > > wrote: > > > > > >> A partial list of Drill’s public APIs: > > >> > > >> IMHO, highest priority for Drill 2.0. > > >> > > >> > > >> * JDBC/ODBC drivers > > >> * Client (for JDBC/ODBC) + ODBC & JDBC > > >> * Client (for full Drill async, columnar) > > >> * Storage plugin > > >> * Format plugin > > >> * System/session options > > >> * Queueing (e.g. ZK-based queues) > > >> * Rest API > > >> * Resource Planning (e.g. max query memory per node) > > >> * Metadata access, storage (e.g. file system locations vs. a > > metastore) > > >> * Metadata files formats (Parquet, views, etc.) > > >> > > >> Lower priority for future releases: > > >> > > >> > > >> * Query Planning (e.g. Calcite rules) > > >> * Config options > > >> * SQL syntax, especially Drill extensions > > >> * UDF > > >> * Management (e.g. JMX, Rest API calls, etc.) > > >> * Drill File System (HDFS) > > >> * Web UI > > >> * Shell scripts > > >> > > >> There are certainly more. Please suggest those that are missing. I’ve > > >> taken a rough cut at which APIs need forward/backward compatibility > > first, > > >> in part based on those that are the “most public” and most likely to > > >> change. Others are important, but we can’t do them all at once. > > >> > > >> Thanks, > > >> > > >> - Paul > > >> > > >> On Aug 29, 2017, at 6:00 PM, Aman Sinha
Re: Drill 2.0 (design) hackathon
Thanks to all the folks who attended the hackathon - both local and remote. For the remote attendees, you missed out on a good dinner :) We had a day of excellent discussion on several topics: Resource management, operator level performance improvements, TPC-DS coverage, metadata management, concurrency, usability and error handling, storage plugins + rest APIs. It will take a couple of days to compile all the notes and we will post them. Since the focus was more in-depth discussion rather than breadth, and 1 day is clearly not adequate, some topics were left out. We can continue those discussions on the dev list / hangout or if it can wait, possibly do it in a future hackathon. -Aman On Fri, Sep 15, 2017 at 2:54 PM, Charles Givre wrote: > Hi Pritesh, > What time do you think you’d want me to present? Also, should I make some > slides? > Best, > — C > > > On Sep 15, 2017, at 13:23, Pritesh Maker wrote: > > > > Hi All > > > > We are looking forward to hosting the hackathon on Monday. Just a few > updates on the logistics and agenda > > > > • We are expecting over 25 people attending the event – you can see the > attendee list at the Eventbrite site - https://www.eventbrite.com/e/ > drill-developer-day-sept-2017-registration-7478463285 > > > > • Breakfast will be served starting at 8:30AM – we would like to begin > promptly at 9AM > > > > • The agenda has been updated to reflect the speakers (see the update in > the sheet - https://docs.google.com/spreadsheets/d/ > 1PEpgmBNAaPcu9UhWmZ8yPYtXbUGqOAYwH87alWkpCic/edit#gid=0 ) > > o Key Note & Introduction – Ted Dunning, Parth Chandra and Aman Sinha > > o Community Contributions – Anil Kumar, John Omernik, Charles Givre and > Ted Dunning > > o Two tracks for technical design discussions – some topics have initial > thoughts for the topics and some will have open brainstorming discussions > > o Once the discussions are concluded, we will have summaries presented > and notes shared with the community > > > > • We will have a
WebEx for the first two sessions. For the two tracks, > we will either continue the WebEx or have Hangout links (will publish them > to the google sheet) > > "JOIN WEBEX MEETING > > https://mapr.webex.com/mapr/j.php?MTID=m9d39036e3953cce59ea81250c70c6c76 > > Meeting number (access code): 806 111 950 > > Meeting password: ApacheDrill" > > > > • For the attendees in person, we have made bookings for a dinner in the > evening - https://www.yelp.com/biz/chili-garden-restaurant-milpitas > > > > Looking forward to a fantastic day for the Apache Drill! community! > > > > Thanks, > > Pritesh > > > > > > > > On 9/5/17, 10:47 PM, "Aman Sinha" wrote: > > > >Here is the Eventbrite event for registration: > > > >https://www.eventbrite.com/e/drill-developer-day-sept-2017- > registration-7478463285 > > > >Please register so we can plan for food and drinks appropriately. > > > >The link also contains a google doc link for the preliminary agenda > and a > >'Topics' tab with volunteer sign-up column. Please add your name to > the > >area(s) of interest. > > > >Thanks and look forward to seeing you all ! > > > >-Aman > > > >On Wed, Aug 30, 2017 at 9:44 AM, Paul Rogers > wrote: > > > >> A partial list of Drill’s public APIs: > >> > >> IMHO, highest priority for Drill 2.0. > >> > >> > >> * JDBC/ODBC drivers > >> * Client (for JDBC/ODBC) + ODBC & JDBC > >> * Client (for full Drill async, columnar) > >> * Storage plugin > >> * Format plugin > >> * System/session options > >> * Queueing (e.g. ZK-based queues) > >> * Rest API > >> * Resource Planning (e.g. max query memory per node) > >> * Metadata access, storage (e.g. file system locations vs. a > metastore) > >> * Metadata files formats (Parquet, views, etc.) > >> > >> Lower priority for future releases: > >> > >> > >> * Query Planning (e.g. Calcite rules) > >> * Config options > >> * SQL syntax, especially Drill extensions > >> * UDF > >> * Management (e.g. JMX, Rest API calls, etc.) 
> >> * Drill File System (HDFS) > >> * Web UI > >> * Shell scripts > >> > >> There are certainly more. Please suggest those that are missing. I’ve > >> taken a rough cut at which APIs need forward/backward compatibility > first, > >> in part based on those that are the “most public” and most likely to > >> change. Others are important, but we can’t do them all at once. > >> > >> Thanks, > >> > >> - Paul > >> > >> On Aug 29, 2017, at 6:00 PM, Aman Sinha > mansi...@apache.org>> wrote: > >> > >> Hi Paul, > >> certainly makes sense to have the API compatibility discussions during > this > >> hackathon. The 2.0 release may be a good checkpoint to introduce > breaking > >> changes necessitating changes to the ODBC/JDBC drivers and other > external > >> applications. As part of this exercise (not during the hackathon but as > a > >>
[jira] [Created] (DRILL-5807) ambiguous error
XiaHang created DRILL-5807: -- Summary: ambiguous error Key: DRILL-5807 URL: https://issues.apache.org/jira/browse/DRILL-5807 Project: Apache Drill Issue Type: Bug Components: Client - JDBC Affects Versions: 1.11.0 Environment: Linux Reporter: XiaHang Priority: Critical if the final plan like below , JdbcFilter is below a JdbcJoin and above another JdbcJoin . JdbcProject(order_id=[$0], mord_id=[$6], item_id=[$2], div_pay_amt=[$5], item_quantity=[$4], slr_id=[$11]): rowcount = 5625.0, cumulative cost = {12540.0 rows, 29763.0 cpu, 0.0 io}, id = 327 JdbcJoin(condition=[=($3, $11)], joinType=[left]): rowcount = 5625.0, cumulative cost = {8040.0 rows, 2763.0 cpu, 0.0 io}, id = 325 JdbcFilter(condition=[OR(AND(OR(IS NOT NULL($7), >($5, 0)), =($1, 2), OR(AND(=($10, '箱包皮具/热销女包/男包'), >(/($5, $4), 1000)), AND(OR(=($10, '家装主材'), =($10, '大家电')), >(/($5, $4), 1000)), AND(OR(=($10, '珠宝/钻石/翡翠/黄金'), =($10, '饰品/流行首饰/时尚饰品新')), >(/($5, $4), 2000)), AND(>(/($5, $4), 500), <>($10, '箱包皮具/热销女包/男包'), <>($10, '家装主材'), <>($10, '大家电'), <>($10, '珠宝/钻石/翡翠/黄金'), <>($10, '饰品/流行首饰/时尚饰品新'))), <>($10, '成人用品/情趣用品'), <>($10, '鲜花速递/花卉仿真/绿植园艺'), <>($10, '水产肉类/新鲜蔬果/熟食')), AND(<=(-(EXTRACT(FLAG(EPOCH), CURRENT_TIMESTAMP), EXTRACT(FLAG(EPOCH), CAST($8):TIMESTAMP(0))), *(*(*(14, 24), 60), 60)), OR(AND(OR(=($10, '箱包皮具/热销女包/男包'), =($10, '家装主材'), =($10, '大家电'), =($10, '珠宝/钻石/翡翠/黄金'), =($10, '饰品/流行首饰/时尚饰品新')), >(/($5, $4), 2000)), AND(OR(=($10, '男装'), =($10, '女装/女士精品'), =($10, '办公设备/耗材/相关服务')), >(/($5, $4), 1000)), AND(OR(=($10, '流行男鞋'), =($10, '女鞋')), >(/($5, $4), 1500))), IS NOT NULL($8)), AND(>=(-(EXTRACT(FLAG(EPOCH), CURRENT_TIMESTAMP), EXTRACT(FLAG(EPOCH), CAST($8):TIMESTAMP(0))), *(*(*(15, 24), 60), 60)), <=(-(EXTRACT(FLAG(EPOCH), CURRENT_TIMESTAMP), EXTRACT(FLAG(EPOCH), CAST($8):TIMESTAMP(0))), *(*(*(60, 24), 60), 60)), OR(AND(OR(=($10, '箱包皮具/热销女包/男包'), =($10, '珠宝/钻石/翡翠/黄金'), =($10, '饰品/流行首饰/时尚饰品新')), >(/($5, $4), 5000)), AND(OR(=($10, '男装'), =($10, '女装/女士精品')), >(/($5, $4), 3000)), AND(OR(=($10, 
'流行男鞋'), =($10, '女鞋')), >(/($5, $4), 2500)), AND(=($10, '办公设备/耗材/相关服务'), >(/($5, $4), 2000))), IS NOT NULL($8)))]): rowcount = 375.0, cumulative cost = {2235.0 rows, 2582.0 cpu, 0.0 io}, id = 320 JdbcJoin(condition=[=($2, $9)], joinType=[left]): rowcount = 1500.0, cumulative cost = {1860.0 rows, 1082.0 cpu, 0.0 io}, id = 318 JdbcProject(order_id=[$0], pay_status=[$2], item_id=[$3], seller_id=[$5], item_quantity=[$7], div_pay_amt=[$20], mord_id=[$1], pay_time=[$19], succ_time=[$52]): rowcount = 100.0, cumulative cost = {180.0 rows, 821.0 cpu, 0.0 io}, id = 313 JdbcTableScan(table=[[public, dws_tb_crm_u2_ord_base_df]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 29 JdbcProject(item_id=[$0], cate_level1_name=[$47]): rowcount = 100.0, cumulative cost = {180.0 rows, 261.0 cpu, 0.0 io}, id = 316 JdbcTableScan(table=[[public, dws_tb_crm_u2_itm_base_df]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 46 JdbcProject(slr_id=[$3]): rowcount = 100.0, cumulative cost = {180.0 rows, 181.0 cpu, 0.0 io}, id = 323 JdbcTableScan(table=[[public, dws_tb_crm_u2_slr_base]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 68 the sql is converted to SELECT "t1"."order_id", "t1"."mord_id", "t1"."item_id", "t1"."div_pay_amt", "t1"."item_quantity", "t2"."slr_id" FROM (SELECT * FROM (SELECT "order_id", "pay_status", "item_id", "seller_id", "item_quantity", "div_pay_amt", "mord_id", "pay_time", "succ_time" FROM "dws_tb_crm_u2_ord_base_df") AS "t" LEFT JOIN (SELECT "item_id", "cate_level1_name" FROM "dws_tb_crm_u2_itm_base_df") AS "t0" ON "t"."item_id" = "t0"."item_id" WHERE ("t"."pay_time" IS NOT NULL OR "t"."div_pay_amt" > 0) AND "t"."pay_status" = 2 AND ("t0"."cate_level1_name" = '箱包皮具/热销女包/男包' AND "t"."div_pay_amt" / "t"."item_quantity" > 1000 OR ("t0"."cate_level1_name" = '家装主材' OR "t0"."cate_level1_name" = '大家电') AND "t"."div_pay_amt" / "t"."item_quantity" > 1000 OR ("t0"."cate_level1_name" = 
'珠宝/钻石/翡翠/黄金' OR "t0"."cate_level1_name" = '饰品/流行首饰/时尚饰品新') AND "t"."div_pay_amt" / "t"."item_quantity" > 2000 OR "t"."div_pay_amt" / "t"."item_quantity" > 500 AND "t0"."cate_level1_name" <> '箱包皮具/热销女包/男包' AND "t0"."cate_level1_name" <> '家装主材' AND "t0"."cate_level1_name" <> '大家电' AND "t0"."cate_level1_name" <> '珠宝/钻石/翡翠/黄金' AND "t0"."cate_level1_name" <> '饰品/流行首饰/时尚饰品新') AND "t0"."cate_level1_name" <> '成人用品/情趣用品' AND "t0"."cate_level1_name" <> '鲜花速递/花卉仿真/绿植园艺' AND "t0"."cate_level1_name" <> '水产肉类/新鲜蔬果/熟食' OR EXTRACT(EPOCH FROM CURRENT_TIMESTAMP) - EXTRACT(EPOCH FROM CAST("t"."succ_time" AS TIMESTAMP(0))) <= 14 * 24 * 60 * 60 AND (("t0"."cate_level1_name" = '箱包皮具/热销女包/男包' OR "t0"."cate_level1_name" = '家装主材' OR