Re: Drill Hangout 2/7/2017

2017-02-06 Thread Jinfeng Ni
Regarding items 2 & 3: IIRC, there is no upper limit on the IN clause. An
IN-list predicate can be converted to a JOIN between T1 and a VALUES
operator when the number of values in the IN list exceeds a certain
threshold. An Aggregate operator is applied to remove possible
duplicate values in the list; that is why you may see a HashAgg in the
query plan.

If the number of values in the IN list is under the threshold, the
IN list is evaluated as OR-ed predicates.

The default threshold is 20 [1], but you can change it by running the following:

alter session set `planner.in_subquery_threshold` = some_number;


SELECT ... FROM T1
WHERE T1.expression IN (value1, value2, ..., value_n)

==>
        Join
       /    \
     T1      Agg
              |
            Values (value1, value2, ..., value_n)


[1] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java#L99-L100

On Mon, Feb 6, 2017 at 9:16 PM,   wrote:
> Hi,
>
> I am interested in joining this session.
>
> My area of interest would be -
>
> 1. Future roadmap of Apache Drill
> 2. How Apache Drill creates a HashAgg when there are a lot of IN members in a
> WHERE clause
> 3. What is the upper limit of the IN clause?
>
> Regards,
> Jasbir Singh
>
> -Original Message-
> From: Jinfeng Ni [mailto:j...@apache.org]
> Sent: Tuesday, February 07, 2017 1:19 AM
> To: dev ; user 
> Subject: Drill Hangout 2/7/2017
>
> Hi drillers,
>
> We are going to have Drill Hangout tomorrow (02/07/2017, 10 AM PT). If you 
> have any suggestions for hangout topics, you can add them to this thread. We 
> will also ask around at the beginning of the hangout for topics.
>
> Thank you,
>
> Jinfeng
>
> 
>
> This message is for the designated recipient only and may contain privileged, 
> proprietary, or otherwise confidential information. If you have received it 
> in error, please notify the sender immediately and delete the original. Any 
> other use of the e-mail by you is prohibited. Where allowed by local law, 
> electronic communications with Accenture and its affiliates, including e-mail 
> and instant messaging (including content), may be scanned by our systems for 
> the purposes of information security and assessment of internal compliance 
> with Accenture policy.
> __
>
> www.accenture.com


[GitHub] drill pull request #:

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on the pull request:


https://github.com/apache/drill/commit/c16570705a182bf833576a7ddb546665442ef14d#commitcomment-20776570
  
In
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
on line 395:
The problem is... those validators only work for session and system
options, not for the config options used here.

Yes, it would be good to have a global validation mechanism. But that is
what Typesafe Config provides for config parameters. (Though it does not
handle min/max.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #520: DRILL-3510: Add ANSI_QUOTES option so that Drill's ...

2017-02-06 Thread laurentgo
Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/520#discussion_r99749138
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillDatabaseMetaDataImpl.java
 ---
@@ -231,11 +274,16 @@ public boolean storesMixedCaseQuotedIdentifiers() 
throws SQLException {
 return super.storesMixedCaseQuotedIdentifiers();
   }
 
-  // TODO(DRILL-3510):  Update when Drill accepts standard SQL's double 
quote.
   @Override
   public String getIdentifierQuoteString() throws SQLException {
 throwIfClosed();
-return "`";
+Property property = 
getServerProperty(PlannerSettings.QUOTING_IDENTIFIERS_CHARACTER_KEY);
+for (Quoting value : Quoting.values()) {
+  if (value.string.equals(property.getValue())) {
+return value.string;
+  }
+}
+throw new SQLException("Unknown quoting identifier character " + 
property.getValue());
--- End diff --

we probably need a fallback (see my pr #613 )
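A minimal sketch of the fallback being suggested, with a simplified stand-in for the `Quoting` enum and the server property lookup (all names here are hypothetical): if the server reports an unrecognized quoting character, return Drill's historical backtick instead of failing the metadata call with a SQLException.

```java
public class QuoteFallbackSketch {
  // Simplified stand-in for the Quoting enum used in the diff.
  public enum Quoting {
    BACK_TICK("`"), DOUBLE_QUOTE("\""), BRACKET("[");
    public final String string;
    Quoting(String s) { this.string = s; }
  }

  // Resolve the identifier quote string reported by the server; fall back
  // to the backtick when the value is unrecognized, rather than throwing.
  public static String identifierQuoteString(String serverValue) {
    for (Quoting q : Quoting.values()) {
      if (q.string.equals(serverValue)) {
        return q.string;
      }
    }
    return Quoting.BACK_TICK.string; // fallback instead of throwing
  }
}
```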




[GitHub] drill pull request #520: DRILL-3510: Add ANSI_QUOTES option so that Drill's ...

2017-02-06 Thread laurentgo
Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/520#discussion_r99747062
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java
 ---
@@ -204,10 +204,11 @@ public void validate(final OptionValue v, final 
OptionManager manager) {
* Validator that checks if the given value is included in a list of 
acceptable values. Case insensitive.
*/
   public static class EnumeratedStringValidator extends StringValidator {
-private final Set valuesSet = new HashSet<>();
+private final Set valuesSet = new LinkedHashSet<>();
--- End diff --

is ordering important?




[GitHub] drill pull request #520: DRILL-3510: Add ANSI_QUOTES option so that Drill's ...

2017-02-06 Thread laurentgo
Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/520#discussion_r99747009
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -695,6 +698,33 @@ public void runQuery(QueryType type, 
List planFragments, UserResul
   }
 
   /**
+   * Get server properties that represent the list of server session 
options.
+   *
+   * @return server properties for the server session options.
+   */
+  public ServerProperties getOptions() throws RpcException {
--- End diff --

I don't think this is the right interface to expose to the user, as it is
too generic and introduces too much indirect coupling (clients starting
to depend on specific options). To be clear, I'm the one to blame here. When
we discussed adding metadata methods to the JDBC/ODBC client, one of the things
discussed was server info metadata, to return things like quoting or some
other properties (equivalent to the JDBC DatabaseMetaData object, or the C++
connector Metadata class:
https://github.com/apache/drill/blob/master/contrib/native/client/src/include/drill/drillClient.hpp#L712),
 but I never found the time to add the missing RPC call.

This is probably what we should use here, and hopefully I can add it
before the end of the week.




[GitHub] drill pull request #520: DRILL-3510: Add ANSI_QUOTES option so that Drill's ...

2017-02-06 Thread laurentgo
Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/520#discussion_r99749402
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserSession.java ---
@@ -59,9 +62,11 @@
   public static final String USER = "user";
   public static final String PASSWORD = "password";
   public static final String IMPERSONATION_TARGET = "impersonation_target";
+  public static final String QUOTING_IDENTIFIERS_CHARACTER = 
"quoting_identifiers_character";
--- End diff --

isn't this a bit too long? could "quoting" be used instead? (also, brackets 
are two characters... :) )




RE: Drill Hangout 2/7/2017

2017-02-06 Thread jasbir.sing
Hi,

I am interested in joining this session.

My area of interest would be -

1. Future roadmap of Apache Drill
2. How Apache Drill creates a HashAgg when there are a lot of IN members in a WHERE
clause
3. What is the upper limit of the IN clause?

Regards,
Jasbir Singh

-Original Message-
From: Jinfeng Ni [mailto:j...@apache.org]
Sent: Tuesday, February 07, 2017 1:19 AM
To: dev ; user 
Subject: Drill Hangout 2/7/2017

Hi drillers,

We are going to have Drill Hangout tomorrow (02/07/2017, 10 AM PT). If you have 
any suggestions for hangout topics, you can add them to this thread. We will 
also ask around at the beginning of the hangout for topics.

Thank you,

Jinfeng





[GitHub] drill issue #743: DRILL-5243: Fix TestContextFunctions.sessionIdUDFWithinSam...

2017-02-06 Thread sudheeshkatkam
Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/743
  
+1




[GitHub] drill issue #727: DRILL-5219: Relax user properties validation in C++ client

2017-02-06 Thread sudheeshkatkam
Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/727
  
+1




[GitHub] drill pull request #738: DRILL-5190: Display planning time for a query in it...

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/738#discussion_r99713451
  
--- Diff: exec/java-exec/src/main/resources/rest/profile/profile.ftl ---
@@ -106,6 +106,7 @@
   STATE: ${model.getProfile().getState().name()}
   FOREMAN: ${model.getProfile().getForeman().getAddress()}
   TOTAL FRAGMENTS: ${model.getProfile().getTotalFragments()}
+  PLANNING: ${model.getPlanningDuration()}
--- End diff --

Is planning part of duration? Or, is total time planning time + (execution) 
duration?

If planning time is part of the (overall) duration, maybe show that
visually somehow. Indented? And, if we show planning time, should we not also
show queue time and execution time?




[GitHub] drill pull request #738: DRILL-5190: Display planning time for a query in it...

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/738#discussion_r99713077
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -417,6 +417,9 @@ private void parseAndRunPhysicalPlan(final String json) 
throws ExecutionSetupExc
 
   private void runPhysicalPlan(final PhysicalPlan plan) throws 
ExecutionSetupException {
 validatePlan(plan);
+//Marking endTime of Planning
+queryManager.markPlanningEndTime();
--- End diff --

Nit. The next line is also part of planning, though should execute quite 
quickly.




[GitHub] drill pull request #738: DRILL-5190: Display planning time for a query in it...

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/738#discussion_r99712636
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java
 ---
@@ -77,12 +77,14 @@
*/
   public static String getPrettyDuration(long startTimeMillis, long 
endTimeMillis) {
 long durationInMillis = (startTimeMillis > endTimeMillis ? 
System.currentTimeMillis() : endTimeMillis) - startTimeMillis;
-long hours = TimeUnit.MILLISECONDS.toHours(durationInMillis);
+long days = TimeUnit.MILLISECONDS.toDays(durationInMillis);
--- End diff --

The cost of initializing an object is trivial. Unless we are in
performance-critical code, code clarity and simplicity are more important than
a few object creations. Indeed, in the code below, each time we do a
```
stringA + stringB
```
we create a temporary String object. So we are already creating a large
collection of objects.

The question is really 1) code repetition (not duplicating the code that
splits out the fields) and 2) ease of use (making it easy to use the
duration-format utility).

Thanks for combining this code into a utility. Once committed, I will 
shamelessly reuse the code to replace the hacks I've used for displaying 
durations.
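As a sketch of the splitting under discussion (the class and method names here are hypothetical, not Drill's): `TimeUnit` can carve a millisecond duration into days/hours/minutes/seconds without any manual division, and the pieces are then rendered in one place.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

public class DurationSketch {
  // Split a millisecond duration into days/hours/minutes/seconds using
  // TimeUnit, then render only the units that are non-zero at the top.
  public static String pretty(long millis) {
    long days = TimeUnit.MILLISECONDS.toDays(millis);
    long hours = TimeUnit.MILLISECONDS.toHours(millis) % 24;
    long minutes = TimeUnit.MILLISECONDS.toMinutes(millis) % 60;
    double seconds = (millis % TimeUnit.MINUTES.toMillis(1)) / 1000.0;
    StringBuilder sb = new StringBuilder();
    if (days > 0) { sb.append(days).append(" d "); }
    if (days > 0 || hours > 0) { sb.append(hours).append(" hr "); }
    sb.append(String.format(Locale.ROOT, "%02d min %06.3f sec", minutes, seconds));
    return sb.toString();
  }
}
```

For example, `pretty(25254321L)` (the elapsed time in the getPrettyDuration javadoc example) yields "7 hr 00 min 54.321 sec".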




[GitHub] drill pull request #738: DRILL-5190: Display planning time for a query in it...

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/738#discussion_r99712967
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java
 ---
@@ -122,6 +125,31 @@ public String getQueryId() {
 return id;
   }
 
+  public String getPlanningDuration() {
+//Check if Plan End is valid
+if (profile.getStart() > profile.getPlanEnd()) {
--- End diff --

Code clarity is always a goal. One handy technique is to pick off 
conditions quickly. Here, we could do:
```
if (profile.getStart() <= profile.getPlanEnd()) {
  return ProfileResources.getPrettyDuration(profile.getStart(),
      profile.getPlanEnd());
}
// Don't estimate...
```



[GitHub] drill pull request #739: DRILL-5230: Translation of millisecond duration int...

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/739#discussion_r99711436
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java
 ---
@@ -352,16 +369,26 @@ public String cancelQuery(@PathParam("queryid") 
String queryId) {
   private void checkOrThrowProfileViewAuthorization(final QueryProfile 
profile) {
 if (!principal.canManageProfileOf(profile.getUser())) {
   throw UserException.permissionError()
-  .message("Not authorized to view the profile of query '%s'", 
profile.getId())
-  .build(logger);
+  .message("Not authorized to view the profile of query '%s'", 
profile.getId())
+  .build(logger);
 }
   }
 
   private void checkOrThrowQueryCancelAuthorization(final String 
queryUser, final String queryId) {
 if (!principal.canManageQueryOf(queryUser)) {
   throw UserException.permissionError()
-  .message("Not authorized to cancel the query '%s'", queryId)
-  .build(logger);
+  .message("Not authorized to cancel the query '%s'", queryId)
+  .build(logger);
 }
   }
+
+  /**
+   * Duration format definition
+   * @author kkhatua
--- End diff --

Drill practice seems to be to omit the author tag. Many IDEs helpfully add
this, but you can turn off that option.




[GitHub] drill pull request #739: DRILL-5230: Translation of millisecond duration int...

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/739#discussion_r99688930
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java
 ---
@@ -73,18 +73,35 @@
* e.g. getPrettyDuration(1468368841695,1468394096016) = '7 hr 00 min 
54.321 sec'
* @param startTimeMillis Start Time in milliseconds
* @param endTimeMillis   End Time in milliseconds
+   * @param format  Display format
* @returnHuman-Readable Elapsed Time
*/
-  public static String getPrettyDuration(long startTimeMillis, long 
endTimeMillis) {
+  public static String getPrettyDuration(long startTimeMillis, long 
endTimeMillis, DurationFormat format) {
--- End diff --

It is often cleaner to just have two methods rather than one method with a
"command". Since we need to split the data out into a bunch of fields, this can
be done by another method that creates a structure. Then, since you've created
the structure, it might as well be the class that does the formatting, and offer
two format methods: compact and verbose.
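One way to read that suggestion (all names here hypothetical, not Drill's): split the milliseconds once into a small value class, then expose `compact()` and `verbose()` instead of a format parameter.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

public class DurationParts {
  private final long hours;
  private final long minutes;
  private final double seconds;

  // Split once, in the constructor; the format methods only render.
  public DurationParts(long millis) {
    this.hours = TimeUnit.MILLISECONDS.toHours(millis);
    this.minutes = TimeUnit.MILLISECONDS.toMinutes(millis) % 60;
    this.seconds = (millis % TimeUnit.MINUTES.toMillis(1)) / 1000.0;
  }

  // Verbose form, e.g. for a profile summary page.
  public String verbose() {
    return String.format(Locale.ROOT, "%d hr %02d min %06.3f sec",
        hours, minutes, seconds);
  }

  // Compact form, e.g. for dense tables.
  public String compact() {
    return String.format(Locale.ROOT, "%d:%02d:%06.3f",
        hours, minutes, seconds);
  }
}
```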




[GitHub] drill pull request #717: DRILL-5080: Memory-managed version of external sort

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/717#discussion_r99250103
  
--- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
@@ -177,13 +177,47 @@ drill.exec: {
   sort: {
 purge.threshold : 1000,
 external: {
-  batch.size : 4000,
+  // Drill uses the managed External Sort Batch by default.
+  // Set this to true to use the legacy, unmanaged version.
+  // Disabled in the intial commit, to be enabled after
+  // tests are committed.
+  disable_managed: true
+  // Limit on the number of batches buffered in memory.
+  // Primarily for testing.
+  // 0 = unlimited
+  batch_limit: 0
+  // Limit on the amount of memory used for xsort. Overrides the
+  // value provided by Foreman. Primarily for testing.
+  // 0 = unlimited, Supports HOCON memory suffixes.
+  mem_limit: 0
+  // Limit on the number of spilled batches that can be merged in
+  // a single pass. Limits the number of open file handles.
+  // 0 = unlimited
+  merge_limit: 0
   spill: {
-batch.size : 4000,
-group.size : 4,
-threshold : 4,
-directories : [ "/tmp/drill/spill" ],
-fs : "file:///"
+// Deprecated for managed xsort; used only by legacy xsort
+group.size: 4,
+// Deprecated for managed xsort; used only by legacy xsort
+threshold: 4,
+// Minimum number of in-memory batches to spill per spill file
+// Affects only spilling from memory to disk.
+// Primarily for testing.
+min_batches: 2,
+// Maximum number of in-memory batches to spill per spill file
+// Affects only spilling from memory to disk.
+// Primarily for testing.
+// 0 = unlimited
+max_batches: 0,
+// File system to use. Local file system by default.
+fs: "file:///"
+// List of directories to use. Directories are created
--- End diff --

Here that is implied by the JSON-like syntax.




[GitHub] drill pull request #717: DRILL-5080: Memory-managed version of external sort

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/717#discussion_r99246595
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/MSortTemplate.java
 ---
@@ -0,0 +1,237 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.xsort.managed;
+
+import java.util.Queue;
+
+import javax.inject.Named;
+
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.memory.BaseAllocator;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.hadoop.util.IndexedSortable;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Queues;
+
+import io.netty.buffer.DrillBuf;
+
+public abstract class MSortTemplate implements MSorter, IndexedSortable {
+//  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(MSortTemplate.class);
+
+  private SelectionVector4 vector4;
+  private SelectionVector4 aux;
+  @SuppressWarnings("unused")
+  private long compares;
+
+  /**
+   * Holds offsets into the SV4 of the start of each batch
+   * (sorted run.)
+   */
+
+  private Queue runStarts = Queues.newLinkedBlockingQueue();
+  private FragmentContext context;
+
+  /**
+   * Controls the maximum size of batches exposed to downstream
+   */
+  private int desiredRecordBatchCount;
+
+  @Override
+  public void setup(final FragmentContext context, final BufferAllocator 
allocator, final SelectionVector4 vector4,
+final VectorContainer hyperBatch, int outputBatchSize) 
throws SchemaChangeException{
+// we pass in the local hyperBatch since that is where we'll be 
reading data.
+Preconditions.checkNotNull(vector4);
+this.vector4 = vector4.createNewWrapperCurrent();
+this.context = context;
+vector4.clear();
+doSetup(context, hyperBatch, null);
+
+// Populate the queue with the offset in the SV4 of each
+// batch. Note that this is expensive as it requires a scan
+// of all items to be sorted: potentially millions.
+
+runStarts.add(0);
+int batch = 0;
+final int totalCount = this.vector4.getTotalCount();
+for (int i = 0; i < totalCount; i++) {
+  final int newBatch = this.vector4.get(i) >>> 16;
+  if (newBatch == batch) {
+continue;
+  } else if (newBatch == batch + 1) {
+runStarts.add(i);
+batch = newBatch;
+  } else {
+throw new UnsupportedOperationException(String.format("Missing 
batch. batch: %d newBatch: %d", batch, newBatch));
+  }
+}
+
+// Create a temporary SV4 to hold the merged results.
+
+@SuppressWarnings("resource")
+final DrillBuf drillBuf = allocator.buffer(4 * totalCount);
+desiredRecordBatchCount = Math.min(outputBatchSize, 
Character.MAX_VALUE);
+desiredRecordBatchCount = Math.min(desiredRecordBatchCount, 
totalCount);
+aux = new SelectionVector4(drillBuf, totalCount, 
desiredRecordBatchCount);
+  }
+
+  /**
+   * For given recordCount how much memory does MSorter needs for its own 
purpose. This is used in
+   * ExternalSortBatch to make decisions about whether to spill or not.
+   *
+   * @param recordCount
+   * @return
+   */
+  public static long memoryNeeded(final int recordCount) {
+// We need 4 bytes (SV4) for each record.
+// The memory allocator will round this to the next
+// power of 2.
+
+return BaseAllocator.nextPowerOfTwo(recordCount * 4);
+  }
+
+  /**
+   * Given two regions within 

[GitHub] drill pull request #717: DRILL-5080: Memory-managed version of external sort

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/717#discussion_r99246295
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/InMemorySorter.java
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.xsort.managed;
+
+import java.util.LinkedList;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
+import org.apache.drill.exec.physical.impl.sort.SortRecordBatchBuilder;
+import 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.SortResults;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+
+public class InMemorySorter implements SortResults {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(InMemorySorter.class);
+
+  private SortRecordBatchBuilder builder;
+  private MSorter mSorter;
+  private final FragmentContext context;
+  private final BufferAllocator oAllocator;
+  private SelectionVector4 sv4;
+  private final OperatorCodeGenerator opCg;
+  private int batchCount;
+
+  public InMemorySorter(FragmentContext context, BufferAllocator 
allocator, OperatorCodeGenerator opCg) {
+this.context = context;
+this.oAllocator = allocator;
+this.opCg = opCg;
+  }
+
+  public SelectionVector4 sort(LinkedList 
batchGroups, VectorAccessible batch,
+VectorContainer destContainer) {
+if (builder != null) {
+  builder.clear();
+  builder.close();
+}
+builder = new SortRecordBatchBuilder(oAllocator);
+
+for (BatchGroup.InputBatch group : batchGroups) {
+  RecordBatchData rbd = new RecordBatchData(group.getContainer(), 
oAllocator);
+  rbd.setSv2(group.getSv2());
+  builder.add(rbd);
+}
+batchGroups.clear();
+
+try {
+  builder.build(context, destContainer);
+  sv4 = builder.getSv4();
+  mSorter = opCg.createNewMSorter(batch);
+  mSorter.setup(context, oAllocator, sv4, destContainer, 
sv4.getCount());
+} catch (SchemaChangeException e) {
+  throw UserException.unsupportedError(e)
+.message("Unexpected schema change - likely code error.")
+.build(logger);
+}
+
+// For testing memory-leaks, inject exception after mSorter finishes 
setup
+
ExternalSortBatch.injector.injectUnchecked(context.getExecutionControls(), 
ExternalSortBatch.INTERRUPTION_AFTER_SETUP);
+mSorter.sort(destContainer);
+
+// sort may have prematurely exited due to should continue returning 
false.
+if (!context.shouldContinue()) {
+  return null;
+}
+
+// For testing memory-leak purpose, inject exception after mSorter 
finishes sorting
+
ExternalSortBatch.injector.injectUnchecked(context.getExecutionControls(), 
ExternalSortBatch.INTERRUPTION_AFTER_SORT);
+sv4 = mSorter.getSV4();
+
+destContainer.buildSchema(SelectionVectorMode.FOUR_BYTE);
+return sv4;
+  }
+
+  @Override
+  public boolean next() {
+boolean more = sv4.next();
+if (more) { batchCount++; }
+return more;
+  }
+
+  @Override
+  public void close() {
+if (builder != null) {
+  builder.clear();
+  builder.close();
+}
+if (mSorter != null) {
+  mSorter.clear();
+}
+  }
+
+  @Override
+  public int getBatchCount() {

[GitHub] drill pull request #717: DRILL-5080: Memory-managed version of external sort

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/717#discussion_r99247477
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/OperatorCodeGenerator.java
 ---
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.xsort.managed;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.calcite.rel.RelFieldCollation.Direction;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.expression.ErrorCollector;
+import org.apache.drill.common.expression.ErrorCollectorImpl;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.logical.data.Order.Ordering;
+import org.apache.drill.exec.compile.sig.GeneratorMapping;
+import org.apache.drill.exec.compile.sig.MappingSet;
+import org.apache.drill.exec.exception.ClassTransformationException;
+import org.apache.drill.exec.expr.ClassGenerator;
+import org.apache.drill.exec.expr.ClassGenerator.HoldingContainer;
+import org.apache.drill.exec.expr.CodeGenerator;
+import org.apache.drill.exec.expr.ExpressionTreeMaterializer;
+import org.apache.drill.exec.expr.fn.FunctionGenerationHelper;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.config.ExternalSort;
+import org.apache.drill.exec.physical.config.Sort;
+import org.apache.drill.exec.physical.impl.xsort.SingleBatchSorter;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.vector.CopyUtil;
+
+import com.sun.codemodel.JConditional;
+import com.sun.codemodel.JExpr;
+
+/**
+ * Generates and manages the data-specific classes for this operator.
+ * 
+ * Several of the code generation methods take a batch, but the methods
+ * are called for many batches, and generate code only for the first one.
+ * Better would be to generate code from a schema; but Drill is not set
+ * up for that at present.
+ */
+
+public class OperatorCodeGenerator {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(OperatorCodeGenerator.class);
+
+  protected static final MappingSet MAIN_MAPPING = new MappingSet((String) 
null, null, ClassGenerator.DEFAULT_SCALAR_MAP, 
ClassGenerator.DEFAULT_SCALAR_MAP);
+  protected static final MappingSet LEFT_MAPPING = new 
MappingSet("leftIndex", null, ClassGenerator.DEFAULT_SCALAR_MAP, 
ClassGenerator.DEFAULT_SCALAR_MAP);
+  protected static final MappingSet RIGHT_MAPPING = new 
MappingSet("rightIndex", null, ClassGenerator.DEFAULT_SCALAR_MAP, 
ClassGenerator.DEFAULT_SCALAR_MAP);
+
+  private static final GeneratorMapping COPIER_MAPPING = new 
GeneratorMapping("doSetup", "doCopy", null, null);
+  private static final MappingSet COPIER_MAPPING_SET = new 
MappingSet(COPIER_MAPPING, COPIER_MAPPING);
+
+  private final FragmentContext context;
+  @SuppressWarnings("unused")
+  private BatchSchema schema;
+
+  /**
+   * A single PriorityQueueCopier instance is used for 2 purposes:
+   * 1. Merge sorted batches before spilling
+   * 2. Merge sorted batches when all incoming data fits in memory
+   */
+
+  private PriorityQueueCopier copier;
+  private final Sort popConfig;
+  private MSorter mSorter;
+
+  /**
+   * Generated sort operation used to sort each incoming batch according to
+   * the sort criteria specified in the {@link ExternalSort} definition of
+   * this operator.
+   */
+
+  private SingleBatchSorter sorter;
+
+  public OperatorCodeGenerator(FragmentContext context, Sort popConfig) {
+this.context = context;
+this.popConfig = popConfig;
+  }
+
+  public void setSchema(BatchSchema schema) {
+close();
+this.schema = schema;
+  }
+
+  public 

[GitHub] drill pull request #717: DRILL-5080: Memory-managed version of external sort

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/717#discussion_r99245869
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/InMemorySorter.java
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.xsort.managed;
+
+import java.util.LinkedList;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
+import org.apache.drill.exec.physical.impl.sort.SortRecordBatchBuilder;
+import 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.SortResults;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+
+public class InMemorySorter implements SortResults {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(InMemorySorter.class);
+
+  private SortRecordBatchBuilder builder;
+  private MSorter mSorter;
--- End diff --

Good catch! Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #717: DRILL-5080: Memory-managed version of external sort

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/717#discussion_r99247699
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/PriorityQueueCopier.java
 ---
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.xsort.managed;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.drill.exec.compile.TemplateClassDefinition;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.record.VectorAccessible;
+
+public interface PriorityQueueCopier extends AutoCloseable {
+  public static final long INITIAL_ALLOCATION = 1000;
+  public static final long MAX_ALLOCATION = 2000;
--- End diff --

Original code. Actually, these are no longer used, so removed them.




[GitHub] drill pull request #717: DRILL-5080: Memory-managed version of external sort

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/717#discussion_r99250032
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SpillSet.java
 ---
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.xsort.managed;
--- End diff --

Good idea. Done.




[GitHub] drill issue #739: DRILL-5230: Translation of millisecond duration into hours...

2017-02-06 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/739
  
@paul-rogers , @sudheeshkatkam 
Committed changes based on your recommendations. I also noticed that some 
of the calls were passing fragment IDs as links, which were never applied. 
Based on the file history, my hunch is that it was meant for debugging and 
forgotten ever since. Corrected it by replacing these with the appropriate 
`null` values.




[GitHub] drill pull request #:

2017-02-06 Thread Ben-Zvi
Github user Ben-Zvi commented on the pull request:


https://github.com/apache/drill/commit/c16570705a182bf833576a7ddb546665442ef14d#commitcomment-20772328
  
In 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 on line 405:
Ditto for the MIN_MERGED_BATCH_SIZE; capping at 10% of the mem limit 
still needs to be here ..




[GitHub] drill pull request #:

2017-02-06 Thread Ben-Zvi
Github user Ben-Zvi commented on the pull request:


https://github.com/apache/drill/commit/c16570705a182bf833576a7ddb546665442ef14d#commitcomment-20772419
  
In 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 on line 405:
Maybe need to cap at 16M anyway ?





[GitHub] drill pull request #:

2017-02-06 Thread Ben-Zvi
Github user Ben-Zvi commented on the pull request:


https://github.com/apache/drill/commit/c16570705a182bf833576a7ddb546665442ef14d#commitcomment-20771562
  
In 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 on line 395:
(not a comment) A wish -- I wish our option-setting mechanism were more 
sophisticated: define the option along with its restrictions (such as min and 
max values), so that all those checks are done in a single module instead of 
scattered over the code - unlike here, where we define an inline constant and 
add code to compare the option. Also, as written, a user who sets a value 
violating the limit would not understand why the new value did not apply. 
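The wished-for mechanism could be sketched as below. This is a hypothetical illustration (the class and option names are made up), showing how a self-validating option definition would centralize the min/max checks and give the user an explicit error:

```java
// Hypothetical sketch: an option definition that carries its own min/max,
// so validation lives in one module and an out-of-range value produces an
// immediate, explicit error instead of being silently ignored.
public class BoundedLongOption {
  final String name;
  final long min;
  final long max;
  private long value;

  BoundedLongOption(String name, long min, long max, long defaultValue) {
    this.name = name;
    this.min = min;
    this.max = max;
    this.value = defaultValue;
  }

  void set(long newValue) {
    if (newValue < min || newValue > max) {
      // The user is told exactly why the new value did not apply.
      throw new IllegalArgumentException(name + " must be in [" + min + ", "
          + max + "], got " + newValue);
    }
    value = newValue;
  }

  long get() { return value; }
}
```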




[GitHub] drill pull request #742: DRILL-5242: The UI breaks when rendering profiles h...

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/742#discussion_r99684283
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/OperatorWrapper.java
 ---
@@ -163,11 +165,18 @@ public String getMetricsTable() {
   null);
 
   final Number[] values = new Number[metricNames.length];
+  //Track new/Unknown Metrics
+  final Set<Integer> unknownMetrics = new TreeSet<>();
   for (final MetricValue metric : op.getMetricList()) {
-if (metric.hasLongValue()) {
-  values[metric.getMetricId()] = metric.getLongValue();
-} else if (metric.hasDoubleValue()) {
-  values[metric.getMetricId()] = metric.getDoubleValue();
+if (metric.getMetricId() < metricNames.length) {
+  if (metric.hasLongValue()) {
+values[metric.getMetricId()] = metric.getLongValue();
+  } else if (metric.hasDoubleValue()) {
+values[metric.getMetricId()] = metric.getDoubleValue();
+  }
+} else {
+  //Tracking unknown metric IDs
+  unknownMetrics.add(metric.getMetricId());
--- End diff --

Will this work? We leave the metric unset, then iterate over them in the 
following loop. Also, we build the unknownMetrics set, but never use it.

Suggestion: if the metric is not known, just set its value to 0, and log a 
message to the log file.

This situation occurs only when 1) opening an old profile with metrics that 
no longer exist, or 2) when adding a metric but not registering it properly.

Note that this code does not handle the case where the metric is not 
registered (the metric name is null).

For the external sort, the problem occurred because we have two sets of 
metric enums (two implementations) but only one registry of names. The fix was 
to keep the old and new ones in sync.
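The suggested handling could be sketched as below. This is hypothetical (the method and array shapes are illustrative, not the actual OperatorWrapper code): an out-of-range metric id is logged and every known slot defaults to 0, so the later rendering loop never sees a null value.

```java
import java.util.Arrays;

// Hypothetical sketch of the suggested handling: default unknown/unset
// metrics to 0 and log the unknown id, instead of silently dropping it.
public class MetricValues {
  static Number[] fill(int[] metricIds, long[] metricValues, int knownCount) {
    Number[] values = new Number[knownCount];
    Arrays.fill(values, 0L);                  // unset/unknown slots render as 0
    for (int i = 0; i < metricIds.length; i++) {
      if (metricIds[i] < knownCount) {
        values[metricIds[i]] = metricValues[i];
      } else {
        // Old profile, or a metric added but not registered properly.
        System.err.println("Unknown metric id " + metricIds[i]
            + "; value " + metricValues[i] + " ignored");
      }
    }
    return values;
  }
}
```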




[GitHub] drill pull request #578: DRILL-4280: Kerberos Authentication

2017-02-06 Thread sohami
Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/578#discussion_r99678863
  
--- Diff: contrib/native/client/src/clientlib/saslAuthenticatorImpl.cpp ---
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+#include 
+#include 
+#include "saslAuthenticatorImpl.hpp"
+
+#include "drillClientImpl.hpp"
+#include "logger.hpp"
+
+namespace Drill {
+
+#define DEFAULT_SERVICE_NAME "drill"
+
+#define KERBEROS_SIMPLE_NAME "kerberos"
+#define KERBEROS_SASL_NAME "gssapi"
+#define PLAIN_NAME "plain"
+
+const std::map<std::string, std::string> 
SaslAuthenticatorImpl::MECHANISM_MAPPING = boost::assign::map_list_of
+(KERBEROS_SIMPLE_NAME, KERBEROS_SASL_NAME)
+(PLAIN_NAME, PLAIN_NAME)
+;
+
+boost::mutex SaslAuthenticatorImpl::s_mutex;
+bool SaslAuthenticatorImpl::s_initialized = false;
+
+SaslAuthenticatorImpl::SaslAuthenticatorImpl(const DrillUserProperties* 
const properties) :
+m_properties(properties), m_pConnection(NULL), m_secret(NULL) {
+
+if (!s_initialized) {
+boost::lock_guard<boost::mutex> 
lock(SaslAuthenticatorImpl::s_mutex);
+if (!s_initialized) {
+// set plugin path if provided
+if (DrillClientConfig::getSaslPluginPath()) {
+char *saslPluginPath = const_cast<char *>(DrillClientConfig::getSaslPluginPath());
+sasl_set_path(0, saslPluginPath);
+}
+
+sasl_client_init(NULL);
+{ // for debugging purposes
+const char **mechanisms = sasl_global_listmech();
+int i = 0;
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "SASL mechanisms 
available on client: " << std::endl;)
+while (mechanisms[i] != NULL) {
+DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << i << " : " << 
mechanisms[i] << std::endl;)
+i++;
+}
+}
+s_initialized = true;
+}
+}
+}
+
+SaslAuthenticatorImpl::~SaslAuthenticatorImpl() {
+if (m_secret) {
+free(m_secret);
+}
+// may be used to negotiated security layers before disposing in the 
future
+if (m_pConnection) {
+sasl_dispose(&m_pConnection);
+}
+m_pConnection = NULL;
+}
+
+typedef int (*sasl_callback_proc_t)(void); // see sasl_callback_ft
+
+int SaslAuthenticatorImpl::userNameCallback(void *context, int id, const 
char **result, unsigned *len) {
+const std::string* const username = static_cast<const std::string* const>(context);
+
+if ((SASL_CB_USER == id || SASL_CB_AUTHNAME == id)
+&& username != NULL) {
+*result = username->c_str();
+// *len = (unsigned int) username->length();
+}
+return SASL_OK;
+}
+
+int SaslAuthenticatorImpl::passwordCallback(sasl_conn_t *conn, void 
*context, int id, sasl_secret_t **psecret) {
+const SaslAuthenticatorImpl* const authenticator = static_cast<const SaslAuthenticatorImpl* const>(context);
+
+if (SASL_CB_PASS == id) {
+const std::string password = authenticator->m_password;
+const size_t length = password.length();
+authenticator->m_secret->len = length;
+std::memcpy(authenticator->m_secret->data, password.c_str(), 
length);
+*psecret = authenticator->m_secret;
+}
+return SASL_OK;
+}
+
+int SaslAuthenticatorImpl::init(const std::vector<std::string>& 
mechanisms, exec::shared::SaslMessage& response) {
+// find and set parameters
+std::string authMechanismToUse;
+std::string serviceName;
+std::string serviceHost;
+for (size_t i = 0; i < m_properties->size(); i++) {
+const std::string key = m_properties->keyAt(i);
+const std::string value = m_properties->valueAt(i);
+
+if (USERPROP_SERVICE_HOST == key) {
+serviceHost = value;

Drill Hangout 2/7/2017

2017-02-06 Thread Jinfeng Ni
Hi drillers,

We are going to have Drill Hangout tomorrow (02/07/2017, 10 AM PT). If
you have any suggestions for hangout topics, you can add them to this
thread. We will also ask around at the beginning of the hangout for
topics.

Thank you,

Jinfeng


[GitHub] drill pull request #701: DRILL-4963: Fix issues with dynamically loaded over...

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/701#discussion_r99649541
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionImplementationRegistry.java
 ---
@@ -260,76 +293,101 @@ public RemoteFunctionRegistry 
getRemoteFunctionRegistry() {
   }
 
   /**
-   * Attempts to load and register functions from remote function registry.
-   * First checks if there is no missing jars.
-   * If yes, enters synchronized block to prevent other loading the same 
jars.
-   * Again re-checks if there are no missing jars in case someone has 
already loaded them (double-check lock).
-   * If there are still missing jars, first copies jars to local udf area 
and prepares {@link JarScan} for each jar.
-   * Jar registration timestamp represented in milliseconds is used as 
suffix.
-   * Then registers all jars at the same time. Returns true when finished.
-   * In case if any errors during jars coping or registration, logs errors 
and proceeds.
+   * Purpose of this method is to synchronize remote and local function 
registries if needed
+   * and to inform if function registry was changed after given version.
*
-   * If no missing jars are found, checks current local registry version.
-   * Returns false if versions match, true otherwise.
+   * To make synchronization as light-weight as possible, first only 
versions of both registries are checked
+   * without any locking. If synchronization is needed, enters 
synchronized block to prevent others loading the same jars.
+   * The need of synchronization is checked again (double-check lock) 
before comparing jars.
+   * If any missing jars are found, they are downloaded to local udf area, 
each is wrapped into {@link JarScan}.
+   * Once jar download is finished, all missing jars are registered in one 
batch.
+   * In case of any errors during jar download / registration, these 
errors are logged.
*
-   * @param version local function registry version
-   * @return true if new jars were registered or local function registry 
version is different, false otherwise
+   * During registration local function registry is updated with remote 
function registry version it is synced with.
+   * When at least one of the missing jars fails to download / 
register,
+   * the local function registry version is not updated, but jars that were 
successfully downloaded / registered
+   * are added to the local function registry.
+   *
+   * If synchronization between remote and local function registry was not 
needed,
+   * checks if given registry version matches latest sync version
+   * to inform if function registry was changed after given version.
+   *
+   * @param version remote function registry local function registry was 
based on
+   * @return true if remote and local function registries were 
synchronized after given version
*/
-  public boolean loadRemoteFunctions(long version) {
-List missingJars = getMissingJars(remoteFunctionRegistry, 
localFunctionRegistry);
-if (!missingJars.isEmpty()) {
+  public boolean syncWithRemoteRegistry(long version) {
+if 
(doSyncFunctionRegistries(remoteFunctionRegistry.getRegistryVersion(), 
localFunctionRegistry.getVersion())) {
   synchronized (this) {
-missingJars = getMissingJars(remoteFunctionRegistry, 
localFunctionRegistry);
-if (!missingJars.isEmpty()) {
-  logger.info("Starting dynamic UDFs lazy-init process.\n" +
-  "The following jars are going to be downloaded and 
registered locally: " + missingJars);
+long localRegistryVersion = localFunctionRegistry.getVersion();
+if 
(doSyncFunctionRegistries(remoteFunctionRegistry.getRegistryVersion(), 
localRegistryVersion))  {
+  DataChangeVersion remoteVersion = new DataChangeVersion();
+  List missingJars = 
getMissingJars(this.remoteFunctionRegistry, localFunctionRegistry, 
remoteVersion);
   List jars = Lists.newArrayList();
-  for (String jarName : missingJars) {
-Path binary = null;
-Path source = null;
-URLClassLoader classLoader = null;
-try {
-  binary = copyJarToLocal(jarName, remoteFunctionRegistry);
-  source = copyJarToLocal(JarUtil.getSourceName(jarName), 
remoteFunctionRegistry);
-  URL[] urls = {binary.toUri().toURL(), 
source.toUri().toURL()};
-  classLoader = new URLClassLoader(urls);
-  ScanResult scanResult = scan(classLoader, binary, urls);
-  localFunctionRegistry.validate(jarName, scanResult);
-  jars.add(new JarScan(jarName, scanResult, classLoader));
-} catch 

[GitHub] drill issue #701: DRILL-4963: Fix issues with dynamically loaded overloaded ...

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/701
  
It seems that the concept of overloading is, itself, ambiguous. If I 
define a function `foo(long)` but call it with an `int`, we won't get an exact 
match, will we? So, on every call we'd have to check if there is a new, better, 
match for `foo()` in the registry. This means a call to ZK for every function 
in every query where we don't have an exact parameter match. My suspicion is 
that this will be a performance issue, but we won't know until someone tests it.

I wonder if we should do this fix incrementally. This PR is better than the 
original, as it does handle overloads. After that, we can do a bit of 
performance testing to see the impact of checking ZK for the version on every 
overloaded method. Any performance improvement can be done as separate JIRA and 
PR.
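One mitigation along the lines discussed could be sketched as below. This is hypothetical (the class, TTL, and `VersionSource` abstraction are made up for illustration): cache the remote registry version locally with a short TTL, so a non-exact function match does not trigger a ZooKeeper round-trip on every call.

```java
// Hypothetical sketch: cache the remote registry version for a short TTL
// to avoid a remote (e.g. ZooKeeper) read on every overloaded-function call.
public class VersionCache {
  interface VersionSource { long fetch(); }   // e.g. a ZK read

  private final long ttlMillis;
  private long cachedVersion = -1;
  private long fetchedAt = Long.MIN_VALUE;

  VersionCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

  long get(VersionSource source, long nowMillis) {
    if (cachedVersion < 0 || nowMillis - fetchedAt >= ttlMillis) {
      cachedVersion = source.fetch();   // the expensive remote call
      fetchedAt = nowMillis;
    }
    return cachedVersion;
  }
}
```

The trade-off is that a stale cached version can delay visibility of a newly registered function by up to one TTL; the performance testing proposed above would show whether that trade is worthwhile.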




[GitHub] drill pull request #701: DRILL-4963: Fix issues with dynamically loaded over...

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/701#discussion_r99454180
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionImplementationRegistry.java
 ---
@@ -178,22 +192,41 @@ private String functionReplacement(FunctionCall 
functionCall) {
   }
 
   /**
-   * Find the Drill function implementation that matches the name, arg 
types and return type.
-   * If exact function implementation was not found,
-   * loads all missing remote functions and tries to find Drill 
implementation one more time.
+   * Finds the Drill function implementation that matches the name, arg 
types and return type.
+   *
+   * @param name function name
+   * @param argTypes input parameters types
+   * @param returnType function return type
+   * @return exactly matching function holder
--- End diff --

Thanks for adding the Javadoc! Very helpful.




[GitHub] drill pull request #701: DRILL-4963: Fix issues with dynamically loaded over...

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/701#discussion_r99650129
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java
 ---
@@ -50,13 +47,56 @@
   private DrillSqlWorker() {
   }
 
+  /**
+   * Converts sql query string into query physical plan.
+   *
+   * @param context query context
+   * @param sql sql query
+   * @return query physical plan
+   */
   public static PhysicalPlan getPlan(QueryContext context, String sql) 
throws SqlParseException, ValidationException,
   ForemanSetupException {
 return getPlan(context, sql, null);
   }
 
+  /**
+   * Converts sql query string into query physical plan.
+   * In case of any errors (that might occur due to missing function 
implementation),
+   * checks if local function registry should be synchronized with remote 
function registry.
+   * If sync took place, reloads drill operator table
+   * (since functions were added to / removed from local function registry)
+   * and attempts to convert the sql query string into a query physical plan 
one more time.
+   *
+   * @param context query context
+   * @param sql sql query
+   * @param textPlan text plan
+   * @return query physical plan
+   */
--- End diff --

Thanks for adding the Javadoc!




[GitHub] drill pull request #701: DRILL-4963: Fix issues with dynamically loaded over...

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/701#discussion_r99650496
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java
 ---
@@ -50,13 +47,56 @@
   private DrillSqlWorker() {
   }
 
+  /**
+   * Converts sql query string into query physical plan.
+   *
+   * @param context query context
+   * @param sql sql query
+   * @return query physical plan
+   */
   public static PhysicalPlan getPlan(QueryContext context, String sql) 
throws SqlParseException, ValidationException,
   ForemanSetupException {
 return getPlan(context, sql, null);
   }
 
+  /**
+   * Converts sql query string into query physical plan.
+   * In case of any errors (that might occur due to missing function 
implementation),
+   * checks if local function registry should be synchronized with remote 
function registry.
+   * If sync took place, reloads drill operator table
+   * (since functions were added to / removed from local function registry)
+   * and attempts to convert the sql query string into a query physical plan 
one more time.
+   *
+   * @param context query context
+   * @param sql sql query
+   * @param textPlan text plan
+   * @return query physical plan
+   */
   public static PhysicalPlan getPlan(QueryContext context, String sql, 
Pointer textPlan)
   throws ForemanSetupException {
+Pointer textPlanCopy = textPlan == null ? null : new 
Pointer<>(textPlan.value);
+try {
+  return getQueryPlan(context, sql, textPlan);
+} catch (Exception e) {
--- End diff --

Should we be more specific in the error we catch? Wouldn't this mean that, 
even for a simple syntax error, we'd resync and retry? Can we catch only the 
specific function error of interest?
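The suggested narrowing could be sketched as below. This is a hypothetical illustration: `FunctionNotFoundException` stands in for whatever specific exception signals a function-resolution failure (it is not Drill's actual exception type), so that a plain syntax error never triggers a registry sync and replan.

```java
// Hypothetical sketch of the suggested narrowing: retry planning only on a
// function-resolution failure, not on every Exception.
public class PlanRetry {
  static class FunctionNotFoundException extends RuntimeException {}

  interface Planner { String plan(String sql); }

  static String planWithRetry(Planner planner, Runnable syncRegistry, String sql) {
    try {
      return planner.plan(sql);
    } catch (FunctionNotFoundException e) {
      syncRegistry.run();          // only a missing function justifies a sync
      return planner.plan(sql);    // one retry after the registries are synced
    }
  }
}
```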




[GitHub] drill pull request #701: DRILL-4963: Fix issues with dynamically loaded over...

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/701#discussion_r99649785
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionImplementationRegistry.java
 ---
@@ -260,76 +293,101 @@ public RemoteFunctionRegistry 
getRemoteFunctionRegistry() {
   }
 
   /**
-   * Attempts to load and register functions from remote function registry.
-   * First checks if there is no missing jars.
-   * If yes, enters synchronized block to prevent other loading the same jars.
-   * Again re-checks if there are no missing jars in case someone has already loaded them (double-check lock).
-   * If there are still missing jars, first copies jars to local udf area and prepares {@link JarScan} for each jar.
-   * Jar registration timestamp represented in milliseconds is used as suffix.
-   * Then registers all jars at the same time. Returns true when finished.
-   * In case if any errors during jars coping or registration, logs errors and proceeds.
+   * Purpose of this method is to synchronize remote and local function registries if needed
+   * and to inform if the function registry was changed after the given version.
    *
-   * If no missing jars are found, checks current local registry version.
-   * Returns false if versions match, true otherwise.
+   * To make synchronization as lightweight as possible, first only the versions of both registries are checked,
+   * without any locking. If synchronization is needed, enters a synchronized block to prevent others loading the same jars.
+   * The need for synchronization is checked again (double-checked lock) before comparing jars.
+   * If any missing jars are found, they are downloaded to the local udf area and each is wrapped into a {@link JarScan}.
+   * Once the jar download is finished, all missing jars are registered in one batch.
+   * In case of any errors during jar download / registration, these errors are logged.
    *
-   * @param version local function registry version
-   * @return true if new jars were registered or local function registry version is different, false otherwise
+   * During registration the local function registry is updated with the remote function registry version it is synced with.
+   * When at least one of the missing jars fails to download / register,
+   * the local function registry version is not updated, but jars that were successfully downloaded / registered
+   * are still added to the local function registry.
+   *
+   * If synchronization between the remote and local function registries was not needed,
+   * checks if the given registry version matches the latest sync version
+   * to inform if the function registry was changed after the given version.
+   *
+   * @param version remote function registry version the local function registry was based on
+   * @return true if remote and local function registries were synchronized after the given version
    */
-  public boolean loadRemoteFunctions(long version) {
-    List<String> missingJars = getMissingJars(remoteFunctionRegistry, localFunctionRegistry);
-    if (!missingJars.isEmpty()) {
+  public boolean syncWithRemoteRegistry(long version) {
+    if (doSyncFunctionRegistries(remoteFunctionRegistry.getRegistryVersion(), localFunctionRegistry.getVersion())) {
       synchronized (this) {
-        missingJars = getMissingJars(remoteFunctionRegistry, localFunctionRegistry);
-        if (!missingJars.isEmpty()) {
-          logger.info("Starting dynamic UDFs lazy-init process.\n" +
-              "The following jars are going to be downloaded and registered locally: " + missingJars);
+        long localRegistryVersion = localFunctionRegistry.getVersion();
+        if (doSyncFunctionRegistries(remoteFunctionRegistry.getRegistryVersion(), localRegistryVersion)) {
+          DataChangeVersion remoteVersion = new DataChangeVersion();
+          List<String> missingJars = getMissingJars(this.remoteFunctionRegistry, localFunctionRegistry, remoteVersion);
           List<JarScan> jars = Lists.newArrayList();
-          for (String jarName : missingJars) {
-            Path binary = null;
-            Path source = null;
-            URLClassLoader classLoader = null;
-            try {
-              binary = copyJarToLocal(jarName, remoteFunctionRegistry);
-              source = copyJarToLocal(JarUtil.getSourceName(jarName), remoteFunctionRegistry);
-              URL[] urls = {binary.toUri().toURL(), source.toUri().toURL()};
-              classLoader = new URLClassLoader(urls);
-              ScanResult scanResult = scan(classLoader, binary, urls);
-              localFunctionRegistry.validate(jarName, scanResult);
-              jars.add(new JarScan(jarName, scanResult, classLoader));
-            } catch

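The double-checked version comparison described in the javadoc above can be sketched in simplified, self-contained form (names and structure here are illustrative, not Drill's actual implementation): a lock-free fast path compares versions, and the check is repeated inside the lock so that only one thread performs the expensive download/registration.

```java
import java.util.concurrent.atomic.AtomicLong;

public class RegistrySync {
    private final AtomicLong localVersion = new AtomicLong(0);
    private final AtomicLong remoteVersion = new AtomicLong(0);

    void publishRemote(long v) { remoteVersion.set(v); }

    /** Returns true if a sync actually ran in this call. */
    boolean syncIfNeeded() {
        if (localVersion.get() == remoteVersion.get()) {
            return false;                       // cheap, lock-free fast path
        }
        synchronized (this) {
            long remote = remoteVersion.get();
            if (localVersion.get() == remote) {
                return false;                   // someone else synced meanwhile
            }
            // ... download missing jars, register them in one batch ...
            localVersion.set(remote);           // record the version we synced to
            return true;
        }
    }

    public static void main(String[] args) {
        RegistrySync r = new RegistrySync();
        r.publishRemote(3);
        System.out.println(r.syncIfNeeded()); // true: versions differed
        System.out.println(r.syncIfNeeded()); // false: already in sync
    }
}
```

Note that, as in the quoted diff, a failed batch would simply skip the `localVersion.set(...)` so the next caller retries the sync.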
[GitHub] drill pull request #701: DRILL-4963: Fix issues with dynamically loaded over...

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/701#discussion_r99454556
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionImplementationRegistry.java
 ---
@@ -260,76 +293,101 @@ public RemoteFunctionRegistry 
getRemoteFunctionRegistry() {
   }
 
   /**
-   * Attempts to load and register functions from remote function registry.
-   * First checks if there is no missing jars.
-   * If yes, enters synchronized block to prevent other loading the same jars.
-   * Again re-checks if there are no missing jars in case someone has already loaded them (double-check lock).
-   * If there are still missing jars, first copies jars to local udf area and prepares {@link JarScan} for each jar.
-   * Jar registration timestamp represented in milliseconds is used as suffix.
-   * Then registers all jars at the same time. Returns true when finished.
-   * In case if any errors during jars coping or registration, logs errors and proceeds.
+   * Purpose of this method is to synchronize remote and local function registries if needed
+   * and to inform if the function registry was changed after the given version.
    *
-   * If no missing jars are found, checks current local registry version.
-   * Returns false if versions match, true otherwise.
+   * To make synchronization as lightweight as possible, first only the versions of both registries are checked,
+   * without any locking. If synchronization is needed, enters a synchronized block to prevent others loading the same jars.
+   * The need for synchronization is checked again (double-checked lock) before comparing jars.
+   * If any missing jars are found, they are downloaded to the local udf area and each is wrapped into a {@link JarScan}.
+   * Once the jar download is finished, all missing jars are registered in one batch.
+   * In case of any errors during jar download / registration, these errors are logged.
    *
-   * @param version local function registry version
-   * @return true if new jars were registered or local function registry version is different, false otherwise
+   * During registration the local function registry is updated with the remote function registry version it is synced with.
+   * When at least one of the missing jars fails to download / register,
+   * the local function registry version is not updated, but jars that were successfully downloaded / registered
+   * are still added to the local function registry.
+   *
+   * If synchronization between the remote and local function registries was not needed,
+   * checks if the given registry version matches the latest sync version
+   * to inform if the function registry was changed after the given version.
+   *
+   * @param version remote function registry version the local function registry was based on
+   * @return true if remote and local function registries were synchronized after the given version
    */
-  public boolean loadRemoteFunctions(long version) {
-    List<String> missingJars = getMissingJars(remoteFunctionRegistry, localFunctionRegistry);
-    if (!missingJars.isEmpty()) {
+  public boolean syncWithRemoteRegistry(long version) {
+    if (doSyncFunctionRegistries(remoteFunctionRegistry.getRegistryVersion(), localFunctionRegistry.getVersion())) {
       synchronized (this) {
-        missingJars = getMissingJars(remoteFunctionRegistry, localFunctionRegistry);
-        if (!missingJars.isEmpty()) {
-          logger.info("Starting dynamic UDFs lazy-init process.\n" +
-              "The following jars are going to be downloaded and registered locally: " + missingJars);
+        long localRegistryVersion = localFunctionRegistry.getVersion();
+        if (doSyncFunctionRegistries(remoteFunctionRegistry.getRegistryVersion(), localRegistryVersion)) {
+          DataChangeVersion remoteVersion = new DataChangeVersion();
+          List<String> missingJars = getMissingJars(this.remoteFunctionRegistry, localFunctionRegistry, remoteVersion);
           List<JarScan> jars = Lists.newArrayList();
-          for (String jarName : missingJars) {
-            Path binary = null;
-            Path source = null;
-            URLClassLoader classLoader = null;
-            try {
-              binary = copyJarToLocal(jarName, remoteFunctionRegistry);
-              source = copyJarToLocal(JarUtil.getSourceName(jarName), remoteFunctionRegistry);
-              URL[] urls = {binary.toUri().toURL(), source.toUri().toURL()};
-              classLoader = new URLClassLoader(urls);
-              ScanResult scanResult = scan(classLoader, binary, urls);
-              localFunctionRegistry.validate(jarName, scanResult);
-              jars.add(new JarScan(jarName, scanResult, classLoader));
-            } catch

[GitHub] drill pull request #701: DRILL-4963: Fix issues with dynamically loaded over...

2017-02-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/701#discussion_r99454038
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionImplementationRegistry.java
 ---
@@ -140,27 +142,39 @@ public void register(DrillOperatorTable 
operatorTable) {
   }
 
   /**
-   * Using the given functionResolver
-   * finds Drill function implementation for given functionCall.
-   * If function implementation was not found,
-   * loads all missing remote functions and tries to find Drill implementation one more time.
+   * First attempts to find the Drill function implementation that matches the name, arg types and return type.
+   * If the exact function implementation was not found,
+   * syncs the local function registry with the remote function registry if needed
+   * and tries to find the function implementation one more time
--- End diff --

While this sounds pretty good, consider a possible condition. Suppose a 
user consistently uses an overloaded method. Every one of those queries will 
need to check with ZK. Drill is supposed to handle many concurrent queries. 
Each of those will trigger the update. Soon, we'll be pounding on ZK hundreds 
of times per second.

The "not found" case was fine to force a sync since a user would not, 
presumably, continually issue such queries if the function really were 
undefined. But, the overloaded function case is possible, and can lead to 
performance issues.
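One way to mitigate the scenario described above (a sketch only, not Drill's implementation; the class and method names are hypothetical) is to throttle remote-registry checks, so that many concurrent "function not found" misses collapse into at most one ZooKeeper round trip per interval:

```java
public class ThrottledSync {
    private final long minIntervalMillis;
    private long lastCheckMillis;
    private boolean checkedBefore = false;

    ThrottledSync(long minIntervalMillis) { this.minIntervalMillis = minIntervalMillis; }

    /** Returns true only when enough time has passed to allow another remote check. */
    synchronized boolean tryAcquire(long nowMillis) {
        if (checkedBefore && nowMillis - lastCheckMillis < minIntervalMillis) {
            return false;          // too soon: skip the ZK round trip
        }
        checkedBefore = true;
        lastCheckMillis = nowMillis;
        return true;
    }

    public static void main(String[] args) {
        ThrottledSync gate = new ThrottledSync(1000);  // at most one check per second
        System.out.println(gate.tryAcquire(0));     // true: first check allowed
        System.out.println(gate.tryAcquire(200));   // false: within the interval
        System.out.println(gate.tryAcquire(1500));  // true: interval elapsed
    }
}
```

A query that misses the gate would simply resolve against the current local registry, preserving the fast path for repeatedly-issued overloaded-function queries.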




Re: [DRAFT REPORT] Apache Drill

2017-02-06 Thread Parth Chandra
Filing the following, updated with stats as of this morning.

 Report Begins 

## Description:
 - Drill is a Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud
Storage

## Issues:
 - There are no issues requiring board attention at this time

## Activity:
 - Since the last board report, Drill has released version 1.9
 - Drill has added many new features since the last report, including more
Parquet reader performance improvements, temp tables support, an improved
work assignment algorithm, and an httpd format plugin.
 - Work continues on improved use of statistics, security enhancements
(including support for Kerberos), and a sort with managed memory usage.

## Health report:
 - The project is healthy. Development activity is high and is reflected in
an increase in the number of mails to the mailing list, many new pull
requests and increased activity in JIRA. Two new committers were added in
the last period.

## PMC changes:

 - Currently 18 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Sudheesh Katkam on Wed Oct 05 2016

## Committer base changes:

 - Currently 30 committers.
 - New committers:
- Chris Westin was added as a committer on Wed Nov 30 2016
- Neeraja Rentachintala was added as a committer on Wed Nov 16 2016

## Releases:

 - 1.9.0 was released on Mon Nov 28 2016

## Mailing list activity:

 - Mailing list activity is healthy.

 - dev@drill.apache.org:
- 436 subscribers (up 2 in the last 3 months):
- 1919 emails sent to list (1599 in previous quarter)

 - iss...@drill.apache.org:
- 20 subscribers (up 0 in the last 3 months):
- 2618 emails sent to list (2003 in previous quarter)

 - u...@drill.apache.org:
- 577 subscribers (up 12 in the last 3 months):
- 372 emails sent to list (430 in previous quarter)


## JIRA activity:

 - 236 JIRA tickets created in the last 3 months
 - 85 JIRA tickets closed/resolved in the last 3 months

 Report Ends 


[GitHub] drill issue #613: DRILL-4730: Update JDBC DatabaseMetaData implementation to...

2017-02-06 Thread laurentgo
Github user laurentgo commented on the issue:

https://github.com/apache/drill/pull/613
  
@sudheeshkatkam can you please give it another review?




Re: [DRAFT REPORT] Apache Drill

2017-02-06 Thread Parth Chandra
I think we completed the dynamic UDFs feature before the previous report.
It was mentioned in the report filed in November.

On Fri, Feb 3, 2017 at 1:59 PM, Sudheesh Katkam  wrote:

> LGTM
>
> One more feature: ability to add UDFs dynamically
>
> > On Feb 3, 2017, at 10:47 AM, Parth Chandra  wrote:
> >
> > It's time to file the Apache Drill quarterly report with the Board. Below
> > is the draft of the report.
> >
> > Is there anything folks would like to add?
> >
> > Thanks
> >
> > Parth
> >
> > --- Report Begins 
> > ## Description:
> > - Drill is a Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud
> > Storage
> >
> > ## Issues:
> > - There are no issues requiring board attention at this time
> >
> > ## Activity:
> > - Since the last board report, Drill has released version 1.9
> > - Drill has added many new features since the last report, including more
> > Parquet reader performance improvements, temp tables support, an improved
> > work assignment algorithm, and an httpd format plugin.
> > - Work continues on improved use of statistics, security enhancements
> > (including support for Kerberos), and a sort with managed memory usage.
> >
> > ## Health report:
> > - The project is healthy. Development activity is high and is reflected
> in
> > an increase in the number of mails to the mailing list, many new pull
> > requests and increased activity in JIRA. Two new committers were added in
> > the last period.
> >
> > ## PMC changes:
> >
> > - Currently 18 PMC members.
> > - No new PMC members added in the last 3 months
> > - Last PMC addition was Sudheesh Katkam on Wed Oct 05 2016
> >
> > ## Committer base changes:
> >
> > - Currently 30 committers.
> > - New committers:
> >- Chris Westin was added as a committer on Wed Nov 30 2016
> >- Neeraja Rentachintala was added as a committer on Wed Nov 16 2016
> >
> > ## Releases:
> >
> > - 1.9.0 was released on Mon Nov 28 2016
> >
> > ## Mailing list activity:
> >
> > - Mailing list activity is healthy.
> >
> > - dev@drill.apache.org:
> >- 436 subscribers (up 2 in the last 3 months):
> >- 1825 emails sent to list (1652 in previous quarter)
> >
> > - iss...@drill.apache.org:
> >- 20 subscribers (up 0 in the last 3 months):
> >- 2524 emails sent to list (2068 in previous quarter)
> >
> > - u...@drill.apache.org:
> >- 578 subscribers (up 13 in the last 3 months):
> >- 372 emails sent to list (461 in previous quarter)
> >
> >
> > ## JIRA activity:
> >
> > - 238 JIRA tickets created in the last 3 months
> > - 87 JIRA tickets closed/resolved in the last 3 months
> >
> >
> > --- Report Ends 
>
>


1.10 release

2017-02-06 Thread Parth Chandra
Hi Drill devs,

  Any thoughts on when we should do a 1.10 release? I can see a bunch of
work (stats, managed sort, security) is in progress that is possibly fairly
close to completion. Can the folks working on these items suggest a time
frame that is comfortable for everyone?

Parth


Column ordering is incorrect when ORDER BY is used with LIMIT clause in query over parquet data

2017-02-06 Thread Khurram Faraaz
All,


This looks incorrect.


With an ORDER BY + LIMIT clause in the query, the ordering of the columns returned in the
query results is NOT the same as the column ordering in the Parquet file.


{noformat}

0: jdbc:drill:schema=dfs.tmp> SELECT * FROM typeall_l ORDER BY col_int limit 1;
+--+--+-++--+-++---+++-+
| col_bln  | col_chr  |   col_dt|  col_flt   | col_int  | col_intrvl_day  | col_intrvl_yr  |  col_tim  |   col_tmstmp   |   col_vrchr1   | col_vrchr2  |
+--+--+-++--+-++---+++-+
| false| MI   | 1967-05-01  | 32.901897  | 0| P12DT20775S | P196M  | 19:50:17  | 2004-10-15 17:49:36.0  | Felecia Gourd  | NLBQMg9     |
+--+--+-++--+-++---+++-+
1 row selected (0.279 seconds)

{noformat}

Without the ORDER BY clause the columns are returned in the correct order, the same as
the ordering in the Parquet file.

{noformat}
0: jdbc:drill:schema=dfs.tmp> SELECT * FROM typeall_l limit 1;
+--+--++-+-+---++++-+--+
| col_int  | col_chr  |   col_vrchr1   | col_vrchr2  |   col_dt|  col_tim  |   col_tmstmp   |  col_flt   | col_intrvl_yr  | col_intrvl_day  | col_bln  |
+--+--++-+-+---++++-+--+
| 45436| WV   | John Mcginity  | Rhbf6VFLJguvH9ejrWNkY1CDO8QqumTZAGjwa9cHfjBnLmNIWvo9YfcGObxbeXwa1NkemW9ULxsq5293wEA2v5FFCduwt03D7ysI3RlH8b4B0XAPKY  | 2011-11-04  | 18:02:26  | 1988-09-23 16:58:42.0  | 10.193293  | P314M     | P26DT27386S | false|
+--+--++-+-+---++++-+--+
1 row selected (0.22 seconds)


{noformat}


Thanks,

Khurram


[GitHub] drill issue #594: DRILL-4842: SELECT * on JSON data results in NumberFormatE...

2017-02-06 Thread chunhui-shi
Github user chunhui-shi commented on the issue:

https://github.com/apache/drill/pull/594
  
+1. LGTM. The conflict needs to be addressed before this is ready to commit.




[GitHub] drill pull request #520: DRILL-3510: Add ANSI_QUOTES option so that Drill's ...

2017-02-06 Thread vdiravka
Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/520#discussion_r99624485
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillConnectionImpl.java ---
@@ -760,6 +765,24 @@ public void abort(Executor executor) throws 
SQLException {
 }
   }
 
+  @Override
+  public boolean useAnsiQuotedIdentifiers() throws SQLException {
+boolean systemOption = false;
+Boolean sessionOption = null;
+String sql = String.format("select type, bool_val from sys.options 
where name = '%s'",
+PlannerSettings.ANSI_QUOTES_KEY);
+ResultSet rs = executeSql(sql);
--- End diff --

Done.
A new RPC request is implemented. The RPC response returns `ServerProperties`, 
which represents the list of server session options.




[GitHub] drill pull request #520: DRILL-3510: Add ANSI_QUOTES option so that Drill's ...

2017-02-06 Thread vdiravka
Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/520#discussion_r99623110
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/DrillConnection.java ---
@@ -213,4 +213,5 @@ void setNetworkTimeout( Executor executor, int 
milliseconds )
 
   DrillClient getClient();
 
+  boolean useAnsiQuotedIdentifiers() throws SQLException;
--- End diff --

Done.
This was a similar approach to MySQL's: setting the quoting identifier 
character at connection time.
Since a new RPC request is implemented, it is no longer necessary. 




[GitHub] drill issue #743: DRILL-5243: Fix TestContextFunctions.sessionIdUDFWithinSam...

2017-02-06 Thread zfong
Github user zfong commented on the issue:

https://github.com/apache/drill/pull/743
  
Looks good to me.  +1




[GitHub] drill pull request #744: DRILL-5040: Parquet writer unable to delete table f...

2017-02-06 Thread arina-ielchiieva
GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/744

DRILL-5040: Parquet writer unable to delete table folder on abort

Table folder cleanup failed because the directory couldn't be deleted:
`java.io.IOException: Directory 
/tmp/446062ea-46ae-4785-98e3-0ee23df9ead5/3c6d40ff-31f2-419e-a178-d9c5fd731e11 
is not empty`. Set the recursive flag in `fs.delete(location, true)` to `true` 
to allow deletion of the non-empty directory.
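The failure mode described above can be illustrated with plain `java.nio` (this is a standalone sketch, not Drill code; Hadoop's `FileSystem.delete(path, recursive)` makes the same distinction): deleting a non-empty directory fails unless the deletion is recursive.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class RecursiveDelete {

    // Remove children first, then the directory itself (like fs.delete(location, true)).
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try { Files.delete(p); } catch (IOException e) { throw new UncheckedIOException(e); }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("drill-demo");
        Files.createFile(dir.resolve("0_0_0.parquet"));  // directory is now non-empty

        try {
            Files.delete(dir);  // non-recursive, like fs.delete(location, false)
        } catch (DirectoryNotEmptyException e) {
            System.out.println("non-recursive delete failed: directory not empty");
        }

        deleteRecursively(dir);  // recursive delete succeeds
        System.out.println("deleted: " + !Files.exists(dir)); // deleted: true
    }
}
```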

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-5040

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/744.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #744


commit cd9779e8b2e3efe995010e2e91620429ff019210
Author: Arina Ielchiieva 
Date:   2017-02-06T13:11:02Z

DRILL-5040: Parquet writer unable to delete table folder on abort






[GitHub] drill pull request #743: DRILL-5243: Fix TestContextFunctions.sessionIdUDFWi...

2017-02-06 Thread arina-ielchiieva
GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/743

DRILL-5243: Fix TestContextFunctions.sessionIdUDFWithinSameSession un…

…it test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-5243

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/743.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #743


commit 41a729ceb2a9f6aacb4295e4b06e842d268e1855
Author: Arina Ielchiieva 
Date:   2017-02-06T13:51:49Z

DRILL-5243: Fix TestContextFunctions.sessionIdUDFWithinSameSession unit test






[jira] [Created] (DRILL-5243) Fix TestContextFunctions.sessionIdUDFWithinSameSession unit test

2017-02-06 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-5243:
---

 Summary: Fix TestContextFunctions.sessionIdUDFWithinSameSession 
unit test
 Key: DRILL-5243
 URL: https://issues.apache.org/jira/browse/DRILL-5243
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
 Fix For: 1.10.0


After DRILL-5043 was merged into master, it introduced unit test 
TestContextFunctions.sessionIdUDFWithinSameSession which is currently failing 
with the following error:
{noformat}
java.lang.Exception: org.apache.drill.common.exceptions.UserRemoteException: 
PARSE ERROR: Encountered ";" at line 1, column 48.
{noformat}

Fix:
remove the semicolon at the end of the query
{noformat}
final String sessionIdQuery = "select session_id as sessionId from (values(1));"
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] drill issue #656: DRILL-5034: Select timestamp from hive generated parquet a...

2017-02-06 Thread vdiravka
Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/656
  
@bitblender 
The branch is rebased onto the latest master.




[GitHub] drill issue #685: Drill 5043: Function that returns a unique id per session/...

2017-02-06 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/685
  
Changes have been merged into master. @nagarajanchinnasamy, please close 
the pull request.

