Re: [DISCUSS] Regarding mutator interface

2018-04-13 Thread Aman Sinha
Hi Paul,
Yes, the ANY_VALUE function will need separate generated code for each
combination of data type and mode.  I believe Gautam has
implemented all the scalar types (through the standard template mechanism
that Drill follows).  The complex types are the ones that are harder.
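For readers following along, the per-(type, mode) requirement can be sketched in plain Java. This is a hypothetical, self-contained simplification, not Drill code: real Drill UDAFs are annotated template classes (@FunctionTemplate, @Param, @Workspace, @Output) operating on Holder objects; plain fields stand in for the holders here. The point is that the OPTIONAL flavor must carry a null flag, so it cannot share generated code with the REQUIRED flavor.

```java
public class AnyValueSketch {

  // REQUIRED INT flavor: input can never be null, so no null flag is needed.
  public static final class AnyValueIntRequired {
    private int value;
    private boolean seen;  // workspace: has a value been captured yet?

    public void add(int in) {
      if (!seen) {         // ANY_VALUE: keep the first value that arrives
        value = in;
        seen = true;
      }
    }

    public int output() { return value; }

    public void reset() { seen = false; }
  }

  // OPTIONAL INT flavor: must carry an is-set flag (like NullableIntHolder),
  // so its generated code necessarily differs from the REQUIRED flavor.
  public static final class AnyValueIntOptional {
    private int value;
    private boolean isSet;
    private boolean seen;

    public void add(boolean inIsSet, int in) {
      if (!seen) {
        isSet = inIsSet;
        value = in;
        seen = true;
      }
    }

    public Integer output() { return isSet ? value : null; }

    public void reset() { seen = false; isSet = false; }
  }
}
```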

> Moreover, the code generator must understand that code generated for a
Map UDAF must be different than that for a scalar UDAF. Presumably we must
have that code, since the UDF mechanism supports maps.

Yes, I assume you are referring to the decision point here [1].

There is some overlap with what we do with the MAPPIFY/KVGEN function, which
runs as part of the Project operator.  ProjectRecordBatch generates code
for the functions that require a ComplexWriter. The MAPPIFY function reads
data using a FieldReader [2] and outputs data using a ComplexWriter.
However, there are differences in how ANY_VALUE operates, particularly
because we want to treat it as an Aggregate function.  For instance, a
ValueReference in a ComplexWriter is always marked as the LATE binding type [3],
whereas for ANY_VALUE we want it to reflect the input type.  Neither
StreamingAgg nor HashAgg code generation handles the LATE type.  So,
this is a new requirement that potentially needs some changes to
ValueReference.

Regarding repeated maps/arrays, let me discuss the details with Gautam and
provide an update.

For Hash Agg versus Streaming Agg, I have some thoughts that I will send
out in a follow-up email.

[1]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionConverter.java#L108
[2]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java#L55
[3]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/ValueReference.java#L76

-Aman

On Fri, Apr 13, 2018 at 10:31 AM, Paul Rogers 
wrote:

> Hi Aman & Gautam,
>
> FWIW, here is my understanding of UDAF functions based on a write-up I did
> a few months back. [1]
>
> All Drill functions are implemented using the UDF and UDAF protocol. (The
> "U" (User) is a bit of a misnomer; internal and user-defined functions
> follow the same protocol.) Every UDF (including UDAF) is strongly typed in
> its arguments and return value. To create the ANY_VALUE implementation, you
> must create a separate implementation for each and every combination of
> (type, mode). That is, we need a REQUIRED INT and OPTIONAL INT
> implementation for INT types.
>
> In this case, the incoming argument is passed using the argument
> annotation, and the return value via the return annotation. The generated
> code sets the incoming argument and copies the return value from the return
> variable. (There is an example of the generated code in the write-up.)
>
> For a Map, there is no annotation to say "set this value to a map" for
> either input or output. Instead, we pass in a complex reader for input (I
> believe) and a complex writer for output. (Here I am a bit hazy as I never
> had time to experiment with maps and arrays in UDFs.)
>
> So, you'll need a Map implementation. (Maps can only be REQUIRED, never
> OPTIONAL, unless they are in a UNION or LIST...)
>
> Moreover, the code generator must understand that code generated for a Map
> UDAF must be different than that for a scalar UDAF. Presumably we must have
> that code, since the UDF mechanism supports maps.
>
> Have you worked out how to handle arrays (REPEATED cardinality)? It was
> not clear from my exploration of UDFs how we handle REPEATED types. The
> UDAF must pass in one array, which the UDAF copies to its output, which is
> then written to the output repeated vector. Since values must arrive in
> Holders, it is not clear how this would be done for arrays. Perhaps there
> is an annotation that lets us use some form of complex writer for arrays as
> is done for MAPs? Again, sorry, I didn't have time to learn that bit. Would
> be great to understand that so we can add it to the write-up.
>
> This chain mentions a MAP type. Drill also includes other complex types:
> REPEATED MAP, (non-repeated) LIST, (repeated) LIST, and UNION. It is not at
> all clear how UDAFs work for these types.
>
> One other thing to consider: ANY_VALUE can never work for the hash agg
> because output values are updated in random order. It can only ever work
> for streaming agg because the streaming agg only appends output values.
> Fortunately, this chain is about the streaming agg. For Hash Agg,
> intermediate variable-width values are stored in an Object Vector, but
> those values won't survive serialization. As a result, only fixed-width
> types can be updated in random order. DRILL-6087 describes this issue.
>
> Thanks,
> - Paul
>
> [1] https://github.com/paul-rogers/drill/wiki/UDFs-Background-Information
>
>
>
>
>
>
> On Wednesday, April 11, 2018, 4:09:47 PM PDT, Aman Sinha <
> 

[GitHub] drill pull request #1202: DRILL-6311: No logging information in drillbit.log...

2018-04-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1202


---


[GitHub] drill pull request #1195: DRILL-6273: Removed dependency licensed under Cate...

2018-04-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1195


---


[GitHub] drill pull request #1198: DRILL-6294: Changes to support Calcite 1.16.0

2018-04-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1198


---


[GitHub] drill pull request #1200: DRILL-143: Support CGROUPs resource management

2018-04-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1200


---


[GitHub] drill issue #1207: DRILL-6320: Fixed License Headers

2018-04-13 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1207
  
@vrozov followed your instructions. Building the source distribution with 
license checks enabled works.


---


Drill JMH Framework

2018-04-13 Thread salim achouche
I have created new infrastructure for running Drill JMH benchmarks. My goal is
to encourage sharing of JMH logic when performance-testing Drill.
At this time, I have uploaded this framework to a new Git repository,
drill-jmh.

Regards,
Salim


[jira] [Created] (DRILL-6329) TPC-DS Query 66 failed due to OOM

2018-04-13 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-6329:
-

 Summary: TPC-DS Query 66 failed due to OOM
 Key: DRILL-6329
 URL: https://issues.apache.org/jira/browse/DRILL-6329
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.14.0
Reporter: Khurram Faraaz
 Attachments: 252f0f20-2774-43d7-ec31-911ee0f5f330.sys.drill, 
TPCDS_Query_66.sql, TPCDS_Query_66_PLAN.txt

TPC-DS Query 66 failed after 27 minutes on Drill 1.14.0 on a 4-node cluster 
against SF1 parquet data (dfs.tpcds_sf1_parquet_views). Query 66, the query 
profile, and the query plan are attached here.

This seems to be a regression; the same query worked fine on 1.10.0.

On Drill 1.10.0 (git.commit.id: bbcf4b76) => 9.026 seconds (completed 
successfully).
On Drill 1.14.0 (git.commit.id.abbrev: da24113) query 66 failed after running 
for 27 minutes due to an OutOfMemoryException.

Stack trace from sqlline console, no stack trace was written to drillbit.log
{noformat}
Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the 
query.

Too little memory available
Fragment 2:0

[Error Id: 5636a939-a318-4b59-b3e8-9eb93f6b82f3 on qa102-45.qa.lab:31010]

(org.apache.drill.exec.exception.OutOfMemoryException) Too little memory 
available
 org.apache.drill.exec.test.generated.HashAggregatorGen7120.delayedSetup():409
 org.apache.drill.exec.test.generated.HashAggregatorGen7120.doWork():579
 org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
 org.apache.drill.exec.record.AbstractRecordBatch.next():164
 org.apache.drill.exec.record.AbstractRecordBatch.next():119
 org.apache.drill.exec.record.AbstractRecordBatch.next():109
 org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():134
 org.apache.drill.exec.record.AbstractRecordBatch.next():164
 org.apache.drill.exec.physical.impl.BaseRootExec.next():105
 
org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
 org.apache.drill.exec.physical.impl.BaseRootExec.next():95
 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
 java.security.AccessController.doPrivileged():-2
 javax.security.auth.Subject.doAs():422
 org.apache.hadoop.security.UserGroupInformation.doAs():1595
 org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
 org.apache.drill.common.SelfCleaningRunnable.run():38
 java.util.concurrent.ThreadPoolExecutor.runWorker():1149
 java.util.concurrent.ThreadPoolExecutor$Worker.run():624
 java.lang.Thread.run():748 (state=,code=0)
java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory 
while executing the query.

Too little memory available
Fragment 2:0

[Error Id: 5636a939-a318-4b59-b3e8-9eb93f6b82f3 on qa102-45.qa.lab:31010]

(org.apache.drill.exec.exception.OutOfMemoryException) Too little memory 
available
 org.apache.drill.exec.test.generated.HashAggregatorGen7120.delayedSetup():409
 org.apache.drill.exec.test.generated.HashAggregatorGen7120.doWork():579
 org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
 org.apache.drill.exec.record.AbstractRecordBatch.next():164
 org.apache.drill.exec.record.AbstractRecordBatch.next():119
 org.apache.drill.exec.record.AbstractRecordBatch.next():109
 org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():134
 org.apache.drill.exec.record.AbstractRecordBatch.next():164
 org.apache.drill.exec.physical.impl.BaseRootExec.next():105
 
org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
 org.apache.drill.exec.physical.impl.BaseRootExec.next():95
 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
 java.security.AccessController.doPrivileged():-2
 javax.security.auth.Subject.doAs():422
 org.apache.hadoop.security.UserGroupInformation.doAs():1595
 org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
 org.apache.drill.common.SelfCleaningRunnable.run():38
 java.util.concurrent.ThreadPoolExecutor.runWorker():1149
 java.util.concurrent.ThreadPoolExecutor$Worker.run():624
 java.lang.Thread.run():748
 
...
Caused by: org.apache.drill.common.exceptions.UserRemoteException: RESOURCE 
ERROR: One or more nodes ran out of memory while executing the query.

Too little memory available
Fragment 2:0

[Error Id: 5636a939-a318-4b59-b3e8-9eb93f6b82f3 on qa102-45.qa.lab:31010]

(org.apache.drill.exec.exception.OutOfMemoryException) Too little memory 
available
 org.apache.drill.exec.test.generated.HashAggregatorGen7120.delayedSetup():409
 org.apache.drill.exec.test.generated.HashAggregatorGen7120.doWork():579
 

[GitHub] drill issue #1207: DRILL-6320: Fixed License Headers

2018-04-13 Thread vrozov
Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1207
  
LGTM, @ilooner please double-check that the Apache source distribution can be 
built with '-Drat.skip=false -Dlicense.skip=false': 
- build with `-P apache.release -Dgpg.skip=true`
- extract the created source .tar.gz or .zip to a temp directory
- build with apache-rat and the license check enabled (you may need to add more 
files to the license-check exclusions).


---


[jira] [Created] (DRILL-6328) Consolidate developer docs in docs/ folder of drill repo.

2018-04-13 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6328:
-

 Summary: Consolidate developer docs in docs/ folder of drill repo.
 Key: DRILL-6328
 URL: https://issues.apache.org/jira/browse/DRILL-6328
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

2018-04-13 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1184#discussion_r181513279
  
--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
 public ${friendlyType} getObject(int index) {
   org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), 
org.joda.time.DateTimeZone.UTC);
   date = 
date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
-  return date;
+  return new java.sql.Date(date.getMillis());
--- End diff --

Sounds good.


---


[GitHub] drill pull request #1203: DRILL-6289: Cluster view should show more relevant...

2018-04-13 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1203#discussion_r181510944
  
--- Diff: 
common/src/main/java/org/apache/drill/exec/metrics/CpuGaugeSet.java ---
@@ -0,0 +1,62 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.metrics;
+
+import java.lang.management.OperatingSystemMXBean;
+import java.util.HashMap;
+import java.util.Map;
+
+import com.codahale.metrics.Gauge;
+import com.codahale.metrics.Metric;
+import com.codahale.metrics.MetricSet;
+
+/**
+ * Creates a Cpu GaugeSet
+ */
+class CpuGaugeSet implements MetricSet {
--- End diff --

I thought we might want to expand the GaugeSet to carry additional metrics 
like `ProcessCpuLoad` and `ProcessCpuTime`.
Since we can shrink the `SHUTDOWN` button to a symbol, we do have some 
real estate to provide the `ProcessCpuLoad` information as well. Would it help 
to have that?

https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getProcessCpuLoad()



---


[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

2018-04-13 Thread jiang-wu
Github user jiang-wu commented on a diff in the pull request:

https://github.com/apache/drill/pull/1184#discussion_r181493581
  
--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
 public ${friendlyType} getObject(int index) {
   org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), 
org.joda.time.DateTimeZone.UTC);
   date = 
date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
-  return date;
+  return new java.sql.Date(date.getMillis());
--- End diff --

How about we use the Java 8 Local[Date|Time|Timestamp] classes for the public 
interface methods?  That sets things up for the future.

Internally, I won't change the logic that uses Joda DateTime to do the various 
time-zone handling.  That behind-the-scenes logic can be updated separately, 
after we determine what behavior Drill wants to support.
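For illustration, here is a minimal, self-contained sketch of the two return styles under discussion. The class and method names are invented for this example; only the millis-to-java.sql.Date line mirrors the actual diff.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

public class DateReturnSketch {

  // Current PR behavior: wrap the epoch millis in java.sql.Date,
  // mirroring `new java.sql.Date(date.getMillis())` from the diff.
  public static java.sql.Date asSqlDate(long epochMillis) {
    return new java.sql.Date(epochMillis);
  }

  // The java.time alternative suggested here: interpret the stored
  // millis as a UTC instant and expose it as a zone-free LocalDate.
  public static LocalDate asLocalDate(long epochMillis) {
    return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
  }
}
```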


---


[GitHub] drill pull request #1208: DRILL-6295: PartitionerDecorator may close partiti...

2018-04-13 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1208#discussion_r181488592
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/partitionsender/PartitionerDecorator.java
 ---
@@ -262,68 +280,122 @@ public FlushBatchesHandlingClass(boolean 
isLastBatch, boolean schemaChanged) {
 }
 
 @Override
-public void execute(Partitioner part) throws IOException {
+public void execute(Partitioner part) throws IOException, 
InterruptedException {
   part.flushOutgoingBatches(isLastBatch, schemaChanged);
 }
   }
 
   /**
-   * Helper class to wrap Runnable with customized naming
-   * Exception handling
+   * Helper class to wrap Runnable with cancellation and waiting for 
completion support
*
*/
-  private static class CustomRunnable implements Runnable {
+  private static class PartitionerTask implements Runnable {
+
+private enum STATE {
+  NEW,
+  COMPLETING,
+  NORMAL,
+  EXCEPTIONAL,
+  CANCELLED,
+  INTERRUPTING,
+  INTERRUPTED
+}
+
+private final AtomicReference<STATE> state;
+private final AtomicReference<Thread> runner;
+private final PartitionerDecorator partitionerDecorator;
+private final AtomicInteger count;
 
-private final String parentThreadName;
-private final CountDownLatch latch;
 private final GeneralExecuteIface iface;
-private final Partitioner part;
+private final Partitioner partitioner;
 private CountDownLatchInjection testCountDownLatch;
 
-private volatile IOException exp;
+private volatile ExecutionException exception;
 
-public CustomRunnable(final String parentThreadName, final 
CountDownLatch latch, final GeneralExecuteIface iface,
-final Partitioner part, CountDownLatchInjection 
testCountDownLatch) {
-  this.parentThreadName = parentThreadName;
-  this.latch = latch;
+public PartitionerTask(PartitionerDecorator partitionerDecorator, 
GeneralExecuteIface iface, Partitioner partitioner, AtomicInteger count, 
CountDownLatchInjection testCountDownLatch) {
+  state = new AtomicReference<>(STATE.NEW);
+  runner = new AtomicReference<>();
+  this.partitionerDecorator = partitionerDecorator;
   this.iface = iface;
-  this.part = part;
+  this.partitioner = partitioner;
+  this.count = count;
   this.testCountDownLatch = testCountDownLatch;
 }
 
 @Override
 public void run() {
-  // Test only - Pause until interrupted by fragment thread
-  try {
-testCountDownLatch.await();
-  } catch (final InterruptedException e) {
-logger.debug("Test only: partitioner thread is interrupted in test 
countdown latch await()", e);
-  }
-
-  final Thread currThread = Thread.currentThread();
-  final String currThreadName = currThread.getName();
-  final OperatorStats localStats = part.getStats();
-  try {
-final String newThreadName = parentThreadName + currThread.getId();
-currThread.setName(newThreadName);
+  final Thread thread = Thread.currentThread();
+  Preconditions.checkState(runner.compareAndSet(null, thread),
+  "PartitionerTask can be executed only once.");
+  if (state.get() == STATE.NEW) {
+final String name = thread.getName();
+thread.setName(String.format("Partitioner-%s-%d", 
partitionerDecorator.thread.getName(), thread.getId()));
+final OperatorStats localStats = partitioner.getStats();
 localStats.clear();
 localStats.startProcessing();
-iface.execute(part);
-  } catch (IOException e) {
-exp = e;
-  } finally {
-localStats.stopProcessing();
-currThread.setName(currThreadName);
-latch.countDown();
+ExecutionException executionException = null;
+try {
+  // Test only - Pause until interrupted by fragment thread
+  testCountDownLatch.await();
+  iface.execute(partitioner);
+} catch (InterruptedException e) {
+  if (state.compareAndSet(STATE.NEW, STATE.INTERRUPTED)) {
+logger.warn("Partitioner Task interrupted during the run", e);
+  }
+} catch (Throwable t) {
+  executionException = new ExecutionException(t);
+}
+if (state.compareAndSet(STATE.NEW, STATE.COMPLETING)) {
+  if (executionException == null) {
+localStats.stopProcessing();
+state.lazySet(STATE.NORMAL);
+  } else {
+exception = 

[GitHub] drill issue #1207: DRILL-6320: Fixed License Headers

2018-04-13 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1207
  
@vrozov @arina-ielchiieva Addressed comments.


---


[GitHub] drill pull request #1207: DRILL-6320: Fixed License Headers

2018-04-13 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1207#discussion_r181475030
  
--- Diff: pom.xml ---
@@ -375,12 +449,12 @@
   
 
   
-
   
 <groupId>com.mycila</groupId>
 <artifactId>license-maven-plugin</artifactId>
 <version>3.0</version>
 <configuration>
+  <skip>${license.skip}</skip>
--- End diff --

updated with suggestion above


---


[GitHub] drill pull request #1207: DRILL-6320: Fixed License Headers

2018-04-13 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1207#discussion_r181474959
  
--- Diff: pom.xml ---
@@ -198,6 +200,78 @@
 
   
 
+  <plugin>
+    <groupId>org.apache.rat</groupId>
+    <artifactId>apache-rat-plugin</artifactId>
+    <version>0.12</version>
+    <executions>
+      <execution>
+        <id>rat-checks</id>
+        <phase>validate</phase>
+        <goals>
+          <goal>check</goal>
+        </goals>
+      </execution>
+    </executions>
+    <configuration>
+      <skip>${rat.skip}</skip>
--- End diff --

Your suggestion works. Made the change.


---


[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

2018-04-13 Thread vdiravka
Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/1184#discussion_r181472373
  
--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
 public ${friendlyType} getObject(int index) {
   org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), 
org.joda.time.DateTimeZone.UTC);
   date = 
date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
-  return date;
+  return new java.sql.Date(date.getMillis());
--- End diff --

I think updating to Java8 LocalDate/Time classes would be good choice. And 
it will be step forward in the resolving of the Drill's Date/Time issues 
mentioned in different Jiras: 
[DRILL-5334](https://issues.apache.org/jira/browse/DRILL-5334), 
[DRILL-5332](https://issues.apache.org/jira/browse/DRILL-5332) etc.


---


[GitHub] drill pull request #1207: DRILL-6320: Fixed License Headers

2018-04-13 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1207#discussion_r181468772
  
--- Diff: pom.xml ---
@@ -198,6 +200,78 @@
 
   
 
+  <plugin>
+    <groupId>org.apache.rat</groupId>
+    <artifactId>apache-rat-plugin</artifactId>
+    <version>0.12</version>
+    <executions>
+      <execution>
--- End diff --

You are right, the docs say it is bound to the validate phase by default: 
http://creadur.apache.org/rat/apache-rat-plugin/check-mojo.html . I also 
removed `validate` and tested that it works correctly.


---


[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

2018-04-13 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1184#discussion_r181461481
  
--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
 public ${friendlyType} getObject(int index) {
   org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), 
org.joda.time.DateTimeZone.UTC);
   date = 
date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
-  return date;
+  return new java.sql.Date(date.getMillis());
--- End diff --

Either one is fine (since java.time is based on Joda). We've switched to 
Java 8, but just for consistency with the rest of the code, we might as well 
use Joda.


---


[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

2018-04-13 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1184
  
I think that would be best. 


---


Re: [DISCUSS] Regarding mutator interface

2018-04-13 Thread Paul Rogers
Hi Aman & Gautam,

FWIW, here is my understanding of UDAF functions based on a write-up I did a 
few months back. [1]

All Drill functions are implemented using the UDF and UDAF protocol. (The "U" 
(User) is a bit of a misnomer; internal and user-defined functions follow the 
same protocol.) Every UDF (including UDAF) is strongly typed in its arguments 
and return value. To create the ANY_VALUE implementation, you must create a 
separate implementation for each and every combination of (type, mode). That 
is, we need a REQUIRED INT and OPTIONAL INT implementation for INT types.

In this case, the incoming argument is passed using the argument annotation, 
and the return value via the return annotation. The generated code sets the 
incoming argument and copies the return value from the return variable. (There 
is an example of the generated code in the write-up.)

For a Map, there is no annotation to say "set this value to a map" for either 
input or output. Instead, we pass in a complex reader for input (I believe) and 
a complex writer for output. (Here I am a bit hazy as I never had time to 
experiment with maps and arrays in UDFs.)

So, you'll need a Map implementation. (Maps can only be REQUIRED, never 
OPTIONAL, unless they are in a UNION or LIST...)

Moreover, the code generator must understand that code generated for a Map UDAF 
must be different than that for a scalar UDAF. Presumably we must have that 
code, since the UDF mechanism supports maps.

Have you worked out how to handle arrays (REPEATED cardinality)? It was not 
clear from my exploration of UDFs how we handle REPEATED types. The UDAF must 
pass in one array, which the UDAF copies to its output, which is then written 
to the output repeated vector. Since values must arrive in Holders, it is not 
clear how this would be done for arrays. Perhaps there is an annotation that 
lets us use some form of complex writer for arrays as is done for MAPs? Again, 
sorry, I didn't have time to learn that bit. Would be great to understand that 
so we can add it to the write-up.

This chain mentions a MAP type. Drill also includes other complex types: 
REPEATED MAP, (non-repeated) LIST, (repeated) LIST, and UNION. It is not at all 
clear how UDAFs work for these types.

One other thing to consider: ANY_VALUE can never work for the hash agg because 
output values are updated in random order. It can only ever work for streaming 
agg because the streaming agg only appends output values. Fortunately, this 
chain is about the streaming agg. For Hash Agg, intermediate variable-width 
values are stored in an Object Vector, but those values won't survive 
serialization. As a result, only fixed-width types can be updated in random 
order. DRILL-6087 describes this issue.
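The append-only constraint can be seen with a toy variable-width buffer (plain Java for illustration, not Drill's vector code; the class and its layout are invented for this sketch):

```java
import java.util.ArrayList;
import java.util.List;

// Values are packed end-to-end with an offsets list, as in a
// variable-width vector. Appending never moves existing data, but
// overwriting slot i with a different-length value would shift every
// later offset -- which is why random-order (hash agg) updates are
// restricted to fixed-width types.
public class VarWidthSketch {
  private final StringBuilder data = new StringBuilder();
  private final List<Integer> offsets = new ArrayList<>(List.of(0));

  public void append(String v) {       // streaming agg: append-only, safe
    data.append(v);
    offsets.add(data.length());
  }

  public String get(int i) {
    return data.substring(offsets.get(i), offsets.get(i + 1));
  }
}
```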

Thanks,
- Paul

[1] https://github.com/paul-rogers/drill/wiki/UDFs-Background-Information




 

On Wednesday, April 11, 2018, 4:09:47 PM PDT, Aman Sinha 
 wrote:  
 
 Here's some background on what Gautam is trying to do. Currently, SQL does
not have a standard way to do a DISTINCT on a subset of the columns in the
SELECT list.  Suppose there are 2 columns:
  a:  INTEGER
  b:  MAP
Suppose I want to do DISTINCT only on 'a' and I don't really care about the
column 'b'; I just want the first or any value of 'b' within a single
group of 'a'.  Postgres actually has a 'DISTINCT ON(a), b' syntax, but
based on our discussion on the Calcite mailing list, we want to avoid that
syntax.  So, there's an alternative proposal to do the following:

    SELECT a, ANY_VALUE(b) FROM table GROUP BY a

This means ANY_VALUE will essentially be treated as an Aggregate function,
and from a code-gen perspective, we want to read one item (a MapHolder) from
the incoming MapVector and write it to a particular index in the output
MapVector.  This is where it would be useful to have
MapVector.setSafe(), since the StreamingAgg and HashAgg both generate
setSafe() for normal aggregate functions.

However, it seems the better (or perhaps only) way to do this is through
the MapOrListWriter (ComplexWriter), as long as there's a way to instruct
the writer to write to a specific output index (the output index is needed
because there are several groups in the output container and we want to
write to a specific one).
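As a plain-Java model of the intended semantics (not Drill's vector-based code generation; the class and method names are invented for this sketch), ANY_VALUE under GROUP BY amounts to keeping the first map seen per group and emitting one map per output index:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AnyValueGroupBy {

  // rows: (a, b) pairs where b plays the role of the MAP column.
  @SuppressWarnings("unchecked")
  public static List<Map<String, Object>> run(List<Object[]> rows) {
    // LinkedHashMap preserves group order, so list index == output index.
    Map<Object, Map<String, Object>> firstPerGroup = new LinkedHashMap<>();
    for (Object[] row : rows) {
      // putIfAbsent implements ANY_VALUE: keep whichever b arrives first.
      firstPerGroup.putIfAbsent(row[0], (Map<String, Object>) row[1]);
    }
    return new ArrayList<>(firstPerGroup.values());
  }
}
```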

-Aman


On Wed, Apr 11, 2018 at 2:13 PM, Paul Rogers 
wrote:

> What semantics are wanted? SetSafe sets a single value in a vector. What
> does it mean to set a single map or array value? What would we pass as an
> argument?
> For non-simple types, something needs to iterate over the values: be they
> elements of a map, elements in an array, elements of an array of maps, then
> over the map members, etc.
> I believe that you are hitting a fundamental difference between simple
> scalar values and complex (composite) values.
> This is for an aggregate. There is no meaningful aggregate of a map or an
> array. One could aggregate over a scalar that is a member of a 

[GitHub] drill issue #1207: DRILL-6320: Fixed License Headers

2018-04-13 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1207
  
@arina-ielchiieva fixed


---


[GitHub] drill pull request #1207: DRILL-6320: Fixed License Headers

2018-04-13 Thread vrozov
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1207#discussion_r181434241
  
--- Diff: pom.xml ---
@@ -198,6 +200,78 @@
 
   
 
+  <plugin>
+    <groupId>org.apache.rat</groupId>
+    <artifactId>apache-rat-plugin</artifactId>
+    <version>0.12</version>
+    <executions>
+      <execution>
--- End diff --

I refer to `validate`. 


---


[GitHub] drill issue #1203: DRILL-6289: Cluster view should show more relevant inform...

2018-04-13 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1203
  
Regarding shutdown button (icon) placement, there are two more options:
1. have a separate column for shutdown; when shutdown is unavailable for 
particular drillbits, show a disabled button / icon. Maybe better than just 
having empty rows as we do now.
2. remove the shutdown column and add a shutdown icon in the address column 
near those drillbits which we can shut down.


---


[GitHub] drill issue #1204: DRILL-6318

2018-04-13 Thread oleg-zinovev
Github user oleg-zinovev commented on the issue:

https://github.com/apache/drill/pull/1204
  
Reason for the build error: "The job exceeded the maximum time limit for jobs, 
and has been terminated."
What should I do next?


---


[GitHub] drill issue #1207: DRILL-6320: Fixed License Headers

2018-04-13 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1207
  
@ilooner please also update this part in install.md:
```
# mvn --version
Apache Maven 3.0.3 (r1075438; 2011-02-28 09:31:09-0800)
```


---