[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229729#comment-15229729 ]

ASF GitHub Bot commented on DRILL-3714:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/463#discussion_r58821834
  
--- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java ---
@@ -84,7 +115,7 @@ public void operationComplete(ChannelFuture future) throws Exception {
     if (!future.isSuccess()) {
       removeFromMap(coordinationId);
       if (future.channel().isActive()) {
-        throw new RpcException("Future failed") ;
+        throw new RpcException("Future failed");
--- End diff --

Since the future did not succeed, should this call 
`setException(future.cause())` instead? There would be no outcome for the 
`handler` otherwise, right?
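The concern can be sketched with stripped-down stand-ins for the RPC types (hypothetical names, not Drill's actual classes): if `operationComplete` only throws on a failed future, the registered handler never receives an outcome, whereas forwarding `future.cause()` completes it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified stand-ins for the RPC types under review;
// only the control flow is modeled, not Drill's actual API.
interface Outcome {
  void setException(Throwable t); // delivers a failure to the waiting handler
}

final class FakeFuture {
  private final Throwable cause;
  FakeFuture(Throwable cause) { this.cause = cause; }
  boolean isSuccess() { return cause == null; }
  Throwable cause() { return cause; }
}

public class FailedFutureSketch {
  // Variant suggested in the review: on failure, hand the cause to the
  // handler so it observes an outcome instead of waiting forever.
  static void operationComplete(FakeFuture future, Outcome handler) {
    if (!future.isSuccess()) {
      handler.setException(future.cause());
    }
  }

  public static void main(String[] args) {
    AtomicReference<Throwable> seen = new AtomicReference<>();
    operationComplete(new FakeFuture(new RuntimeException("send failed")), seen::set);
    System.out.println(seen.get().getMessage()); // prints "send failed"
  }
}
```

With a plain `throw`, the exception only surfaces in the event-loop thread; routing the cause through `setException` is what lets the waiting caller observe the failure.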


> Query runs out of memory and remains in CANCELLATION_REQUESTED state until 
> drillbit is restarted
> 
>
> Key: DRILL-3714
> URL: https://issues.apache.org/jira/browse/DRILL-3714
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
>Reporter: Victoria Markman
>Assignee: Jacques Nadeau
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: Screen Shot 2015-08-26 at 10.36.33 AM.png, drillbit.log, 
> jstack.txt, query_profile_2a2210a7-7a78-c774-d54c-c863d0b77bb0.json
>
>
> This is a variation of DRILL-3705, differing in Drill's behavior when 
> hitting the OOM condition.
> Query runs out of memory during execution and remains in 
> "CANCELLATION_REQUESTED" state until drillbit is bounced.
> Client (sqlline in this case) never gets a response from the server.
> Reproduction details:
> Single node drillbit installation.
> DRILL_MAX_DIRECT_MEMORY="8G"
> DRILL_HEAP="4G"
> Run this query on TPCDS SF100 data set
> {code}
> SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) AS 
> TotalSpend FROM store_sales ss WHERE ss.ss_store_sk IS NOT NULL ORDER BY 1 
> LIMIT 10;
> {code}
> drillbit.log
> {code}
> 2015-08-26 16:54:58,469 [2a2210a7-7a78-c774-d54c-c863d0b77bb0:frag:3:22] INFO 
>  o.a.d.e.w.f.FragmentStatusReporter - 
> 2a2210a7-7a78-c774-d54c-c863d0b77bb0:3:22: State to report: RUNNING
> 2015-08-26 16:55:50,498 [BitServer-5] WARN  
> o.a.drill.exec.rpc.data.DataServer - Message of mode REQUEST of rpc type 3 
> took longer than 500ms.  Actual duration was 2569ms.
> 2015-08-26 16:56:31,086 [BitServer-5] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.10.88.133:31012 <--> /10.10.88.133:54554 (data server).  
> Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct 
> buffer memory
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233)
>  ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
>  [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: 

[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229728#comment-15229728 ]

ASF GitHub Bot commented on DRILL-3714:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/463#discussion_r58821796
  
--- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java ---
@@ -20,51 +20,82 @@
 import io.netty.buffer.ByteBuf;
 import io.netty.channel.ChannelFuture;
 
-import java.util.Map;
-import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicInteger;
 
 import org.apache.drill.common.exceptions.UserRemoteException;
 import org.apache.drill.exec.proto.UserBitShared.DrillPBError;
 
+import com.carrotsearch.hppc.IntObjectHashMap;
+import com.carrotsearch.hppc.procedures.IntObjectProcedure;
+import com.google.common.base.Preconditions;
+
 /**
- * Manages the creation of rpc futures for a particular socket.
+ * Manages the creation of rpc futures for a particular socket <--> socket
+ * connection. Generally speaking, there will be two threads working with this
+ * class (the socket thread and the Request generating thread). Synchronization
+ * is simple with the map being the only thing that is protected. Everything
+ * else works via Atomic variables.
  */
-public class CoordinationQueue {
-  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(CoordinationQueue.class);
+class RequestIdMap {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RequestIdMap.class);
+
+  private final AtomicInteger value = new AtomicInteger();
+  private final AtomicBoolean acceptMessage = new AtomicBoolean(true);
 
-  private final PositiveAtomicInteger circularInt = new PositiveAtomicInteger();
-  private final Map<Integer, RpcOutcome<?>> map;
+  /** Access to map must be protected. **/
+  private final IntObjectHashMap<RpcOutcome<?>> map;
 
-  public CoordinationQueue(int segmentSize, int segmentCount) {
-    map = new ConcurrentHashMap<Integer, RpcOutcome<?>>(segmentSize, 0.75f, segmentCount);
+  public RequestIdMap() {
+    map = new IntObjectHashMap<>();
   }
 
   void channelClosed(Throwable ex) {
+    acceptMessage.set(false);
     if (ex != null) {
-      RpcException e;
-      if (ex instanceof RpcException) {
-        e = (RpcException) ex;
-      } else {
-        e = new RpcException(ex);
+      final RpcException e = RpcException.mapException(ex);
+      synchronized (map) {
+        map.forEach(new Closer(e));
+        map.clear();
       }
-      for (RpcOutcome<?> f : map.values()) {
-        f.setException(e);
+    }
+  }
+
+  private class Closer implements IntObjectProcedure<RpcOutcome<?>> {
+    final RpcException exception;
+
+    public Closer(RpcException exception) {
+      this.exception = exception;
+    }
+
+    @Override
+    public void apply(int key, RpcOutcome<?> value) {
+      try{
+        value.setException(exception);
+      }catch(Exception e){
+        logger.warn("Failure while attempting to fail rpc response.", e);
       }
     }
+
   }
 
-  public <V> ChannelListenerWithCoordinationId get(RpcOutcomeListener<V> handler, Class<V> clazz, RemoteConnection connection) {
-    int i = circularInt.getNext();
+  public <V> ChannelListenerWithCoordinationId createNewRpcListener(RpcOutcomeListener<V> handler, Class<V> clazz,
+      RemoteConnection connection) {
+    int i = value.incrementAndGet();
     RpcListener<V> future = new RpcListener<V>(handler, clazz, i, connection);
-    Object old = map.put(i, future);
-    if (old != null) {
-      throw new IllegalStateException(
-          "You attempted to reuse a coordination id when the previous coordination id has not been removed.  This is likely rpc future callback memory leak.");
+    final Object old;
+    synchronized (map) {
+      Preconditions.checkArgument(acceptMessage.get(),
--- End diff --

Make this check first statement in the method?



[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229727#comment-15229727 ]

ASF GitHub Bot commented on DRILL-3714:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/463#discussion_r58821789
  
--- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java ---
@@ -20,51 +20,82 @@
+    }
+  }
+
+  private class Closer implements IntObjectProcedure<RpcOutcome<?>> {
--- End diff --

Better class name? `SetExceptionProcedure`?



[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229724#comment-15229724 ]

ASF GitHub Bot commented on DRILL-3714:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/463#discussion_r58821752
  
--- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java ---
@@ -20,51 +20,82 @@
+class RequestIdMap {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RequestIdMap.class);
+
+  private final AtomicInteger value = new AtomicInteger();
--- End diff --

How about `coordinationIdCounter` and `isOpen`?



[jira] [Commented] (DRILL-4573) Zero copy LIKE, REGEXP_MATCHES, SUBSTR

2016-04-06 Thread jean-claude (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229500#comment-15229500 ]

jean-claude commented on DRILL-4573:


Please review

> Zero copy LIKE, REGEXP_MATCHES, SUBSTR
> --
>
> Key: DRILL-4573
> URL: https://issues.apache.org/jira/browse/DRILL-4573
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: jean-claude
>Priority: Minor
> Attachments: DRILL-4573.1.patch.txt
>
>
> All the functions using java.util.regex.Matcher currently create Java String 
> objects to pass into matcher.reset().
> However, this makes an unnecessary copy of the bytes and a Java String object.
> The matcher takes a CharSequence, so instead of making a copy we can create an 
> adapter from the DrillBuffer to the CharSequence interface.
> Gains of 25% in execution speed are possible when going over VARCHARs of 36 
> chars. The gain is proportional to the size of the VARCHAR.
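The adapter idea above can be sketched as follows (a minimal illustration assuming single-byte characters and a plain `byte[]` standing in for the Drill buffer; the class name is hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Zero-copy sketch: expose a byte buffer to java.util.regex through the
// CharSequence interface instead of copying the bytes into a new String.
// A byte[] stands in for the Drill buffer here, and single-byte
// (ASCII/Latin-1) characters are assumed; real UTF-8 needs more work.
final class ByteSeq implements CharSequence {
  private final byte[] buf;
  private final int start;
  private final int end;

  ByteSeq(byte[] buf, int start, int end) {
    this.buf = buf;
    this.start = start;
    this.end = end;
  }

  @Override public int length() { return end - start; }

  @Override public char charAt(int index) {
    return (char) (buf[start + index] & 0xFF); // one byte per char assumed
  }

  @Override public CharSequence subSequence(int from, int to) {
    return new ByteSeq(buf, start + from, start + to);
  }

  @Override public String toString() { // only materializes if explicitly asked
    return new StringBuilder(this).toString();
  }
}

public class ZeroCopyLikeSketch {
  public static void main(String[] args) {
    byte[] data = "0a1b2c3d-uuid-like-value".getBytes();
    Matcher m = Pattern.compile("^0.*").matcher("");
    // reset() with the adapter: no String copy of the buffer is made
    m.reset(new ByteSeq(data, 0, data.length));
    System.out.println(m.matches()); // prints "true"
  }
}
```

The matcher only ever calls `length()` and `charAt()`, so the bytes are read in place; this is what makes the per-value String allocation avoidable.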



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4573) Zero copy LIKE, REGEXP_MATCHES, SUBSTR

2016-04-06 Thread jean-claude (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229499#comment-15229499 ]

jean-claude commented on DRILL-4573:


You can test the performance gain by creating a simple csv file with one column 
containing UUIDs like this:
{code}
for i in {1..100}; do uuidgen; done > /Users/jccote/test.csv
{code}
then query it using Drill:
{code}
select count(1) from dfs.`/Users/jccote/test.csv` where columns[0] like '0%';
{code}
Run it multiple times to get a good estimate.






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4523) Disallow using loopback address in distributed mode

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229459#comment-15229459 ]

ASF GitHub Bot commented on DRILL-4523:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/445


> Disallow using loopback address in distributed mode
> ---
>
> Key: DRILL-4523
> URL: https://issues.apache.org/jira/browse/DRILL-4523
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Fix For: 1.7.0
>
>
> If we enable debug for org.apache.drill.exec.coord.zk in logback.xml, we only 
> get the hostname and ports information. For example:
> {code}
> 2015-11-04 19:47:02,927 [ServiceCache-0] DEBUG 
> o.a.d.e.c.zk.ZKClusterCoordinator - Cache changed, updating.
> 2015-11-04 19:47:02,932 [ServiceCache-0] DEBUG 
> o.a.d.e.c.zk.ZKClusterCoordinator - Active drillbit set changed.  Now 
> includes 2 total bits.  New active drillbits:
>  h3.poc.com:31010:31011:31012
>  h2.poc.com:31010:31011:31012
> {code}
> We need to know the IP address of each hostname to do further troubleshooting.
> Imagine if a drillbit registers itself as "localhost.localdomain" in 
> zookeeper: we will never know where it came from. Enabling IP address 
> tracking would help in this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4588) Enable JMXReporter to Expose Metrics

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229458#comment-15229458 ]

ASF GitHub Bot commented on DRILL-4588:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/469


> Enable JMXReporter to Expose Metrics
> 
>
> Key: DRILL-4588
> URL: https://issues.apache.org/jira/browse/DRILL-4588
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>
> -There is a static initialization order issue that needs to be fixed.-
> The code is commented out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4544) Improve error messages for REFRESH TABLE METADATA command

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229461#comment-15229461 ]

ASF GitHub Bot commented on DRILL-4544:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/448


> Improve error messages for REFRESH TABLE METADATA command
> -
>
> Key: DRILL-4544
> URL: https://issues.apache.org/jira/browse/DRILL-4544
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.7.0
>
>
> Improve the error messages thrown by the REFRESH TABLE METADATA command:
> In the first case below, the error is that maprfs.abc doesn't exist. It should 
> throw an "Object not found" or "workspace not found" error. It is currently 
> throwing a non-helpful message:
> {code}
> 0: jdbc:drill:> refresh table metadata maprfs.abc.`my_table`;
> +--------+--------------+
> |   ok   |   summary    |
> +--------+--------------+
> | false  | Error: null  |
> +--------+--------------+
> 1 row selected (0.355 seconds)
> {code}
> In the second case below, it says refresh table metadata is supported only 
> for single-directory-based Parquet tables, but the command works for nested 
> multi-directory Parquet files:
> {code}
> 0: jdbc:drill:> refresh table metadata maprfs.vnaranammalpuram.`rfm_sales_vw`;
> +--------+--------------------------------------------------------------+
> |   ok   |                           summary                            |
> +--------+--------------------------------------------------------------+
> | false  | Table rfm_sales_vw does not support metadata refresh. Support is currently limited to single-directory-based Parquet tables.  |
> +--------+--------------------------------------------------------------+
> 1 row selected (0.418 seconds)
> 0: jdbc:drill:>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3743) query hangs on sqlline once Drillbit on foreman node is killed

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229460#comment-15229460 ]

ASF GitHub Bot commented on DRILL-3743:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/460


> query hangs on sqlline once Drillbit on foreman node is killed
> --
>
> Key: DRILL-3743
> URL: https://issues.apache.org/jira/browse/DRILL-3743
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Sudheesh Katkam
>Priority: Critical
> Fix For: Future
>
>
> sqlline/query hangs once the Drillbit (on the Foreman node) is killed (kill -9 ).
> The query was issued from the Foreman node. The query returns many records, 
> and it is a long-running query.
> Steps to reproduce the problem:
> set planner.slice_target=1
> 1. clush -g khurram service mapr-warden stop
> 2. clush -g khurram service mapr-warden start
> 3. ./sqlline -u "jdbc:drill:schema=dfs.tmp"
> 0: jdbc:drill:schema=dfs.tmp> select * from `twoKeyJsn.json` limit 200;
> 4. Immediately, from another console, do a jps and kill the Drillbit process 
> (in this case the foreman) while the query is running in sqlline. You will 
> notice that sqlline just hangs; we do not see any exceptions or errors 
> reported at the sqlline prompt or in drillbit.log or drillbit.out.
> I see this exception in sqlline.log on the node from which sqlline was 
> started:
> {code}
> 2015-09-04 18:45:12,069 [Client-1] INFO  o.a.d.e.rpc.user.QueryResultHandler 
> - User Error Occurred
> org.apache.drill.common.exceptions.UserException: CONNECTION ERROR: 
> Connection /10.10.100.201:53425 <--> /10.10.100.201:31010 (user client) 
> closed unexpectedly.
> [Error Id: ec316cfd-c9a5-4905-98e3-da20cb799ba5 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:524)
>  ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.rpc.user.QueryResultHandler$SubmissionListener$ChannelClosedListener.operationComplete(QueryResultHandler.java:298)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> 2015-09-04 18:45:12,069 [Client-1] INFO  
> o.a.d.j.i.DrillResultSetImpl$ResultsListener - [#7] Query failed:
> org.apache.drill.common.exceptions.UserException: CONNECTION ERROR: 
> Connection /10.10.100.201:53425 <--> /10.10.100.201:31010 (user client) 
> closed unexpectedly.
> [Error Id: ec316cfd-c9a5-4905-98e3-da20cb799ba5 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:524)
>  ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.rpc.user.QueryResultHandler$SubmissionListener$ChannelClosedListener.operationComplete(QueryResultHandler.java:298)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254) 
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  

[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229455#comment-15229455 ]

ASF GitHub Bot commented on DRILL-3714:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/463#discussion_r58806647
  
--- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RpcBus.java ---
@@ -158,22 +157,16 @@ public ChannelClosedHandler(C clientConnection, Channel channel) {
 
     @Override
     public void operationComplete(ChannelFuture future) throws Exception {
-      String msg;
+      final String msg;
+
       if(local!=null) {
         msg = String.format("Channel closed %s <--> %s.", local, remote);
       }else{
         msg = String.format("Channel closed %s <--> %s.", future.channel().localAddress(), future.channel().remoteAddress());
       }
 
-      if (RpcBus.this.isClient()) {
--- End diff --

`isClient` method is no longer used. Remove the method.



[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229451#comment-15229451
 ] 

ASF GitHub Bot commented on DRILL-3714:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/463#discussion_r58806614
  
--- Diff: 
exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java ---
@@ -20,51 +20,82 @@
 import io.netty.buffer.ByteBuf;
 import io.netty.channel.ChannelFuture;
 
-import java.util.Map;
-import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicInteger;
 
 import org.apache.drill.common.exceptions.UserRemoteException;
 import org.apache.drill.exec.proto.UserBitShared.DrillPBError;
 
+import com.carrotsearch.hppc.IntObjectHashMap;
+import com.carrotsearch.hppc.procedures.IntObjectProcedure;
+import com.google.common.base.Preconditions;
+
 /**
- * Manages the creation of rpc futures for a particular socket.
+ * Manages the creation of rpc futures for a particular socket <--> socket
+ * connection. Generally speaking, there will be two threads working with this
+ * class (the socket thread and the Request generating thread). Synchronization
+ * is simple with the map being the only thing that is protected. Everything
+ * else works via Atomic variables.
  */
-public class CoordinationQueue {
-  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(CoordinationQueue.class);
+class RequestIdMap {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RequestIdMap.class);
+
+  private final AtomicInteger value = new AtomicInteger();
+  private final AtomicBoolean acceptMessage = new AtomicBoolean(true);
 
-  private final PositiveAtomicInteger circularInt = new PositiveAtomicInteger();
--- End diff --

Remove the PositiveAtomicInteger class.



[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229456#comment-15229456
 ] 

ASF GitHub Bot commented on DRILL-3714:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/463#discussion_r58806651
  
--- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RpcBus.java ---
@@ -261,6 +251,7 @@ public void execute(Runnable command) {
 
 public InboundHandler(C connection) {
   super();
+  Preconditions.checkNotNull(connection);
--- End diff --

`this.connection = Preconditions.checkNotNull(connection);`
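The suggestion works because a null-check helper can return its argument, collapsing the check and the assignment into one statement. A minimal sketch of the idiom using the JDK's `Objects.requireNonNull` (Guava's `Preconditions.checkNotNull` behaves the same way); the class and field names here are illustrative, not Drill's actual `InboundHandler`:

```java
import java.util.Objects;

// Illustrative sketch of the assign-and-check idiom; names are hypothetical.
class HandlerSketch {
  private final Object connection;

  HandlerSketch(Object connection) {
    // requireNonNull throws NullPointerException on null, otherwise returns
    // its argument, so the check and the assignment are a single line.
    this.connection = Objects.requireNonNull(connection);
  }

  Object connection() {
    return connection;
  }
}
```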



[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229454#comment-15229454
 ] 

ASF GitHub Bot commented on DRILL-3714:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/463#discussion_r58806644
  
--- Diff: 
exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java ---
@@ -20,51 +20,82 @@
 import io.netty.buffer.ByteBuf;
 import io.netty.channel.ChannelFuture;
 
-import java.util.Map;
-import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicInteger;
 
 import org.apache.drill.common.exceptions.UserRemoteException;
 import org.apache.drill.exec.proto.UserBitShared.DrillPBError;
 
+import com.carrotsearch.hppc.IntObjectHashMap;
+import com.carrotsearch.hppc.procedures.IntObjectProcedure;
+import com.google.common.base.Preconditions;
+
 /**
- * Manages the creation of rpc futures for a particular socket.
+ * Manages the creation of rpc futures for a particular socket <--> socket
+ * connection. Generally speaking, there will be two threads working with this
+ * class (the socket thread and the Request generating thread). Synchronization
+ * is simple with the map being the only thing that is protected. Everything
+ * else works via Atomic variables.
  */
-public class CoordinationQueue {
-  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(CoordinationQueue.class);
+class RequestIdMap {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RequestIdMap.class);
+
+  private final AtomicInteger value = new AtomicInteger();
+  private final AtomicBoolean acceptMessage = new AtomicBoolean(true);
 
-  private final PositiveAtomicInteger circularInt = new PositiveAtomicInteger();
-  private final Map<Integer, RpcOutcome<?>> map;
+  /** Access to map must be protected. **/
+  private final IntObjectHashMap<RpcOutcome<?>> map;
 
-  public CoordinationQueue(int segmentSize, int segmentCount) {
-    map = new ConcurrentHashMap<Integer, RpcOutcome<?>>(segmentSize, 0.75f, segmentCount);
+  public RequestIdMap() {
+    map = new IntObjectHashMap<RpcOutcome<?>>();
   }
 
   void channelClosed(Throwable ex) {
+    acceptMessage.set(false);
     if (ex != null) {
-      RpcException e;
-      if (ex instanceof RpcException) {
-        e = (RpcException) ex;
-      } else {
-        e = new RpcException(ex);
+      final RpcException e = RpcException.mapException(ex);
+      synchronized (map) {
+        map.forEach(new Closer(e));
+        map.clear();
       }
-      for (RpcOutcome<?> f : map.values()) {
-        f.setException(e);
+    }
+  }
+
+  private class Closer implements IntObjectProcedure<RpcOutcome<?>> {
+    final RpcException exception;
+
+    public Closer(RpcException exception) {
+      this.exception = exception;
+    }
+
+    @Override
+    public void apply(int key, RpcOutcome<?> value) {
+      try{
+        value.setException(exception);
+      }catch(Exception e){
+        logger.warn("Failure while attempting to fail rpc response.", e);
       }
     }
+
   }
 
-  public <V> ChannelListenerWithCoordinationId get(RpcOutcomeListener<V> handler, Class<V> clazz, RemoteConnection connection) {
-    int i = circularInt.getNext();
+  public <V> ChannelListenerWithCoordinationId createNewRpcListener(RpcOutcomeListener<V> handler, Class<V> clazz,
+      RemoteConnection connection) {
+    int i = value.incrementAndGet();
     RpcListener<V> future = new RpcListener<V>(handler, clazz, i, connection);
-    Object old = map.put(i, future);
-    if (old != null) {
-      throw new IllegalStateException(
-          "You attempted to reuse a coordination id when the previous coordination id has not been removed.  This is likely rpc future callback memory leak.");
+    final Object old;
+    synchronized (map) {
+      Preconditions.checkArgument(acceptMessage.get(),
+          "Attempted to send a message when connection is no longer valid.");
+      old = map.put(i, future);
     }
+    Preconditions.checkArgument(old == null,
--- End diff --

Not required, since numbers are no longer reused?
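The allocation pattern under discussion can be reduced to a standalone sketch. This is illustrative only (`IdMapSketch` is not Drill's API, and the real `RequestIdMap` fails each pending `RpcOutcome` on close rather than just clearing): a monotonically increasing `AtomicInteger` hands out ids that are never reused, so the `old != null` collision check is defensive rather than required.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of the id-map pattern: ids come from an ever-increasing
// counter, the map is the only synchronized state, flags are atomics.
class IdMapSketch<V> {
  private final AtomicInteger nextId = new AtomicInteger();
  private final AtomicBoolean open = new AtomicBoolean(true);
  private final Map<Integer, V> map = new HashMap<>(); // guarded by synchronized(map)

  int register(V outcome) {
    final int id = nextId.incrementAndGet(); // never reused, unlike a circular counter
    synchronized (map) {
      if (!open.get()) {
        throw new IllegalStateException("connection closed");
      }
      map.put(id, outcome);
    }
    return id;
  }

  V remove(int id) {
    synchronized (map) {
      return map.remove(id);
    }
  }

  void close() {
    open.set(false);
    synchronized (map) {
      map.clear(); // the real code fails each pending outcome first
    }
  }
}
```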



[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229450#comment-15229450
 ] 

ASF GitHub Bot commented on DRILL-3714:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/463#discussion_r58806611
  
--- Diff: 
exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java ---
@@ -20,51 +20,82 @@
 import io.netty.buffer.ByteBuf;
 import io.netty.channel.ChannelFuture;
 
-import java.util.Map;
-import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicInteger;
 
 import org.apache.drill.common.exceptions.UserRemoteException;
 import org.apache.drill.exec.proto.UserBitShared.DrillPBError;
 
+import com.carrotsearch.hppc.IntObjectHashMap;
+import com.carrotsearch.hppc.procedures.IntObjectProcedure;
+import com.google.common.base.Preconditions;
+
 /**
- * Manages the creation of rpc futures for a particular socket.
+ * Manages the creation of rpc futures for a particular socket <--> socket
+ * connection. Generally speaking, there will be two threads working with this
+ * class (the socket thread and the Request generating thread). Synchronization
+ * is simple with the map being the only thing that is protected. Everything
+ * else works via Atomic variables.
  */
-public class CoordinationQueue {
-  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(CoordinationQueue.class);
+class RequestIdMap {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RequestIdMap.class);
--- End diff --

private



[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229453#comment-15229453
 ] 

ASF GitHub Bot commented on DRILL-3714:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/463#discussion_r58806623
  
--- Diff: 
exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java ---
@@ -20,51 +20,82 @@
 import io.netty.buffer.ByteBuf;
 import io.netty.channel.ChannelFuture;
 
-import java.util.Map;
-import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicInteger;
 
 import org.apache.drill.common.exceptions.UserRemoteException;
 import org.apache.drill.exec.proto.UserBitShared.DrillPBError;
 
+import com.carrotsearch.hppc.IntObjectHashMap;
+import com.carrotsearch.hppc.procedures.IntObjectProcedure;
+import com.google.common.base.Preconditions;
+
 /**
- * Manages the creation of rpc futures for a particular socket.
+ * Manages the creation of rpc futures for a particular socket <--> socket
+ * connection. Generally speaking, there will be two threads working with this
+ * class (the socket thread and the Request generating thread). Synchronization
+ * is simple with the map being the only thing that is protected. Everything
+ * else works via Atomic variables.
  */
-public class CoordinationQueue {
-  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(CoordinationQueue.class);
+class RequestIdMap {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RequestIdMap.class);
+
+  private final AtomicInteger value = new AtomicInteger();
+  private final AtomicBoolean acceptMessage = new AtomicBoolean(true);
 
-  private final PositiveAtomicInteger circularInt = new PositiveAtomicInteger();
-  private final Map<Integer, RpcOutcome<?>> map;
+  /** Access to map must be protected. **/
+  private final IntObjectHashMap<RpcOutcome<?>> map;
 
-  public CoordinationQueue(int segmentSize, int segmentCount) {
-    map = new ConcurrentHashMap<Integer, RpcOutcome<?>>(segmentSize, 0.75f, segmentCount);
+  public RequestIdMap() {
+    map = new IntObjectHashMap<RpcOutcome<?>>();
   }
 
   void channelClosed(Throwable ex) {
+    acceptMessage.set(false);
     if (ex != null) {
-      RpcException e;
-      if (ex instanceof RpcException) {
-        e = (RpcException) ex;
-      } else {
-        e = new RpcException(ex);
+      final RpcException e = RpcException.mapException(ex);
+      synchronized (map) {
+        map.forEach(new Closer(e));
+        map.clear();
       }
-      for (RpcOutcome<?> f : map.values()) {
-        f.setException(e);
+    }
+  }
 
+  private class Closer implements IntObjectProcedure<RpcOutcome<?>> {
+    final RpcException exception;
+
+    public Closer(RpcException exception) {
+      this.exception = exception;
+    }
+
+    @Override
+    public void apply(int key, RpcOutcome<?> value) {
+      try{
--- End diff --

Inconsistent spacing here and below.



[jira] [Updated] (DRILL-4573) Zero copy LIKE, REGEXP_MATCHES, SUBSTR

2016-04-06 Thread jean-claude (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jean-claude updated DRILL-4573:
---
Attachment: DRILL-4573.1.patch.txt

> Zero copy LIKE, REGEXP_MATCHES, SUBSTR
> --
>
> Key: DRILL-4573
> URL: https://issues.apache.org/jira/browse/DRILL-4573
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: jean-claude
>Priority: Minor
> Attachments: DRILL-4573.1.patch.txt
>
>
> All the functions using java.util.regex.Matcher currently create Java 
> String objects to pass into matcher.reset().
> However, this makes an unnecessary copy of the bytes and allocates a Java String object.
> The matcher accepts a CharSequence, so instead of making a copy we can create an 
> adapter from the DrillBuffer to the CharSequence interface.
> Gains of 25% in execution speed are possible when going over VARCHARs of 36 
> chars. The gain is proportional to the size of the VARCHAR.
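The adapter idea described above can be sketched independently of Drill. This is a hedged illustration, not the patch's actual code: Drill's adapter reads from a DrillBuf, while a plain `byte[]` stands in here, and the single-byte `charAt` below is only valid for Latin-1 data (real code must handle UTF-8).

```java
import java.nio.charset.StandardCharsets;

// Zero-copy CharSequence view over raw bytes, so Matcher.reset() needs no
// intermediate String. Illustrative stand-in for a DrillBuf-backed adapter.
class ByteCharSequence implements CharSequence {
  private final byte[] bytes;
  private final int start, end;

  ByteCharSequence(byte[] bytes, int start, int end) {
    this.bytes = bytes;
    this.start = start;
    this.end = end;
  }

  @Override public int length() {
    return end - start;
  }

  // Valid only for single-byte encodings such as ISO-8859-1.
  @Override public char charAt(int index) {
    return (char) (bytes[start + index] & 0xFF);
  }

  @Override public CharSequence subSequence(int s, int e) {
    return new ByteCharSequence(bytes, start + s, start + e);
  }

  @Override public String toString() {
    return new String(bytes, start, end - start, StandardCharsets.ISO_8859_1);
  }
}
```

A matcher can then run directly over the bytes, e.g. `Pattern.compile("b.r").matcher(new ByteCharSequence(data, 0, data.length)).find()`, with no String allocation per value.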



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4577) Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229433#comment-15229433
 ] 

ASF GitHub Bot commented on DRILL-4577:
---

Github user hsuanyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/461#discussion_r58805242
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java
 ---
@@ -72,4 +80,76 @@ public String getTypeName() {
 return HiveStoragePluginConfig.NAME;
   }
 
+  @Override
+  public List<Pair<String, ? extends Table>> getTablesByNames(final List<String> tableNames) {
+    final String schemaName = getName();
+    final List<Pair<String, ? extends Table>> tableNameToTable = Lists.newArrayList();
+    List<Table> tables;
+    // Retries once if the first call to fetch the metadata fails
+    synchronized(mClient) {
+      final List<String> tableNamesWithAuth = Lists.newArrayList();
+      for(String tableName : tableNames) {
+        try {
+          if(mClient.tableExists(schemaName, tableName)) {
--- End diff --

I did some tests here. When there are many tables, the improvement by 
optimizing for the second objective is not significant enough. However, the 
objective of this issue would make sense only when there are many tables. I 
think I still need to figure out a solution.


> Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in
> ---
>
> Key: DRILL-4577
> URL: https://issues.apache.org/jira/browse/DRILL-4577
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.7.0
>
>
> A query such as 
> {code}
> select * from INFORMATION_SCHEMA.`TABLES` 
> {code}
> is converted as calls to fetch all tables from storage plugins. 
> When users have Hive, the calls to hive metadata storage would be: 
> 1) get_table
> 2) get_partitions
> However, the information regarding partitions is not used in this type of 
> queries. Beside, a more efficient way is to fetch tables is to use 
> get_multi_table call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4588) Enable JMXReporter to Expose Metrics

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229372#comment-15229372
 ] 

ASF GitHub Bot commented on DRILL-4588:
---

Github user parthchandra commented on the pull request:

https://github.com/apache/drill/pull/469#issuecomment-206624211
  
+1.


> Enable JMXReporter to Expose Metrics
> 
>
> Key: DRILL-4588
> URL: https://issues.apache.org/jira/browse/DRILL-4588
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>
> -There is a static initialization order issue that needs to be fixed.-
> The code is commented out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4588) Enable JMXReporter to Expose Metrics

2016-04-06 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated DRILL-4588:
---
Description: 
-There is a static initialization order issue that needs to be fixed.-
The code is commented out.

  was:There is a static initialization order issue that needs to be fixed.


> Enable JMXReporter to Expose Metrics
> 
>
> Key: DRILL-4588
> URL: https://issues.apache.org/jira/browse/DRILL-4588
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>
> -There is a static initialization order issue that needs to be fixed.-
> The code is commented out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4588) Enable JMXReporter to Expose Metrics

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229364#comment-15229364
 ] 

ASF GitHub Bot commented on DRILL-4588:
---

GitHub user sudheeshkatkam opened a pull request:

https://github.com/apache/drill/pull/469

DRILL-4588: Enable JMX reporting

@parthchandra please review.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sudheeshkatkam/drill DRILL-4588

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/469.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #469


commit 4500cc9075c72622972e81939551ada2dfdca0a5
Author: Sudheesh Katkam 
Date:   2016-04-06T23:41:52Z

DRILL-4588: Enable JMX reporting




> Enable JMXReporter to Expose Metrics
> 
>
> Key: DRILL-4588
> URL: https://issues.apache.org/jira/browse/DRILL-4588
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>
> There is a static initialization order issue that needs to be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4589) Reduce planning time for file system partition pruning by reducing filter evaluation overhead

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229300#comment-15229300
 ] 

ASF GitHub Bot commented on DRILL-4589:
---

GitHub user jinfengni opened a pull request:

https://github.com/apache/drill/pull/468

DRILL-4589: Reduce planning time for file system partition pruning by…

… reducing filter evaluation overhead

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinfengni/incubator-drill DRILL-4589

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/468.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #468


commit e207a926e65cd788700229de3ae47cf4e876
Author: Jinfeng Ni 
Date:   2016-02-25T18:13:43Z

DRILL-4589: Reduce planning time for file system partition pruning by 
reducing filter evaluation overhead




> Reduce planning time for file system partition pruning by reducing filter 
> evaluation overhead
> -
>
> Key: DRILL-4589
> URL: https://issues.apache.org/jira/browse/DRILL-4589
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> When Drill is used to query hundreds of thousands, or even millions, of files 
> organized into multi-level directories, users typically provide a partition 
> filter like: dir0 = something and dir1 = something2, and so on.
> For such queries, we saw that the query planning time could be unacceptably 
> long, due to three main overheads: 1) expanding and getting the list of files, 
> 2) evaluating the partition filter, and 3) getting the metadata, in the case 
> of parquet files for which a metadata cache file is not available.
> DRILL-2517 targets the 3rd overhead. As a follow-up to DRILL-2517, we plan to 
> reduce the filter evaluation overhead. For now, partition filter evaluation is 
> applied at the file level. In many cases, we saw that the number of leaf 
> subdirectories is significantly lower than the number of files. Since all the 
> files under the same leaf subdirectory share the same directory metadata, we 
> should apply the filter evaluation at the leaf subdirectory level. By doing 
> that, we could reduce the CPU overhead of evaluating the filter, and the 
> memory overhead as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4589) Reduce planning time for file system partition pruning by reducing filter evaluation overhead

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229302#comment-15229302
 ] 

ASF GitHub Bot commented on DRILL-4589:
---

Github user jinfengni commented on the pull request:

https://github.com/apache/drill/pull/468#issuecomment-206611168
  
@amansinha100, could you please review this PR? Thanks!



> Reduce planning time for file system partition pruning by reducing filter 
> evaluation overhead
> -
>
> Key: DRILL-4589
> URL: https://issues.apache.org/jira/browse/DRILL-4589
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> When Drill is used to query hundreds of thousands, or even millions, of files 
> organized into multi-level directories, users typically provide a partition 
> filter like: dir0 = something and dir1 = something2, and so on.
> For such queries, we saw that the query planning time could be unacceptably 
> long, due to three main overheads: 1) expanding and getting the list of files, 
> 2) evaluating the partition filter, and 3) getting the metadata, in the case 
> of parquet files for which a metadata cache file is not available.
> DRILL-2517 targets the 3rd overhead. As a follow-up to DRILL-2517, we plan to 
> reduce the filter evaluation overhead. For now, partition filter evaluation is 
> applied at the file level. In many cases, we saw that the number of leaf 
> subdirectories is significantly lower than the number of files. Since all the 
> files under the same leaf subdirectory share the same directory metadata, we 
> should apply the filter evaluation at the leaf subdirectory level. By doing 
> that, we could reduce the CPU overhead of evaluating the filter, and the 
> memory overhead as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4589) Reduce planning time for file system partition pruning by reducing filter evaluation overhead

2016-04-06 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229298#comment-15229298
 ] 

Jinfeng Ni commented on DRILL-4589:
---

I have a patch for this JIRA. Using the same dataset as the comparison done in 
DRILL-2517 (115k parquet files in total, organized into 25 directories (1990, 
1991, ...), each with four subdirectories (Q1, Q2, Q3, Q4)), here is the query 
planning time measured on a Mac laptop.

{code}
explain plan for select * from dfs.`/drill/testdata/tpch-sf10/lineitem115k` 
where dir0 = '1990' and dir1 = 'Q1';
{code}

Without the patch (on today's master branch):
{code}
1 row selected (8.084 seconds)
{code}

With the patch
{code}
1 row selected (4.306 seconds)
{code}

If the partition filter contains a complex expression, the improvement is even 
larger. For the query below, planning time improves from 24.951 seconds to 
4.393 seconds:
{code}
explain plan for select * from dfs.`/drill/testdata/tpch-sf10/lineitem115k` 
where concat(substr(dir0, 1, 4), substr(dir1, 1, 2)) = '1990Q1';
{code} 




> Reduce planning time for file system partition pruning by reducing filter 
> evaluation overhead
> -
>
> Key: DRILL-4589
> URL: https://issues.apache.org/jira/browse/DRILL-4589
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> When Drill is used to query hundreds of thousands, or even millions, of files 
> organized into multi-level directories, users typically provide a partition 
> filter like: dir0 = something and dir1 = something2, and so on.
> For such queries, we saw that the query planning time could be unacceptably 
> long, due to three main overheads: 1) expanding and getting the list of files, 
> 2) evaluating the partition filter, and 3) getting the metadata, in the case 
> of parquet files for which a metadata cache file is not available.
> DRILL-2517 targets the 3rd overhead. As a follow-up to DRILL-2517, we plan to 
> reduce the filter evaluation overhead. For now, partition filter evaluation is 
> applied at the file level. In many cases, we saw that the number of leaf 
> subdirectories is significantly lower than the number of files. Since all the 
> files under the same leaf subdirectory share the same directory metadata, we 
> should apply the filter evaluation at the leaf subdirectory level. By doing 
> that, we could reduce the CPU overhead of evaluating the filter, and the 
> memory overhead as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-1170) YARN support for Drill

2016-04-06 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229251#comment-15229251
 ] 

Josh Elser commented on DRILL-1170:
---

"substantial time" is definitely hard to sign up for, but I'd be happy to try 
to help out where/when at all possible. :)

> YARN support for Drill
> --
>
> Key: DRILL-1170
> URL: https://issues.apache.org/jira/browse/DRILL-1170
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Neeraja
>Assignee: Paul Rogers
> Fix For: Future
>
>
> This is a tracking item to make Drill work with YARN.
> Below are a few requirements/needs to consider.
> - Drill should run as a YARN-based application, side by side with other 
> YARN-enabled applications (on the same nodes or different nodes). Both memory 
> and CPU resources of Drill should be controlled through this mechanism.
> - As a YARN-enabled application, Drill's resource consumption should be 
> adaptive to the load on the cluster. For example, when there is no load on 
> Drill, it should consume no resources on the cluster. As the load on Drill 
> increases, resources permitting, usage should grow proportionally.
> - Low latency is a key requirement for Apache Drill, along with support for 
> multiple users (concurrency in the 100s-1000s). This should be supported when 
> running as a YARN application as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4237) Skew in hash distribution

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229160#comment-15229160
 ] 

ASF GitHub Bot commented on DRILL-4237:
---

Github user chunhui-shi commented on the pull request:

https://github.com/apache/drill/pull/430#issuecomment-206578634
  
@jacques-n 

The email response was not posted here, so I'm copying the sent email below:

Thanks for pointing to OpenHFT. Yes, I went through multiple Java 
implementations, including that one. The reason I decided to use smhasher as 
the source of truth is that the smhasher implementation includes comprehensive 
tests covering the attributes that measure the goodness of a non-cryptographic 
hash function. These attributes are subtle, and a problem may show up only for 
certain input lengths. So when I looked at these implementations, I first 
checked what tests they had done. Since none of these Java implementations has 
such tests (covering multiple attributes and lengths) to prove the hash 
functions are correct or good, I decided to start from the smhasher 
implementation and used the results generated by smhasher to verify any other 
(including Drill's) implementation.
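
The skew reported in this issue shows up as all-odd hash values: the low bit of hash32 is constant, so any power-of-two modulo distribution loses half its buckets. A quick hypothetical diagnostic (illustrative only, not Drill's test code) that catches this kind of bias:

```java
import java.util.List;

public class HashBiasCheck {
    // Returns a mask of the bit positions (0-31) that are constant across all
    // hash values; a good hash should leave no bit constant over a large sample.
    public static int constantBitMask(List<Integer> hashes) {
        int and = 0xFFFFFFFF, or = 0;
        for (int h : hashes) {
            and &= h;  // bits that are 1 in every value
            or  |= h;  // bits that are 1 in at least one value
        }
        // A bit is constant if it is 1 everywhere ('and') or 0 everywhere
        // (absent from 'or').
        return and | ~or;
    }

    public static void main(String[] args) {
        // The skewed sample from the issue description: every value is odd.
        List<Integer> skewed = List.of(1506011089, 1105719049, -18137557,
                -1372666789, -1930778239, -970026001, 356133757, -94010449);
        int mask = constantBitMask(skewed);
        System.out.println("constant-bit mask = 0x" + Integer.toHexString(mask));
        System.out.println("low bit constant: " + ((mask & 1) != 0));  // prints "true"
    }
}
```

A check like this runs on hash outputs alone, so it can compare Drill's implementation against smhasher-generated reference values without porting the full test suite.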


> Skew in hash distribution
> -
>
> Key: DRILL-4237
> URL: https://issues.apache.org/jira/browse/DRILL-4237
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.4.0
>Reporter: Aman Sinha
>Assignee: Chunhui Shi
>
> Apparently, the fix in DRILL-4119 did not fully resolve the data skew issue. 
> It worked fine on the smaller sample of the data set, but on another sample 
> of the same data set it still produces skewed values; see below the hash 
> values, which are all odd numbers. 
> {noformat}
> 0: jdbc:drill:zk=local> select columns[0], hash32(columns[0]) from `test.csv` 
> limit 10;
> +---+--+
> |  EXPR$0   |EXPR$1|
> +---+--+
> | f71aaddec3316ae18d43cb1467e88a41  | 1506011089   |
> | 3f3a13bb45618542b5ac9d9536704d3a  | 1105719049   |
> | 6935afd0c693c67bba482cedb7a2919b  | -18137557|
> | ca2a938d6d7e57bda40501578f98c2a8  | -1372666789  |
> | fab7f08402c8836563b0a5c94dbf0aec  | -1930778239  |
> | 9eb4620dcb68a84d17209da279236431  | -970026001   |
> | 16eed4a4e801b98550b4ff504242961e  | 356133757|
> | a46f7935fea578ce61d8dd45bfbc2b3d  | -94010449|
> | 7fdf5344536080c15deb2b5a2975a2b7  | -141361507   |
> | b82560a06e2e51b461c9fe134a8211bd  | -375376717   |
> +---+--+
> {noformat}
> This indicates an underlying issue with the XXHash64 Java implementation, 
> which is Drill's port of the C version. One of the key differences, as 
> pointed out by [~jnadeau], was the use of unsigned int64 in the C version 
> compared to the Java version, which uses (signed) long. I created an XXHash 
> version using com.google.common.primitives.UnsignedLong. However, 
> UnsignedLong does not have the bit-wise operations that are needed for 
> XXHash, such as rotateLeft(), XOR, etc. One could write wrappers for these, 
> but at this point the question is: should we think of an alternative hash 
> function?
> The alternative approach could be the murmur hash for numeric data types that 
> we were using earlier and the Mahout version of hash function for string 
> types 
> (https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/HashHelper.java#L28).
>   As a test, I reverted to this function and was getting good hash 
> distribution for the test data. 
> I could not find any performance comparisons from our perf tests (TPC-H or 
> DS) with the original and newer (XXHash) hash functions. If performance is 
> comparable, should we revert to the original function?
> As an aside, I would like to remove the hash64 versions of the functions 
> since these are not used anywhere. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-4589) Reduce planning time for file system partition pruning by reducing filter evaluation overhead

2016-04-06 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni reassigned DRILL-4589:
-

Assignee: Jinfeng Ni

> Reduce planning time for file system partition pruning by reducing filter 
> evaluation overhead
> -
>
> Key: DRILL-4589
> URL: https://issues.apache.org/jira/browse/DRILL-4589
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> When Drill is used to query hundreds of thousands, or even millions, of files 
> organized into multi-level directories, users typically provide a partition 
> filter like: dir0 = something and dir1 = something2, and so on.
> For such queries, we saw that the query planning time could be unacceptably 
> long, due to three main overheads: 1) expanding and getting the list of files, 
> 2) evaluating the partition filter, and 3) getting the metadata, in the case 
> of parquet files for which a metadata cache file is not available.
> DRILL-2517 targets the 3rd overhead. As a follow-up to DRILL-2517, we plan to 
> reduce the filter evaluation overhead. For now, partition filter evaluation is 
> applied at the file level. In many cases, we saw that the number of leaf 
> subdirectories is significantly lower than the number of files. Since all the 
> files under the same leaf subdirectory share the same directory metadata, we 
> should apply the filter evaluation at the leaf subdirectory level. By doing 
> that, we could reduce the CPU overhead of evaluating the filter, and the 
> memory overhead as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-1170) YARN support for Drill

2016-04-06 Thread Matt Pollock (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229120#comment-15229120
 ] 

Matt Pollock commented on DRILL-1170:
-

Thanks much. 

> YARN support for Drill
> --
>
> Key: DRILL-1170
> URL: https://issues.apache.org/jira/browse/DRILL-1170
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Neeraja
>Assignee: Paul Rogers
> Fix For: Future
>
>
> This is a tracking item to make Drill work with YARN.
> Below are a few requirements/needs to consider.
> - Drill should run as a YARN-based application, side by side with other 
> YARN-enabled applications (on the same nodes or different nodes). Both memory 
> and CPU resources of Drill should be controlled through this mechanism.
> - As a YARN-enabled application, Drill's resource consumption should be 
> adaptive to the load on the cluster. For example, when there is no load on 
> Drill, it should consume no resources on the cluster. As the load on Drill 
> increases, resources permitting, usage should grow proportionally.
> - Low latency is a key requirement for Apache Drill, along with support for 
> multiple users (concurrency in the 100s-1000s). This should be supported when 
> running as a YARN application as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4581) Various problems in the Drill startup scripts

2016-04-06 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-4581:
---
Description: 
Noticed the following in drillbit.sh:

1) Comment: DRILL_LOG_DIR  Where log files are stored. PWD by default.
Code: DRILL_LOG_DIR=/var/log/drill or, if it does not exist, $DRILL_HOME/log

2) Comment: DRILL_PID_DIR  The pid files are stored. /tmp by default.
Code: DRILL_PID_DIR=$DRILL_HOME

3) Redundant checking of JAVA_HOME. drillbit.sh sources drill-config.sh which 
checks JAVA_HOME. Later, drillbit.sh checks it again. The second check is both 
unnecessary and prints a less informative message than the drill-config.sh 
check. Suggestion: Remove the JAVA_HOME check in drillbit.sh.

4) Though drill-config.sh carefully checks JAVA_HOME, it does not export the 
JAVA_HOME variable. Perhaps this is why drillbit.sh repeats the check? 
Recommended: export JAVA_HOME from drill-config.sh.

5) Both drillbit.sh and the sourced drill-config.sh check DRILL_LOG_DIR and set 
the default value. Drill-config.sh defaults to /var/log/drill, or if that 
fails, to $DRILL_HOME/log. Drillbit.sh just sets /var/log/drill and does not 
handle the case where that directory is not writable. Suggested: remove the 
check in drillbit.sh.

6) Drill-config.sh checks the writability of the DRILL_LOG_DIR by touching 
sqlline.log, but does not delete that file, leaving a bogus, empty client log 
file on the drillbit server. Recommendation: use bash commands instead.

7) The implementation of the above check is awkward: it has a fallback case 
with convoluted logic. Clean this up.

8) drillbit.sh, but not drill-config.sh, attempts to create /var/log/drill if 
it does not exist. Recommended: decide on a single choice, implement it in 
drill-config.sh.

9) drill-config.sh checks if $DRILL_CONF_DIR is a directory. If not, it 
defaults to $DRILL_HOME/conf. This can lead to subtle errors. If I use
drillbit.sh --config /misspelled/path
where I mistype the path, I won't get an error; I get the default config, which 
may not at all be what I want to run. Recommendation: if the value of 
DRILL_CONF_DIR is passed into the script (as a variable or via --config), 
then that directory must exist. Else, use the default.

10) drill-config.sh exports, but may not set, HADOOP_HOME. This may be left 
over from the original Hadoop script that the Drill script was based upon. 
Recommendation: export only in the case that HADOOP_HOME is set for Cygwin.

11) Drill-config.sh checks JAVA_HOME and prints a big, bold error message to 
stderr if JAVA_HOME is not set. Then, it checks the Java version and prints a 
different message (to stdout) if the version is wrong. Recommendation: use the 
same format (and stderr) for both.

12) Similarly, other Java checks later in the script produce messages to 
stdout, not stderr.

13) Drill-config.sh searches $JAVA_HOME to find java/java.exe and verifies that 
it is executable. The script then throws away what we just found. Then, 
drillbit.sh tries to recreate this information as:
JAVA=$JAVA_HOME/bin/java
This is wrong in two ways: 1) it ignores the actual java location and assumes 
it, and 2) it does not handle the java.exe case that drill-config.sh carefully 
worked out.
Recommendation: export JAVA from drill-config.sh and remove the above line from 
drillbit.sh.

14) drillbit.sh presumably takes extra arguments like this:
drillbit.sh -Dvar0=value0 --config /my/conf/dir start -Dvar1=value1 
-Dvar2=value2 -Dvar3=value3
The -D bit allows the user to override config variables at the command line. 
But, the scripts don't use the values.
A) drill-config.sh consumes --config /my/conf/dir after consuming the leading 
arguments:
while [ $# -gt 1 ]; do
  if [ "--config" = "$1" ]; then
shift
confdir=$1
shift
DRILL_CONF_DIR=$confdir
  else
# Presume we are at end of options and break
break
  fi
done
B) drillbit.sh will discard -Dvar1=value1:
startStopStatus=$1 <-- grabs "start"
shift
command=drillbit
shift   <-- Consumes -Dvar1=value1
C) Remaining values passed back into drillbit.sh:
args=$@
nohup $thiscmd internal_start $command $args
D) Second invocation discards -Dvar2=value2 as described above.
E) Remaining values are passed to runbit:
"$DRILL_HOME"/bin/runbit  $command "$@" start
F) Where they again pass through drill-config.sh. (This allows:
drillbit.sh --config /first/conf --config /second/conf
which is asking for trouble.)
G) And, the remaining arguments are simply not used:
exec $JAVA -Dlog.path=$DRILLBIT_LOG_PATH 
-Dlog.query.path=$DRILLBIT_QUERY_LOG_PATH $DRILL_ALL_JAVA_OPTS -cp $CP 
org.apache.drill.exec.server.Drillbit

15) The checking of command-line args in drillbit.sh is wrong:

# if no args specified, show usage
if [ $# -lt 1 ]; then
  echo $usage
  exit 1
fi
...
. "$bin"/drill-config.sh

But note that drill-config.sh handles:
drillbit.sh --config /conf/dir
Consuming 

[jira] [Commented] (DRILL-4589) Reduce planning time for file system partition pruning by reducing filter evaluation overhead

2016-04-06 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229107#comment-15229107
 ] 

Jinfeng Ni commented on DRILL-4589:
---

This is related to DRILL-3759, which targets multi-phase partition pruning. 
Both aim to improve the efficiency of partition pruning in Drill's query 
planner.

 

> Reduce planning time for file system partition pruning by reducing filter 
> evaluation overhead
> -
>
> Key: DRILL-4589
> URL: https://issues.apache.org/jira/browse/DRILL-4589
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jinfeng Ni
>
> When Drill is used to query hundreds of thousands, or even millions, of files 
> organized into multi-level directories, users typically provide a partition 
> filter like: dir0 = something and dir1 = something2, and so on.
> For such queries, we saw that the query planning time could be unacceptably 
> long, due to three main overheads: 1) expanding and getting the list of files, 
> 2) evaluating the partition filter, and 3) getting the metadata, in the case 
> of parquet files for which a metadata cache file is not available.
> DRILL-2517 targets the 3rd overhead. As a follow-up to DRILL-2517, we plan to 
> reduce the filter evaluation overhead. For now, partition filter evaluation is 
> applied at the file level. In many cases, we saw that the number of leaf 
> subdirectories is significantly lower than the number of files. Since all the 
> files under the same leaf subdirectory share the same directory metadata, we 
> should apply the filter evaluation at the leaf subdirectory level. By doing 
> that, we could reduce the CPU overhead of evaluating the filter, and the 
> memory overhead as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4589) Reduce planning time for file system partition pruning by reducing filter evaluation overhead

2016-04-06 Thread Jinfeng Ni (JIRA)
Jinfeng Ni created DRILL-4589:
-

 Summary: Reduce planning time for file system partition pruning by 
reducing filter evaluation overhead
 Key: DRILL-4589
 URL: https://issues.apache.org/jira/browse/DRILL-4589
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Jinfeng Ni


When Drill is used to query hundreds of thousands, or even millions, of files 
organized into multi-level directories, users typically provide a partition 
filter like: dir0 = something and dir1 = something2, and so on.

For such queries, we saw that the query planning time could be unacceptably 
long, due to three main overheads: 1) expanding and getting the list of files, 
2) evaluating the partition filter, and 3) getting the metadata, in the case 
of parquet files for which a metadata cache file is not available.

DRILL-2517 targets the 3rd overhead. As a follow-up to DRILL-2517, we plan to 
reduce the filter evaluation overhead. For now, partition filter evaluation is 
applied at the file level. In many cases, we saw that the number of leaf 
subdirectories is significantly lower than the number of files. Since all the 
files under the same leaf subdirectory share the same directory metadata, we 
should apply the filter evaluation at the leaf subdirectory level. By doing 
that, we could reduce the CPU overhead of evaluating the filter, and the 
memory overhead as well.
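
The idea can be sketched as follows, under hypothetical names (this is not the planner's actual API): group file paths by their leaf directory, evaluate the partition predicate once per directory, and keep or prune all of that directory's files in one step.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class LeafDirPruning {
    // Evaluate the partition filter once per leaf directory instead of once
    // per file; all files in a leaf directory share its dir0/dir1 values.
    public static List<String> prune(List<String> files, Predicate<String> dirFilter) {
        Map<String, List<String>> byLeafDir = new LinkedHashMap<>();
        for (String f : files) {
            String dir = f.substring(0, f.lastIndexOf('/'));
            byLeafDir.computeIfAbsent(dir, d -> new ArrayList<>()).add(f);
        }
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : byLeafDir.entrySet()) {
            if (dirFilter.test(e.getKey())) {   // one evaluation per directory
                kept.addAll(e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "/data/1990/Q1/a.parquet", "/data/1990/Q1/b.parquet",
                "/data/1990/Q2/c.parquet", "/data/1991/Q1/d.parquet");
        // Predicate equivalent to: dir0 = '1990' and dir1 = 'Q1'
        List<String> kept = prune(files, dir -> dir.endsWith("/1990/Q1"));
        System.out.println(kept);  // the two files under /data/1990/Q1
    }
}
```

With 115k files under 100 leaf directories, the filter runs 100 times instead of 115,000, which is where both the CPU and memory savings come from.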








--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-1170) YARN support for Drill

2016-04-06 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229083#comment-15229083
 ] 

Paul Rogers commented on DRILL-1170:


Good progress is being made. Our tentative goal is the Drill 1.8 release for an 
initial integration. The goal is:

YARN support in Drill 1.8 enables admins to migrate their existing Drill 
cluster to run under YARN. The admin simply identifies the nodes on which Drill 
should run, identifies the required container sizes, and brings up the Drill 
cluster under YARN. YARN manages resource allocations for Drill alongside those 
of other YARN applications. Drill-on-YARN monitors Drill-bits and automatically 
restarts any that fail.

We'll have "experimental" support for starting/stopping Drill-bits. Starting 
bits is easy. Stopping is a bit of a challenge because we lack DRILL-2656.

> YARN support for Drill
> --
>
> Key: DRILL-1170
> URL: https://issues.apache.org/jira/browse/DRILL-1170
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Neeraja
>Assignee: Paul Rogers
> Fix For: Future
>
>
> This is a tracking item to make Drill work with YARN.
> Below are a few requirements/needs to consider.
> - Drill should run as a YARN-based application, side by side with other 
> YARN-enabled applications (on the same nodes or different nodes). Both memory 
> and CPU resources of Drill should be controlled through this mechanism.
> - As a YARN-enabled application, Drill's resource consumption should be 
> adaptive to the load on the cluster. For example, when there is no load on 
> Drill, it should consume no resources on the cluster. As the load on Drill 
> increases, resources permitting, usage should grow proportionally.
> - Low latency is a key requirement for Apache Drill, along with support for 
> multiple users (concurrency in the 100s-1000s). This should be supported when 
> running as a YARN application as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-1170) YARN support for Drill

2016-04-06 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229076#comment-15229076
 ] 

Paul Rogers commented on DRILL-1170:


Worth a discussion. Is Slider still the "go to" option, or has effort shifted 
to Twill?

As it turns out, the actual YARN integration was not a big effort. Rather, most 
of the effort is around modifying Drill itself to play well with YARN, and 
implementing the management aspects unique to YARN.

> YARN support for Drill
> --
>
> Key: DRILL-1170
> URL: https://issues.apache.org/jira/browse/DRILL-1170
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Neeraja
>Assignee: Paul Rogers
> Fix For: Future
>
>
> This is a tracking item to make Drill work with YARN.
> Below are a few requirements/needs to consider.
> - Drill should run as a YARN-based application, side by side with other 
> YARN-enabled applications (on the same nodes or different nodes). Both memory 
> and CPU resources of Drill should be controlled through this mechanism.
> - As a YARN-enabled application, Drill's resource consumption should be 
> adaptive to the load on the cluster. For example, when there is no load on 
> Drill, it should consume no resources on the cluster. As the load on Drill 
> increases, resources permitting, usage should grow proportionally.
> - Low latency is a key requirement for Apache Drill, along with support for 
> multiple users (concurrency in the 100s-1000s). This should be supported when 
> running as a YARN application as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (DRILL-4587) Document Drillbit launch options

2016-04-06 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228902#comment-15228902
 ] 

Paul Rogers edited comment on DRILL-4587 at 4/6/16 8:47 PM:


User-settable environment variables:

DRILL_HOME          Drill home (defaults based on the location of the 
drillbit.sh script).
DRILL_CONF_DIR      Alternate Drill configuration directory that contains the 
drill-override.conf and drill-env.sh files. Default is $DRILL_HOME/conf.
DRILL_LOG_DIR       Where log files are stored. Default is /var/log/drill if 
that exists, else $DRILL_HOME/log.
DRILL_PID_DIR       The directory where Drill stores its process ID (pid) 
file. $DRILL_HOME by default.
DRILL_IDENT_STRING  A string representing this instance of drillbit. $USER by 
default.
DRILL_NICENESS      The scheduling priority for daemons. Defaults to 0.
DRILL_STOP_TIMEOUT  Used when stopping the drillbit. Grace period, in 
seconds, after which the script forcibly kills the server if it has not 
stopped. Default 120 seconds.
JAVA_HOME           The Java implementation to use. If not set, the script 
looks for java on the command path and uses that location.
DRILL_CLASSPATH     Extra Java CLASSPATH entries for custom code.
DRILL_CLASSPATH_PREFIX  Extra Java CLASSPATH entries that should be prefixed 
to the system classpath.
HADOOP_HOME         Hadoop home.
HBASE_HOME          HBase home.
DRILL_JAVA_OPTS     Optional JVM arguments, such as system property overrides, 
used by both the drillbit and the client.
DRILLBIT_JAVA_OPTS  Optional JVM arguments specifically for the drillbit.
SERVER_GC_OPTS      Garbage collection options, including debug options. 
Supports special syntax: an option of the form -Xloggc:<path> will have 
<path> replaced with the actual path to the Drill log directory.



was (Author: paul-rogers):
User-settable environment variables:

DRILL_HOME Drill home (defaults based on the location of the 
drillbit.sh script.)
DRILL_CONF_DIR  Alternate drill configuration directory that contains the 
drill-override.conf and drill-env.sh files. Default is $DRILL_HOME/conf
DRILL_LOG_DIR   Where log files are stored. Default is /var/log/drill if 
that exists, else $DRILL_HOME/log  
DRILL_PID_DIR   The  directory where Drill stores its Process ID (pid) 
file. $DRILL_HOME by default.
DRILL_IDENT_STRING  A string representing this instance of drillbit. $USER by 
default
DRILL_NICENESS  The scheduling priority for daemons. Defaults to 0.
DRILL_STOP_TIMEOUT  Used when stopping the Drill-bit. Grace period time, in 
seconds, after which the script forcibly kills the server if it has not 
stopped. Default 120 seconds.
JAVA_HOME  The java implementation to use. If not set, looks 
for java on the command pass and uses that location.
DRILL_CLASSPATHExtra Java CLASSPATH entries for custom code.
DRILL_CLASSPATH_PREFIX Extra Java CLASSPATH entries that should be prefixed 
to the system classpath.
HADOOP_HOMEHadoop home
HBASE_HOME HBase home
DRILL_JAVA_OPTS Optional JVM arguments such as system property overides 
used by both the drillbit and client.
DRILLBIT_JAVA_OPTS  Optional JVM arguments specifically for the drillbit. 
SERVER_GC_OPTS  todo


> Document Drillbit launch options
> 
>
> Key: DRILL-4587
> URL: https://issues.apache.org/jira/browse/DRILL-4587
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>
> Drill provides the drillbit.sh script to launch Drill. When Drill is run in 
> production environments, or when managed by a tool such as Mesos or YARN, 
> customers have many ways to customize the launch. We should 
> document this information as below.
> The user can configure Drill launch in one of four ways, depending on their 
> needs.
> 1. Using the properties in drill-override.conf. Sets only startup and runtime 
> properties. All drillbits should use a copy of the file so that properties 
> set here apply to all drillbits and to client applications.
> 2. By setting environment variables prior to launching Drill. See the list 
> below. Use this to customize properties per drillbit, such as for setting 
> port numbers. This option is useful when launching Drill from a tool such as 
> Mesos or YARN.
> 3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the 
> list below. This script is intended to be unique to each node and is another 
> way to customize properties for this one node.
> 4. In Drill 1.7 and later, the administrator can set Drill configuration 
> options directly on the launch command as shown below. This option is also 
> useful when launching Drill from a 

[jira] [Updated] (DRILL-4587) Document Drillbit launch options

2016-04-06 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-4587:
---
Description: 
Drill provides the drillbit.sh script to launch Drill. When Drill is run in 
production environments, or when managed by a tool such as Mesos or YARN, 
customers have many options to customize the launch options. We should document 
this information as below.

The user can configure Drill launch in one of four ways, depending on their 
needs.

1. Using the properties in drill-override.conf. Sets only startup and runtime 
properties. All drillbits should use a copy of the file so that properties set 
here apply to all drill bits and to client applications.

2. By setting environment variables prior to launching Drill. See the list 
below. Use this to customize properties per drill-bit, such as for setting port 
numbers. This option is useful when launching Drill from a tool such as Mesos 
or YARN.

3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the 
list below. This script is intended to be unique to each node and is another 
way to customize properties for this one node.

4. In Drill 1.7 and later, the administrator can set Drill configuration 
options directly on the launch command as shown below. This option is also 
useful when launching Drill from a tool such as YARN or Mesos. Options are of 
the form:

$ drillbit.sh start -Dvariable=value

For example, to control the HTTP port:

$ drillbit.sh start -Ddrill.exec.http.port=8099 

Properties are of three types.

1. Launch-only properties: those that can be set only through environment 
variables (such as JAVA_HOME).
2. Drill startup properties, which can be set in the locations detailed below.
3. Drill runtime properties, which are set in drill-override.conf and also via 
SQL.

Drill startup properties can be set in a number of locations. Those listed 
later take precedence over those listed earlier.

1. drill-override.conf, as identified by DRILL_CONF_DIR or its default.
2. Set in the environment using DRILL_JAVA_OPTS or DRILL_DRILLBIT_JAVA_OPTS.
3. Set in drill-env.sh using the above two variables.
4. Set on the drillbit.sh command line as explained above. (Drill 1.7 and later.)

You can see the actual set of properties used (from items 2-3 above) by using 
the "debug" command:

$ drillbit.sh debug
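
The precedence order above can be sketched with a small shell function: a -D option given on the command line wins over a value taken from drill-override.conf. The function name and the hard-coded default are illustrative assumptions; Drill's real HOCON parsing is more involved.

```shell
# Sketch only: resolve the HTTP port the way the precedence list describes.
# resolve_http_port is a hypothetical helper, not part of drillbit.sh.
resolve_http_port() {
  port="8047"   # pretend this value came from drill-override.conf
  for arg in "$@"; do
    case "$arg" in
      -Ddrill.exec.http.port=*)   # command-line override (Drill 1.7+ style)
        port="${arg#-Ddrill.exec.http.port=}" ;;
    esac
  done
  echo "$port"
}
```

For example, `resolve_http_port -Ddrill.exec.http.port=8099` prints 8099, while `resolve_http_port` with no arguments falls back to the configured default.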


  was:
Drill provides the drillbit.sh script to launch Drill. When Drill is run in 
production environments, or when managed by a tool such as Mesos or YARN, 
customers have many options to customize the launch options. We should document 
this information as below.

The user can configure Drill launch in one of four ways, depending on their 
needs.

1. Using the properties in drill-override.conf. Sets only startup and runtime 
properties. All drillbits should use a copy of the file so that properties set 
here apply to all drill bits and to client applications.

2. By setting environment variables prior to launching Drill. See the list 
below. Use this to customize properties per drill-bit, such as for setting port 
numbers. This option is useful when launching Drill from a tool such as Mesos 
or YARN.

3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the 
list below. This script is intended to be unique to each node and is another 
way to customize properties for this one node.

4. In Drill 1.7 and later, the administrator can set Drill configuration 
options directly on the launch command as shown below. This option is also 
useful when launching Drill from a tool such as YARN or Mesos. Options are of 
the form:

drillbit.sh start -Dvariable=value

For example, to control the HTTP port:

drillbit.sh start -Ddrill.exec.http.port=8099 


> Document Drillbit launch options
> 
>
> Key: DRILL-4587
> URL: https://issues.apache.org/jira/browse/DRILL-4587
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>
> Drill provides the drillbit.sh script to launch Drill. When Drill is run in 
> production environments, or when managed by a tool such as Mesos or YARN, 
> customers have many options to customize the launch options. We should 
> document this information as below.
> The user can configure Drill launch in one of four ways, depending on their 
> needs.
> 1. Using the properties in drill-override.conf. Sets only startup and runtime 
> properties. All drillbits should use a copy of the file so that properties 
> set here apply to all drill bits and to client applications.
> 2. By setting environment variables prior to launching Drill. See the list 
> below. Use this to customize properties per drill-bit, such as for setting 
> port numbers. This option is useful when launching Drill from a tool such as 
> Mesos or YARN.
> 

[jira] [Updated] (DRILL-4587) Document Drillbit launch options

2016-04-06 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-4587:
---
Description: 
Drill provides the drillbit.sh script to launch Drill. When Drill is run in 
production environments, or when managed by a tool such as Mesos or YARN, 
customers have many options to customize the launch options. We should document 
this information as below.

The user can configure Drill launch in one of four ways, depending on their 
needs.

1. Using the properties in drill-override.conf. Sets only startup and runtime 
properties. All drillbits should use a copy of the file so that properties set 
here apply to all drill bits and to client applications.

2. By setting environment variables prior to launching Drill. See the list 
below. Use this to customize properties per drill-bit, such as for setting port 
numbers. This option is useful when launching Drill from a tool such as Mesos 
or YARN.

3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the 
list below. This script is intended to be unique to each node and is another 
way to customize properties for this one node.

4. In Drill 1.7 and later, the administrator can set Drill configuration 
options directly on the launch command as shown below. This option is also 
useful when launching Drill from a tool such as YARN or Mesos. Options are of 
the form:

$ drillbit.sh start -Dvariable=value

For example, to control the HTTP port:

$ drillbit.sh start -Ddrill.exec.http.port=8099 

Properties are of three types.

1. Launch-only properties: those that can be set only through environment 
variables (such as JAVA_HOME).
2. Drill startup properties, which can be set in the locations detailed below.
3. Drill runtime properties, which are set in drill-override.conf and also via 
SQL.

Drill startup properties can be set in a number of locations. Those listed 
later take precedence over those listed earlier.

1. drill-override.conf, as identified by DRILL_CONF_DIR or its default.
2. Set in the environment using DRILL_JAVA_OPTS or DRILL_DRILLBIT_JAVA_OPTS.
3. Set in drill-env.sh using the above two variables.
4. Set on the drillbit.sh command line as explained above. (Drill 1.7 and later.)

You can see the actual set of properties used (from items 2-3 above) by using 
the "debug" command (Drill 1.7 or later):

$ drillbit.sh debug


  was:
Drill provides the drillbit.sh script to launch Drill. When Drill is run in 
production environments, or when managed by a tool such as Mesos or YARN, 
customers have many options to customize the launch options. We should document 
this information as below.

The user can configure Drill launch in one of four ways, depending on their 
needs.

1. Using the properties in drill-override.conf. Sets only startup and runtime 
properties. All drillbits should use a copy of the file so that properties set 
here apply to all drill bits and to client applications.

2. By setting environment variables prior to launching Drill. See the list 
below. Use this to customize properties per drill-bit, such as for setting port 
numbers. This option is useful when launching Drill from a tool such as Mesos 
or YARN.

3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the 
list below. This script is intended to be unique to each node and is another 
way to customize properties for this one node.

4. In Drill 1.7 and later, the administrator can set Drill configuration 
options directly on the launch command as shown below. This option is also 
useful when launching Drill from a tool such as YARN or Mesos. Options are of 
the form:

$ drillbit.sh start -Dvariable=value

For example, to control the HTTP port:

$ drillbit.sh start -Ddrill.exec.http.port=8099 

Properties are of three types.

1. Launch-only properties: those that can be set only through environment 
variables (such as JAVA_HOME).
2. Drill startup properties, which can be set in the locations detailed below.
3. Drill runtime properties, which are set in drill-override.conf and also via 
SQL.

Drill startup properties can be set in a number of locations. Those listed 
later take precedence over those listed earlier.

1. drill-override.conf, as identified by DRILL_CONF_DIR or its default.
2. Set in the environment using DRILL_JAVA_OPTS or DRILL_DRILLBIT_JAVA_OPTS.
3. Set in drill-env.sh using the above two variables.
4. Set on the drillbit.sh command line as explained above. (Drill 1.7 and later.)

You can see the actual set of properties used (from items 2-3 above) by using 
the "debug" command:

$ drillbit.sh debug



> Document Drillbit launch options
> 
>
> Key: DRILL-4587
> URL: https://issues.apache.org/jira/browse/DRILL-4587
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Paul Rogers
>

[jira] [Assigned] (DRILL-4541) Make sure query planner does not generate operators with mixed convention trait

2016-04-06 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni reassigned DRILL-4541:
-

Assignee: Jinfeng Ni

> Make sure query planner does not generate operators with mixed convention 
> trait
> ---
>
> Key: DRILL-4541
> URL: https://issues.apache.org/jira/browse/DRILL-4541
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> Per the discussion [1] in the PR of DRILL-4531, we should fix the query 
> planner rules used in Drill planning, such that it will not generate Rels 
> with mixed convention trait.  For instance, a LogicalFilter should only have 
> child with NONE convention; it should not have child with LOGICAL convention. 
>   
> The mixed Rels will cause the planner either to hang (as reported in 
> DRILL-4531 and DRILL-3257) or to do wasted work by firing rules against the 
> mixed Rels.
> I think the reason we have such mixed rels is that we have different kinds of 
> rules used in a single Volcano planning phase:
> 1) Rules that match the base classes Filter/Project, etc. only.
> 2) Rules that match LogicalFilter/LogicalProject, etc.
> 3) Rules that match DrillFilter/DrillProject, etc.
> 4) Rules that use the copy() method to generate a new Rel.
> 5) Rules that use a RelFactory to generate a new Rel.
> 6) Convert rules, which convert from Calcite logical (NONE/Enumerable) to 
> Drill logical (LOGICAL).
> For instance, ProjectMergeRule, which matches the base Project yet uses the 
> default RelFactory, will match both LogicalProject and DrillProject, but 
> produce a LogicalProject as the outcome. That causes the mixed rels.
> Two things we may consider to fix this:
> 1) Separate the convert rules from the other transformation rules. Apply the 
> convert rules first; then all the transformation rules match DrillLogical 
> only.
> 2) Every rule that Drill uses, except for convert rules, should assert that 
> its input and output have the same convention.
> [1] https://github.com/apache/drill/pull/444



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (DRILL-4587) Document Drillbit launch options

2016-04-06 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228902#comment-15228902
 ] 

Paul Rogers edited comment on DRILL-4587 at 4/6/16 8:38 PM:


User-settable environment variables:

DRILL_HOME Drill home (defaults based on the location of the 
drillbit.sh script.)
DRILL_CONF_DIR  Alternate drill configuration directory that contains the 
drill-override.conf and drill-env.sh files. Default is $DRILL_HOME/conf
DRILL_LOG_DIR   Where log files are stored. Default is /var/log/drill if 
that exists, else $DRILL_HOME/log  
DRILL_PID_DIR   The  directory where Drill stores its Process ID (pid) 
file. $DRILL_HOME by default.
DRILL_IDENT_STRING  A string representing this instance of drillbit. $USER by 
default
DRILL_NICENESS  The scheduling priority for daemons. Defaults to 0.
DRILL_STOP_TIMEOUT  Used when stopping the Drill-bit. Grace period time, in 
seconds, after which the script forcibly kills the server if it has not 
stopped. Default 120 seconds.
JAVA_HOME  The Java implementation to use. If not set, looks 
for java on the command path and uses that location.
DRILL_CLASSPATHExtra Java CLASSPATH entries for custom code.
DRILL_CLASSPATH_PREFIX Extra Java CLASSPATH entries that should be prefixed 
to the system classpath.
HADOOP_HOMEHadoop home
HBASE_HOME HBase home
DRILL_JAVA_OPTS Optional JVM arguments such as system property overrides 
used by both the drillbit and client.
DRILLBIT_JAVA_OPTS  Optional JVM arguments specifically for the drillbit. 
SERVER_GC_OPTS  todo
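
The per-node variables listed above would typically be set in $DRILL_HOME/conf/drill-env.sh. The fragment below is a hypothetical example of such a file; every value shown (paths, identifier, JVM flags) is an illustrative assumption, not a recommended setting.

```shell
# Hypothetical drill-env.sh fragment: per-node overrides for the
# environment variables documented above. All values are examples only.
export DRILL_LOG_DIR="/var/log/drill"         # where log files go
export DRILL_PID_DIR="/var/run/drill"         # where the pid file goes
export DRILL_IDENT_STRING="analytics-node-1"  # instance identifier
export DRILL_NICENESS=5                       # daemon scheduling priority
export DRILL_STOP_TIMEOUT=60                  # grace period before force kill
# JVM options shared by both the drillbit and client:
export DRILL_JAVA_OPTS="-Ddrill.exec.http.port=8099"
# Extra JVM options applied only to the drillbit process:
export DRILLBIT_JAVA_OPTS="-XX:+PrintGCDetails"
```

Because drill-env.sh is sourced on each node, this is the natural place for settings that differ from node to node, such as port numbers.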



was (Author: paul-rogers):
User-settable environment variables:

DRILL_CONF_DIR  Alternate drill conf dir. Default is $DRILL_HOME/conf.
DRILL_LOG_DIR   Where log files are stored. Default is /var/log/drill if 
that exists, else $DRILL_HOME/log  
DRILL_PID_DIR   The directory where pid files are stored. /tmp by default.
DRILL_IDENT_STRING  A string representing this instance of drillbit. $USER by 
default
DRILL_NICENESS  The scheduling priority for daemons. Defaults to 0.
DRILL_STOP_TIMEOUT  Time, in seconds, after which we kill -9 the server if it 
has not stopped. Default 120 seconds.
DRILL_HOME Drill home (defaults based on this script's path.)
JAVA_HOME  The java implementation to use.
DRILL_CLASSPATHExtra Java CLASSPATH entries.
DRILL_CLASSPATH_PREFIX Extra Java CLASSPATH entries that should be prefixed 
to the system classpath.
HADOOP_HOMEHadoop home
HBASE_HOME HBase home
LOG_OPTS??
DRILL_JAVA_OPTS Optional JVM arguments such as system property overrides 
used by both the drillbit and client.
DRILLBIT_JAVA_OPTS  Optional JVM arguments specifically for the drillbit. 
SERVER_GC_OPTS  todo


> Document Drillbit launch options
> 
>
> Key: DRILL-4587
> URL: https://issues.apache.org/jira/browse/DRILL-4587
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>
> Drill provides the drillbit.sh script to launch Drill. When Drill is run in 
> production environments, or when managed by a tool such as Mesos or YARN, 
> customers have many options to customize the launch options. We should 
> document this information as below.
> The user can configure Drill launch in one of four ways, depending on their 
> needs.
> 1. Using the properties in drill-override.conf. Sets only startup and runtime 
> properties. All drillbits should use a copy of the file so that properties 
> set here apply to all drill bits and to client applications.
> 2. By setting environment variables prior to launching Drill. See the list 
> below. Use this to customize properties per drill-bit, such as for setting 
> port numbers. This option is useful when launching Drill from a tool such as 
> Mesos or YARN.
> 3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the 
> list below. This script is intended to be unique to each node and is another 
> way to customize properties for this one node.
> 4. In Drill 1.7 and later, the administrator can set Drill configuration 
> options directly on the launch command as shown below. This option is also 
> useful when launching Drill from a tool such as YARN or Mesos. Options are of 
> the form:
> drillbit.sh start -Dvariable=value
> For example, to control the HTTP port:
> drillbit.sh start -Ddrill.exec.http.port=8099 





[jira] [Updated] (DRILL-4587) Document Drillbit launch options

2016-04-06 Thread Bob Rumsby (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Rumsby updated DRILL-4587:
--
Assignee: Bridget Bevens

> Document Drillbit launch options
> 
>
> Key: DRILL-4587
> URL: https://issues.apache.org/jira/browse/DRILL-4587
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>
> Drill provides the drillbit.sh script to launch Drill. When Drill is run in 
> production environments, or when managed by a tool such as Mesos or YARN, 
> customers have many options to customize the launch options. We should 
> document this information as below.
> The user can configure Drill launch in one of four ways, depending on their 
> needs.
> 1. Using the properties in drill-override.conf. Sets only startup and runtime 
> properties. All drillbits should use a copy of the file so that properties 
> set here apply to all drill bits and to client applications.
> 2. By setting environment variables prior to launching Drill. See the list 
> below. Use this to customize properties per drill-bit, such as for setting 
> port numbers. This option is useful when launching Drill from a tool such as 
> Mesos or YARN.
> 3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the 
> list below. This script is intended to be unique to each node and is another 
> way to customize properties for this one node.
> 4. In Drill 1.7 and later, the administrator can set Drill configuration 
> options directly on the launch command as shown below. This option is also 
> useful when launching Drill from a tool such as YARN or Mesos. Options are of 
> the form:
> drillbit.sh start -Dvariable=value
> For example, to control the HTTP port:
> drillbit.sh start -Ddrill.exec.http.port=8099 





[jira] [Comment Edited] (DRILL-4587) Document Drillbit launch options

2016-04-06 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228902#comment-15228902
 ] 

Paul Rogers edited comment on DRILL-4587 at 4/6/16 8:31 PM:


User-settable environment variables:

DRILL_CONF_DIR  Alternate drill conf dir. Default is
${DRILL_HOME}/conf.
DRILL_LOG_DIR   Where log files are stored. Default is /var/log/drill if 
that exists, else $DRILL_HOME/log  
DRILL_PID_DIR   The directory where pid files are stored. /tmp by default.
DRILL_IDENT_STRING  A string representing this instance of drillbit. $USER by 
default
DRILL_NICENESS  The scheduling priority for daemons. Defaults to 0.
DRILL_STOP_TIMEOUT  Time, in seconds, after which we kill -9 the server if it 
has not stopped. Default 120 seconds.
DRILL_HOME Drill home (defaults based on this script's path.)
JAVA_HOME  The java implementation to use.
DRILL_CLASSPATHExtra Java CLASSPATH entries.
DRILL_CLASSPATH_PREFIX Extra Java CLASSPATH entries that should be prefixed 
to the system classpath.
HADOOP_HOMEHadoop home
HBASE_HOME HBase home
LOG_OPTS??
DRILL_JAVA_OPTS Optional JVM arguments such as system property overrides 
used by both the drillbit and client.
DRILLBIT_JAVA_OPTS  Optional JVM arguments specifically for the drillbit. 
SERVER_GC_OPTS  todo



was (Author: paul-rogers):
User-settable environment variables:

DRILL_CONF_DIR  Alternate drill conf dir. Default is ${DRILL_HOME}/conf.
DRILL_LOG_DIR   Where log files are stored. Default is /var/log/drill if 
that exists, else $DRILL_HOME/log  
DRILL_PID_DIR   The directory where pid files are stored. /tmp by default.
DRILL_IDENT_STRING  A string representing this instance of drillbit. $USER by 
default
DRILL_NICENESS  The scheduling priority for daemons. Defaults to 0.
DRILL_STOP_TIMEOUT  Time, in seconds, after which we kill -9 the server if it 
has not stopped. Default 120 seconds.
DRILL_HOME Drill home (defaults based on this script's path.)
JAVA_HOME  The java implementation to use.
DRILL_CLASSPATHExtra Java CLASSPATH entries.
DRILL_CLASSPATH_PREFIX Extra Java CLASSPATH entries that should be prefixed 
to the system classpath.
HADOOP_HOMEHadoop home
HBASE_HOME HBase home
LOG_OPTS??
DRILL_JAVA_OPTS Optional JVM arguments such as system property overrides 
used by both the drillbit and client.
DRILLBIT_JAVA_OPTS  Optional JVM arguments specifically for the drillbit. 
SERVER_GC_OPTS  todo


> Document Drillbit launch options
> 
>
> Key: DRILL-4587
> URL: https://issues.apache.org/jira/browse/DRILL-4587
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Paul Rogers
>
> Drill provides the drillbit.sh script to launch Drill. When Drill is run in 
> production environments, or when managed by a tool such as Mesos or YARN, 
> customers have many options to customize the launch options. We should 
> document this information as below.
> The user can configure Drill launch in one of four ways, depending on their 
> needs.
> 1. Using the properties in drill-override.conf. Sets only startup and runtime 
> properties. All drillbits should use a copy of the file so that properties 
> set here apply to all drill bits and to client applications.
> 2. By setting environment variables prior to launching Drill. See the list 
> below. Use this to customize properties per drill-bit, such as for setting 
> port numbers. This option is useful when launching Drill from a tool such as 
> Mesos or YARN.
> 3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the 
> list below. This script is intended to be unique to each node and is another 
> way to customize properties for this one node.
> 4. In Drill 1.7 and later, the administrator can set Drill configuration 
> options directly on the launch command as shown below. This option is also 
> useful when launching Drill from a tool such as YARN or Mesos. Options are of 
> the form:
> drillbit.sh start -Dvariable=value
> For example, to control the HTTP port:
> drillbit.sh start -Ddrill.exec.http.port=8099 





[jira] [Comment Edited] (DRILL-4587) Document Drillbit launch options

2016-04-06 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228902#comment-15228902
 ] 

Paul Rogers edited comment on DRILL-4587 at 4/6/16 8:31 PM:


User-settable environment variables:

DRILL_CONF_DIR  Alternate drill conf dir. Default is $DRILL_HOME/conf.
DRILL_LOG_DIR   Where log files are stored. Default is /var/log/drill if 
that exists, else $DRILL_HOME/log  
DRILL_PID_DIR   The directory where pid files are stored. /tmp by default.
DRILL_IDENT_STRING  A string representing this instance of drillbit. $USER by 
default
DRILL_NICENESS  The scheduling priority for daemons. Defaults to 0.
DRILL_STOP_TIMEOUT  Time, in seconds, after which we kill -9 the server if it 
has not stopped. Default 120 seconds.
DRILL_HOME Drill home (defaults based on this script's path.)
JAVA_HOME  The java implementation to use.
DRILL_CLASSPATHExtra Java CLASSPATH entries.
DRILL_CLASSPATH_PREFIX Extra Java CLASSPATH entries that should be prefixed 
to the system classpath.
HADOOP_HOMEHadoop home
HBASE_HOME HBase home
LOG_OPTS??
DRILL_JAVA_OPTS Optional JVM arguments such as system property overrides 
used by both the drillbit and client.
DRILLBIT_JAVA_OPTS  Optional JVM arguments specifically for the drillbit. 
SERVER_GC_OPTS  todo



was (Author: paul-rogers):
User-settable environment variables:

DRILL_CONF_DIR  Alternate drill conf dir. Default is
${DRILL_HOME}/conf.
DRILL_LOG_DIR   Where log files are stored. Default is /var/log/drill if 
that exists, else $DRILL_HOME/log  
DRILL_PID_DIR   The directory where pid files are stored. /tmp by default.
DRILL_IDENT_STRING  A string representing this instance of drillbit. $USER by 
default
DRILL_NICENESS  The scheduling priority for daemons. Defaults to 0.
DRILL_STOP_TIMEOUT  Time, in seconds, after which we kill -9 the server if it 
has not stopped. Default 120 seconds.
DRILL_HOME Drill home (defaults based on this script's path.)
JAVA_HOME  The java implementation to use.
DRILL_CLASSPATHExtra Java CLASSPATH entries.
DRILL_CLASSPATH_PREFIX Extra Java CLASSPATH entries that should be prefixed 
to the system classpath.
HADOOP_HOMEHadoop home
HBASE_HOME HBase home
LOG_OPTS??
DRILL_JAVA_OPTS Optional JVM arguments such as system property overrides 
used by both the drillbit and client.
DRILLBIT_JAVA_OPTS  Optional JVM arguments specifically for the drillbit. 
SERVER_GC_OPTS  todo


> Document Drillbit launch options
> 
>
> Key: DRILL-4587
> URL: https://issues.apache.org/jira/browse/DRILL-4587
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>
> Drill provides the drillbit.sh script to launch Drill. When Drill is run in 
> production environments, or when managed by a tool such as Mesos or YARN, 
> customers have many options to customize the launch options. We should 
> document this information as below.
> The user can configure Drill launch in one of four ways, depending on their 
> needs.
> 1. Using the properties in drill-override.conf. Sets only startup and runtime 
> properties. All drillbits should use a copy of the file so that properties 
> set here apply to all drill bits and to client applications.
> 2. By setting environment variables prior to launching Drill. See the list 
> below. Use this to customize properties per drill-bit, such as for setting 
> port numbers. This option is useful when launching Drill from a tool such as 
> Mesos or YARN.
> 3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the 
> list below. This script is intended to be unique to each node and is another 
> way to customize properties for this one node.
> 4. In Drill 1.7 and later, the administrator can set Drill configuration 
> options directly on the launch command as shown below. This option is also 
> useful when launching Drill from a tool such as YARN or Mesos. Options are of 
> the form:
> drillbit.sh start -Dvariable=value
> For example, to control the HTTP port:
> drillbit.sh start -Ddrill.exec.http.port=8099 





[jira] [Updated] (DRILL-4587) Document Drillbit launch options

2016-04-06 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-4587:
---
Description: 
Drill provides the drillbit.sh script to launch Drill. When Drill is run in 
production environments, or when managed by a tool such as Mesos or YARN, 
customers have many options to customize the launch options. We should document 
this information as below.

The user can configure Drill launch in one of four ways, depending on their 
needs.

1. Using the properties in drill-override.conf. Sets only startup and runtime 
properties. All drillbits should use a copy of the file so that properties set 
here apply to all drill bits and to client applications.

2. By setting environment variables prior to launching Drill. See the list 
below. Use this to customize properties per drill-bit, such as for setting port 
numbers. This option is useful when launching Drill from a tool such as Mesos 
or YARN.

3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the 
list below. This script is intended to be unique to each node and is another 
way to customize properties for this one node.

4. In Drill 1.7 and later, the administrator can set Drill configuration 
options directly on the launch command as shown below. This option is also 
useful when launching Drill from a tool such as YARN or Mesos. Options are of 
the form:

drillbit.sh start -Dvariable=value

For example, to control the HTTP port:

drillbit.sh start -Ddrill.exec.http.port=8099 

  was:
Drill provides the drillbit.sh script to launch Drill. When Drill is run in 
production environments, or when managed by a tool such as Mesos or YARN, 
customers have many options to customize the launch options. We should document 
this information as below.

The user can configure Drill launch in one of two ways, depending on version.

$DRILL_HOME/conf/drill-env.sh allows the user to set environment variables that 
control the Drill launch. See the comment below for the list of these variables.

In Drill 1.7 and later, the administrator can set Drill configuration options 
directly on the launch command line in the form:

drillbit.sh start -Dvariable=value

For example, to control the control port:

drillbit.sh start -Ddrill.exec.http.port=8099 


> Document Drillbit launch options
> 
>
> Key: DRILL-4587
> URL: https://issues.apache.org/jira/browse/DRILL-4587
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Paul Rogers
>
> Drill provides the drillbit.sh script to launch Drill. When Drill is run in 
> production environments, or when managed by a tool such as Mesos or YARN, 
> customers have many options to customize the launch options. We should 
> document this information as below.
> The user can configure Drill launch in one of four ways, depending on their 
> needs.
> 1. Using the properties in drill-override.conf. Sets only startup and runtime 
> properties. All drillbits should use a copy of the file so that properties 
> set here apply to all drill bits and to client applications.
> 2. By setting environment variables prior to launching Drill. See the list 
> below. Use this to customize properties per drill-bit, such as for setting 
> port numbers. This option is useful when launching Drill from a tool such as 
> Mesos or YARN.
> 3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the 
> list below. This script is intended to be unique to each node and is another 
> way to customize properties for this one node.
> 4. In Drill 1.7 and later, the administrator can set Drill configuration 
> options directly on the launch command as shown below. This option is also 
> useful when launching Drill from a tool such as YARN or Mesos. Options are of 
> the form:
> drillbit.sh start -Dvariable=value
> For example, to control the HTTP port:
> drillbit.sh start -Ddrill.exec.http.port=8099 





[jira] [Assigned] (DRILL-4588) Enable JMXReporter to Expose Metrics

2016-04-06 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam reassigned DRILL-4588:
--

Assignee: Sudheesh Katkam

> Enable JMXReporter to Expose Metrics
> 
>
> Key: DRILL-4588
> URL: https://issues.apache.org/jira/browse/DRILL-4588
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>
> There is a static initialization order issue that needs to be fixed.





[jira] [Created] (DRILL-4588) Enable JMXReporter to Expose Metrics

2016-04-06 Thread Sudheesh Katkam (JIRA)
Sudheesh Katkam created DRILL-4588:
--

 Summary: Enable JMXReporter to Expose Metrics
 Key: DRILL-4588
 URL: https://issues.apache.org/jira/browse/DRILL-4588
 Project: Apache Drill
  Issue Type: Bug
Reporter: Sudheesh Katkam


There is a static initialization order issue that needs to be fixed.





[jira] [Comment Edited] (DRILL-4587) Document Drillbit launch options

2016-04-06 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228902#comment-15228902
 ] 

Paul Rogers edited comment on DRILL-4587 at 4/6/16 8:24 PM:


User-settable environment variables:

DRILL_CONF_DIR          Alternate drill conf dir. Default is ${DRILL_HOME}/conf.
DRILL_LOG_DIR           Where log files are stored. Default is /var/log/drill if
                        that exists, else $DRILL_HOME/log
DRILL_PID_DIR           Where the pid files are stored. /tmp by default.
DRILL_IDENT_STRING      A string representing this instance of drillbit. $USER by
                        default.
DRILL_NICENESS          The scheduling priority for daemons. Defaults to 0.
DRILL_STOP_TIMEOUT      Time, in seconds, after which we kill -9 the server if it
                        has not stopped. Default 120 seconds.
DRILL_HOME              Drill home (defaults based on this script's path).
JAVA_HOME               The java implementation to use.
DRILL_CLASSPATH         Extra Java CLASSPATH entries.
DRILL_CLASSPATH_PREFIX  Extra Java CLASSPATH entries that should be prefixed to
                        the system classpath.
HADOOP_HOME             Hadoop home.
HBASE_HOME              HBase home.
LOG_OPTS                ??
DRILL_JAVA_OPTS         Optional JVM arguments, such as system property overrides,
                        used by both the drillbit and client.
DRILLBIT_JAVA_OPTS      Optional JVM arguments specifically for the drillbit.
SERVER_GC_OPTS          todo




> Document Drillbit launch options
> 
>
> Key: DRILL-4587
> URL: https://issues.apache.org/jira/browse/DRILL-4587
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Paul Rogers
>
> Drill provides the drillbit.sh script to launch Drill. When Drill is run in 
> production environments, or when managed by a tool such as Mesos or YARN, 
> customers have many options to customize the launch options. We should 
> document this information as below.
> The user can configure Drill launch in one of two ways, depending on version.
> $DRILL_HOME/conf/drill-env.sh allows the user to set environment variables 
> that control the Drill launch. See the comment below for the list of these 
> variables.
> In Drill 1.7 and later, the administrator can set Drill configuration options 
> directly on the launch command line in the form:
> drillbit.sh start -Dvariable=value
> For example, to control the HTTP port:
> drillbit.sh start -Ddrill.exec.http.port=8099 





[jira] [Commented] (DRILL-4587) Document Drillbit launch options

2016-04-06 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228902#comment-15228902
 ] 

Paul Rogers commented on DRILL-4587:


User-settable environment variables:

DRILL_CONF_DIR          Alternate drill conf dir. Default is ${DRILL_HOME}/conf.
DRILL_LOG_DIR           Where log files are stored. Default is /var/log/drill if
                        that exists, else $DRILL_HOME/log
DRILL_PID_DIR           Where the pid files are stored. /tmp by default.
DRILL_IDENT_STRING      A string representing this instance of drillbit. $USER by
                        default.
DRILL_NICENESS          The scheduling priority for daemons. Defaults to 0.
DRILL_STOP_TIMEOUT      Time, in seconds, after which we kill -9 the server if it
                        has not stopped. Default 120 seconds.
DRILL_HOME              Drill home (defaults based on this script's path).
JAVA_HOME               The java implementation to use.
DRILL_CLASSPATH         Extra Java CLASSPATH entries.
DRILL_CLASSPATH_PREFIX  Extra Java CLASSPATH entries that should be prefixed to
                        the system classpath.
HADOOP_HOME             Hadoop home.
HBASE_HOME              HBase home.
LOG_OPTS                ??
DRILL_JAVA_OPTS         Optional JVM arguments, such as system property overrides,
                        used by both the drillbit and client.
DRILLBIT_JAVA_OPTS      Optional JVM arguments specifically for the drillbit.
SERVER_GC_OPTS          todo


> Document Drillbit launch options
> 
>
> Key: DRILL-4587
> URL: https://issues.apache.org/jira/browse/DRILL-4587
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Paul Rogers
>
> Drill provides the drillbit.sh script to launch Drill. When Drill is run in 
> production environments, or when managed by a tool such as Mesos or YARN, 
> customers have many options to customize the launch options. We should 
> document this information as below.
> The user can configure Drill launch in one of two ways, depending on version.
> $DRILL_HOME/conf/drill-env.sh allows the user to set environment variables 
> that control the Drill launch. See the comment below for the list of these 
> variables.
> In Drill 1.7 and later, the administrator can set Drill configuration options 
> directly on the launch command line in the form:
> drillbit.sh start -Dvariable=value
> For example, to control the HTTP port:
> drillbit.sh start -Ddrill.exec.http.port=8099 





[jira] [Created] (DRILL-4587) Document Drillbit launch options

2016-04-06 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-4587:
--

 Summary: Document Drillbit launch options
 Key: DRILL-4587
 URL: https://issues.apache.org/jira/browse/DRILL-4587
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Reporter: Paul Rogers


Drill provides the drillbit.sh script to launch Drill. When Drill is run in 
production environments, or when managed by a tool such as Mesos or YARN, 
customers have many options to customize the launch options. We should document 
this information as below.

The user can configure Drill launch in one of two ways, depending on version.

$DRILL_HOME/conf/drill-env.sh allows the user to set environment variables that 
control the Drill launch. See the comment below for the list of these variables.

In Drill 1.7 and later, the administrator can set Drill configuration options 
directly on the launch command line in the form:

drillbit.sh start -Dvariable=value

For example, to control the HTTP port:

drillbit.sh start -Ddrill.exec.http.port=8099 





[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228891#comment-15228891
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58761862
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
 ---
@@ -169,4 +176,223 @@ private static boolean containIdentity(List<RexNode> exps,
 }
 return true;
   }
+
+  /**
+   * Copied from {@link RelOptUtil#splitJoinCondition(RelNode, RelNode, 
RexNode, List, List)}. Modified to rewrite
+   * the null equal join condition using IS NOT DISTINCT FROM operator.
+   *
+   * Splits out the equi-join components of a join condition, and returns
+   * what's left. For example, given the condition
+   *
+   * L.A = R.X AND L.B = L.C AND (L.D = 5 OR L.E =
+   * R.Y)
+   *
+   * returns
+   *
+   * 
+   * leftKeys = {A}
+   * rightKeys = {X}
+   * rest = L.B = L.C AND (L.D = 5 OR L.E = R.Y)
+   * 
+   *
+   * @param left  left input to join
+   * @param right right input to join
+   * @param condition join condition
+   * @param leftKeys  The ordinals of the fields from the left input which 
are
+   *  equi-join keys
+   * @param rightKeys The ordinals of the fields from the right input which
+   *  are equi-join keys
+   * @param joinOps List of equi-join operators (EQUALS or IS NOT DISTINCT 
FROM) used to join the left and right keys.
+   * @return remaining join filters that are not equijoins; may return a
+   * {@link RexLiteral} true, but never null
+   */
+  public static RexNode splitJoinCondition(
+      RelNode left,
+      RelNode right,
+      RexNode condition,
+      List<Integer> leftKeys,
+      List<Integer> rightKeys,
+      List<SqlOperator> joinOps) {
+    final List<RexNode> nonEquiList = new ArrayList<>();
+
+    splitJoinCondition(
+        left.getRowType().getFieldCount(),
+        condition,
+        leftKeys,
+        rightKeys,
+        joinOps,
+        nonEquiList);
+
+    return RexUtil.composeConjunction(
+        left.getCluster().getRexBuilder(), nonEquiList, false);
+  }
+
+  /**
+   * Copied from {@link RelOptUtil#splitJoinCondition(int, RexNode, List, 
List, List)}. Modified to rewrite the null
+   * equal join condition using IS NOT DISTINCT FROM operator.
+   */
+  private static void splitJoinCondition(
--- End diff --

Rest looks good to me.  +1. 


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.
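The rewrite the issue asks for maps the `(a = b OR (a IS NULL AND b IS NULL))` pattern onto SQL's IS NOT DISTINCT FROM. A minimal Java sketch of that null-tolerant equality semantics (an illustration only, not Drill's implementation; the class and method names are invented):

```java
import java.util.Objects;

public class NullEqualityDemo {

    // SQL's IS NOT DISTINCT FROM: behaves like '=', except that comparing
    // NULL with NULL yields true instead of NULL/unknown.
    static boolean isNotDistinctFrom(Object a, Object b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        // Equivalent to: (a = b) OR (a IS NULL AND b IS NULL)
        System.out.println(isNotDistinctFrom(null, null)); // true
        System.out.println(isNotDistinctFrom("SF", null)); // false
        System.out.println(isNotDistinctFrom("SF", "SF")); // true
    }
}
```

With this semantics, the Tableau-generated ON clause above reduces to plain per-column null-safe equality, so the planner can still treat the join as an equi-join instead of a cartesian join.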





[jira] [Updated] (DRILL-4564) Document start-up properties hierarchy

2016-04-06 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-4564:
---
Summary: Document start-up properties hierarchy  (was: Add documentation 
detail regarding start-up properties hierarchy)

> Document start-up properties hierarchy
> --
>
> Key: DRILL-4564
> URL: https://issues.apache.org/jira/browse/DRILL-4564
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>Priority: Minor
>
> We’ve been having a lively discussion about config options. We might want to 
> summarize the discussion in DRILL-4543. Current text:
> At the core of the file hierarchy is drill-default.conf. This file is 
> overridden by one or more drill-module.conf files, which are overridden by 
> the drill-override.conf file that you define.
> Possible revision:
> At the bottom of the hierarchy are the default files that Drill itself 
> provides. The first is drill-default.conf. This file is overridden by one or 
> more drill-module.conf files provided by Drill’s internal modules. These are 
> overridden by the drill-override.conf file that you define. Finally, you can 
> provide overrides on each drillbit using system properties of the form 
> -Dname=value passed on the command line:
> ./drillbit.sh start -Dname=value
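The precedence described above (drill-default.conf, then drill-module.conf files, then drill-override.conf, then -D system properties) can be sketched as a chain of layers where later layers win. This is an illustration of the layering idea only, not Drill's actual configuration code; the port values are made up:

```java
import java.util.List;
import java.util.Map;

public class ConfigLayers {

    // Walk the layers in order; the last layer that defines the key wins.
    static String resolve(String key, List<Map<String, String>> layers) {
        String value = null;
        for (Map<String, String> layer : layers) {
            if (layer.containsKey(key)) {
                value = layer.get(key);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = Map.of("drill.exec.http.port", "8047"); // drill-default.conf
        Map<String, String> override = Map.of("drill.exec.http.port", "8088"); // drill-override.conf
        Map<String, String> sysProps = Map.of("drill.exec.http.port", "8099"); // -D on the command line
        // The -D value wins over the override file, which wins over the default.
        System.out.println(resolve("drill.exec.http.port",
                List.of(defaults, override, sysProps))); // 8099
    }
}
```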





[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228889#comment-15228889
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58761755
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
 ---
@@ -169,4 +176,223 @@ private static boolean containIdentity(List<RexNode> exps,
 }
 return true;
   }
+
+  /**
+   * Copied from {@link RelOptUtil#splitJoinCondition(RelNode, RelNode, 
RexNode, List, List)}. Modified to rewrite
+   * the null equal join condition using IS NOT DISTINCT FROM operator.
+   *
+   * Splits out the equi-join components of a join condition, and returns
+   * what's left. For example, given the condition
+   *
+   * L.A = R.X AND L.B = L.C AND (L.D = 5 OR L.E =
+   * R.Y)
+   *
+   * returns
+   *
+   * 
+   * leftKeys = {A}
+   * rightKeys = {X}
+   * rest = L.B = L.C AND (L.D = 5 OR L.E = R.Y)
+   * 
+   *
+   * @param left  left input to join
+   * @param right right input to join
+   * @param condition join condition
+   * @param leftKeys  The ordinals of the fields from the left input which 
are
+   *  equi-join keys
+   * @param rightKeys The ordinals of the fields from the right input which
+   *  are equi-join keys
+   * @param joinOps List of equi-join operators (EQUALS or IS NOT DISTINCT 
FROM) used to join the left and right keys.
+   * @return remaining join filters that are not equijoins; may return a
+   * {@link RexLiteral} true, but never null
+   */
+  public static RexNode splitJoinCondition(
+      RelNode left,
+      RelNode right,
+      RexNode condition,
+      List<Integer> leftKeys,
+      List<Integer> rightKeys,
+      List<SqlOperator> joinOps) {
+    final List<RexNode> nonEquiList = new ArrayList<>();
+
+    splitJoinCondition(
+        left.getRowType().getFieldCount(),
+        condition,
+        leftKeys,
+        rightKeys,
+        joinOps,
+        nonEquiList);
+
+    return RexUtil.composeConjunction(
+        left.getCluster().getRexBuilder(), nonEquiList, false);
+  }
+
+  /**
+   * Copied from {@link RelOptUtil#splitJoinCondition(int, RexNode, List, 
List, List)}. Modified to rewrite the null
+   * equal join condition using IS NOT DISTINCT FROM operator.
+   */
+  private static void splitJoinCondition(
--- End diff --

Can you confirm that this rewrite does *not* do the conversion if the join 
condition happens to involve columns coming from not just 2 tables but from 3 
tables? e.g. if the user accidentally gives: 

SELECT * FROM t1, t2, t3 WHERE t1.a = t2.a  OR (t1.b is null and t3.b is 
null)


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.





[jira] [Commented] (DRILL-4577) Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228866#comment-15228866
 ] 

ASF GitHub Bot commented on DRILL-4577:
---

Github user vkorukanti commented on a diff in the pull request:

https://github.com/apache/drill/pull/461#discussion_r58759667
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java
 ---
@@ -72,4 +80,76 @@ public String getTypeName() {
 return HiveStoragePluginConfig.NAME;
   }
 
+  @Override
+  public List<Pair<String, ? extends Table>> getTablesByNames(final List<String> tableNames) {
+    final String schemaName = getName();
+    final List<Pair<String, ? extends Table>> tableNameToTable = Lists.newArrayList();
+    List<Table> tables;
+    // Retries once if the first call to fetch the metadata fails
+    synchronized (mClient) {
+      final List<String> tableNamesWithAuth = Lists.newArrayList();
+      for (String tableName : tableNames) {
+        try {
+          if (mClient.tableExists(schemaName, tableName)) {
--- End diff --

If fetching partitions is causing the major delay, then we can load them 
lazily only if we need them in DrillHiveTable.


> Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in
> ---
>
> Key: DRILL-4577
> URL: https://issues.apache.org/jira/browse/DRILL-4577
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.7.0
>
>
> A query such as 
> {code}
> select * from INFORMATION_SCHEMA.`TABLES` 
> {code}
> is converted into calls to fetch all tables from storage plugins. 
> When users have Hive, the calls to hive metadata storage would be: 
> 1) get_table
> 2) get_partitions
> However, the information regarding partitions is not used in this type of 
> queries. Besides, a more efficient way to fetch tables is to use the 
> get_multi_table call.
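The suggestion above, loading partitions lazily so the expensive get_partitions call only happens when partitions are actually needed, can be sketched with a memoizing supplier. This is a generic illustration, not the DrillHiveTable code; the names are invented:

```java
import java.util.List;
import java.util.function.Supplier;

public class LazyPartitions {

    // Wrap an expensive fetch so it runs at most once, and only on first use.
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private T value;
            private boolean loaded;

            @Override
            public synchronized T get() {
                if (!loaded) {
                    value = delegate.get();
                    loaded = true;
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        Supplier<List<String>> partitions = memoize(() -> {
            // In DrillHiveTable this would be the get_partitions metastore call.
            System.out.println("fetching partitions");
            return List.of("p=1", "p=2");
        });
        // An INFORMATION_SCHEMA.`TABLES` query never calls get(), so the
        // expensive fetch is skipped entirely.
    }
}
```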





[jira] [Updated] (DRILL-4581) Various problems in the Drill startup scripts

2016-04-06 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-4581:
---
Description: 
Noticed the following in drillbit.sh:

1) Comment: DRILL_LOG_DIR  Where log files are stored. PWD by default.
Code: DRILL_LOG_DIR=/var/log/drill or, if it does not exist, $DRILL_HOME/log

2) Comment: DRILL_PID_DIR  Where the pid files are stored. /tmp by default.
Code: DRILL_PID_DIR=$DRILL_HOME

3) Redundant checking of JAVA_HOME. drillbit.sh sources drill-config.sh which 
checks JAVA_HOME. Later, drillbit.sh checks it again. The second check is both 
unnecessary and prints a less informative message than the drill-config.sh 
check. Suggestion: Remove the JAVA_HOME check in drillbit.sh.

4) Though drill-config.sh carefully checks JAVA_HOME, it does not export the 
JAVA_HOME variable. Perhaps this is why drillbit.sh repeats the check? 
Recommended: export JAVA_HOME from drill-config.sh.

5) Both drillbit.sh and the sourced drill-config.sh check DRILL_LOG_DIR and set 
the default value. Drill-config.sh defaults to /var/log/drill, or if that 
fails, to $DRILL_HOME/log. Drillbit.sh just sets /var/log/drill and does not 
handle the case where that directory is not writable. Suggested: remove the 
check in drillbit.sh.

6) Drill-config.sh checks the writability of the DRILL_LOG_DIR by touching 
sqlline.log, but does not delete that file, leaving a bogus, empty client log 
file on the drillbit server. Recommendation: use bash commands instead.

7) The implementation of the above check is a bit awkward. It has a fallback 
case with somewhat awkward logic. Clean this up.

8) drillbit.sh, but not drill-config.sh, attempts to create /var/log/drill if 
it does not exist. Recommended: decide on a single choice, implement it in 
drill-config.sh.

9) drill-config.sh checks if $DRILL_CONF_DIR is a directory. If not, defaults 
it to $DRILL_HOME/conf. This can lead to subtle errors. If I use
drillbit.sh --config /misspelled/path
where I mistype the path, I won't get an error; I get the default config, which 
may not at all be what I want to run. Recommendation: if the value of 
DRILL_CONF_DIR is passed into the script (as a variable or via --config), 
then that directory must exist. Else, use the default.

10) drill-config.sh exports, but may not set, HADOOP_HOME. This may be left 
over from the original Hadoop script that the Drill script was based upon. 
Recommendation: export only in the case that HADOOP_HOME is set for cygwin.

11) Drill-config.sh checks JAVA_HOME and prints a big, bold error message to 
stderr if JAVA_HOME is not set. Then, it checks the Java version and prints a 
different message (to stdout) if the version is wrong. Recommendation: use the 
same format (and stderr) for both.

12) Similarly, other Java checks later in the script produce messages to 
stdout, not stderr.

13) Drill-config.sh searches $JAVA_HOME to find java/java.exe and verifies that 
it is executable. The script then throws away what we just found. Then, 
drillbit.sh tries to recreate this information as:
JAVA=$JAVA_HOME/bin/java
This is wrong in two ways: 1) it ignores the actual java location and assumes 
it, and 2) it does not handle the java.exe case that drill-config.sh carefully 
worked out.
Recommendation: export JAVA from drill-config.sh and remove the above line from 
drillbit.sh.

14) drillbit.sh presumably takes extra arguments like this:
drillbit.sh -Dvar0=value0 --config /my/conf/dir start -Dvar1=value1 
-Dvar2=value2 -Dvar3=value3
The -D bit allows the user to override config variables at the command line. 
But, the scripts don't use the values.
A) drill-config.sh consumes --config /my/conf/dir after consuming the leading 
arguments:
while [ $# -gt 1 ]; do
  if [ "--config" = "$1" ]; then
shift
confdir=$1
shift
DRILL_CONF_DIR=$confdir
  else
# Presume we are at end of options and break
break
  fi
done
B) drillbit.sh will discard var1:
startStopStatus=$1 <-- grabs "start"
shift
command=drillbit
shift   <-- Consumes -Dvar1=value1
C) Remaining values passed back into drillbit.sh:
args=$@
nohup $thiscmd internal_start $command $args
D) Second invocation discards -Dvar2=value2 as described above.
E) Remaining values are passed to runbit:
"$DRILL_HOME"/bin/runbit  $command "$@" start
F) Where they again pass through drill-config. (Allowing us to do:
drillbit.sh --config /first/conf --config /second/conf
which is asking for trouble)
G) And, the remaining arguments are simply not used:
exec $JAVA -Dlog.path=$DRILLBIT_LOG_PATH 
-Dlog.query.path=$DRILLBIT_QUERY_LOG_PATH $DRILL_ALL_JAVA_OPTS -cp $CP 
org.apache.drill.exec.server.Drillbit

15) The checking of command-line args in drillbit.sh is wrong:

# if no args specified, show usage
if [ $# -lt 1 ]; then
  echo $usage
  exit 1
fi
...
. "$bin"/drill-config.sh

But, note, that drill-config.sh handles:
drillbit.sh --config /conf/dir
Consuming 

[jira] [Commented] (DRILL-4577) Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228850#comment-15228850
 ] 

ASF GitHub Bot commented on DRILL-4577:
---

Github user hsuanyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/461#discussion_r58758201
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java
 ---
@@ -72,4 +80,76 @@ public String getTypeName() {
 return HiveStoragePluginConfig.NAME;
   }
 
+  @Override
+  public List<Pair<String, ? extends Table>> getTablesByNames(final List<String> tableNames) {
+    final String schemaName = getName();
+    final List<Pair<String, ? extends Table>> tableNameToTable = Lists.newArrayList();
+    List<Table> tables;
+    // Retries once if the first call to fetch the metadata fails
+    synchronized (mClient) {
+      final List<String> tableNamesWithAuth = Lists.newArrayList();
+      for (String tableName : tableNames) {
+        try {
+          if (mClient.tableExists(schemaName, tableName)) {
--- End diff --

Of course, eliminating the first one is important too. I am still 
investigating that.


> Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in
> ---
>
> Key: DRILL-4577
> URL: https://issues.apache.org/jira/browse/DRILL-4577
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.7.0
>
>
> A query such as 
> {code}
> select * from INFORMATION_SCHEMA.`TABLES` 
> {code}
> is converted into calls to fetch all tables from storage plugins. 
> When users have Hive, the calls to hive metadata storage would be: 
> 1) get_table
> 2) get_partitions
> However, the information regarding partitions is not used in this type of 
> queries. Besides, a more efficient way to fetch tables is to use the 
> get_multi_table call.





[jira] [Commented] (DRILL-4577) Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228846#comment-15228846
 ] 

ASF GitHub Bot commented on DRILL-4577:
---

Github user hsuanyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/461#discussion_r58758011
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java
 ---
@@ -72,4 +80,76 @@ public String getTypeName() {
 return HiveStoragePluginConfig.NAME;
   }
 
+  @Override
+  public List<Pair<String, ? extends Table>> getTablesByNames(final List<String> tableNames) {
+    final String schemaName = getName();
+    final List<Pair<String, ? extends Table>> tableNameToTable = Lists.newArrayList();
+    List<Table> tables;
+    // Retries once if the first call to fetch the metadata fails
+    synchronized (mClient) {
+      final List<String> tableNamesWithAuth = Lists.newArrayList();
+      for (String tableName : tableNames) {
+        try {
+          if (mClient.tableExists(schemaName, tableName)) {
--- End diff --

There are two parts which make the query slow. 
One follows from your point; the other is fetching partitions, which turned 
out to be unused. [1]

[1] 
https://github.com/apache/drill/blob/245da9790813569c5da9404e0fc5e45cc88e22bb/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/DrillHiveMetaStoreClient.java#L236


> Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in
> ---
>
> Key: DRILL-4577
> URL: https://issues.apache.org/jira/browse/DRILL-4577
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.7.0
>
>
> A query such as 
> {code}
> select * from INFORMATION_SCHEMA.`TABLES` 
> {code}
> is converted into calls to fetch all tables from storage plugins. 
> When users have Hive, the calls to hive metadata storage would be: 
> 1) get_table
> 2) get_partitions
> However, the information regarding partitions is not used in this type of 
> queries. Besides, a more efficient way to fetch tables is to use the 
> get_multi_table call.





[jira] [Created] (DRILL-4586) Create CLIENT ErrorType

2016-04-06 Thread Sudheesh Katkam (JIRA)
Sudheesh Katkam created DRILL-4586:
--

 Summary: Create CLIENT ErrorType
 Key: DRILL-4586
 URL: https://issues.apache.org/jira/browse/DRILL-4586
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Sudheesh Katkam


To display client errors with nice messages, we use "system error". However, 
system error is not meant to be used when we want to display a proper error 
message. System errors are meant for unexpected errors that don't yet have a 
"nice" error message.





[jira] [Updated] (DRILL-4581) Various problems in the Drill startup scripts

2016-04-06 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-4581:
---
Description: 
Noticed the following in drillbit.sh:

1) Comment: DRILL_LOG_DIR  Where log files are stored. PWD by default.
Code: DRILL_LOG_DIR=/var/log/drill or, if it does not exist, $DRILL_HOME/log

2) Comment: DRILL_PID_DIR  Where the pid files are stored. /tmp by default.
Code: DRILL_PID_DIR=$DRILL_HOME

3) Redundant checking of JAVA_HOME. drillbit.sh sources drill-config.sh which 
checks JAVA_HOME. Later, drillbit.sh checks it again. The second check is both 
unnecessary and prints a less informative message than the drill-config.sh 
check. Suggestion: Remove the JAVA_HOME check in drillbit.sh.

4) Though drill-config.sh carefully checks JAVA_HOME, it does not export the 
JAVA_HOME variable. Perhaps this is why drillbit.sh repeats the check? 
Recommended: export JAVA_HOME from drill-config.sh.

5) Both drillbit.sh and the sourced drill-config.sh check DRILL_LOG_DIR and set 
the default value. Drill-config.sh defaults to /var/log/drill, or if that 
fails, to $DRILL_HOME/log. Drillbit.sh just sets /var/log/drill and does not 
handle the case where that directory is not writable. Suggested: remove the 
check in drillbit.sh.

6) Drill-config.sh checks the writability of the DRILL_LOG_DIR by touching 
sqlline.log, but does not delete that file, leaving a bogus, empty client log 
file on the drillbit server. Recommendation: use bash commands instead.

7) The implementation of the above check is a bit awkward. It has a fallback 
case with somewhat awkward logic. Clean this up.

8) drillbit.sh, but not drill-config.sh, attempts to create /var/log/drill if 
it does not exist. Recommended: decide on a single choice, implement it in 
drill-config.sh.

9) drill-config.sh checks if $DRILL_CONF_DIR is a directory. If not, defaults 
it to $DRILL_HOME/conf. This can lead to subtle errors. If I use
drillbit.sh --config /misspelled/path
where I mistype the path, I won't get an error, I get the default config, which 
may not at all be what I want to run. Recommendation: if the value of 
DRILL_CONF_DIR is passed into the script (as a variable or via --config), 
then that directory must exist. Else, use the default.
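The recommendation in item 9 could look like the following sketch; the function name and error wording are assumptions, not the actual drill-config.sh code:

```shell
# Hypothetical conf-dir resolution: an explicitly requested directory must
# exist (catching misspelled --config paths); the default is used only when
# nothing was requested.
drill_resolve_conf_dir() {
  local requested="$1" drill_home="$2"
  if [ -n "$requested" ]; then
    if [ ! -d "$requested" ]; then
      echo "ERROR: config dir '$requested' does not exist" >&2
      return 1
    fi
    echo "$requested"
  else
    echo "$drill_home/conf"
  fi
}
```
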

10) drill-config.sh exports, but may not set, HADOOP_HOME. This may be left 
over from the original Hadoop script that the Drill script was based upon. 
Recommendation: export only in the case that HADOOP_HOME is set for cygwin.

11) Drill-config.sh checks JAVA_HOME and prints a big, bold error message to 
stderr if JAVA_HOME is not set. Then, it checks the Java version and prints a 
different message (to stdout) if the version is wrong. Recommendation: use the 
same format (and stderr) for both.

12) Similarly, other Java checks later in the script produce messages to 
stdout, not stderr.

13) Drill-config.sh searches $JAVA_HOME to find java/java.exe and verifies that 
it is executable. The script then throws away what we just found. Then, 
drill-bit.sh tries to recreate this information as:
JAVA=$JAVA_HOME/bin/java
This is wrong in two ways: 1) it ignores the actual java location and assumes 
it, and 2) it does not handle the java.exe case that drill-config.sh carefully 
worked out.
Recommendation: export JAVA from drill-config.sh and remove the above line from 
drillbit.sh.
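Items 4 and 13 together amount to: resolve the java binary once in drill-config.sh and export the result. A sketch under that assumption (resolve_java is an illustrative name, not the scripts' actual code):

```shell
# Hypothetical resolver: locate java (or java.exe for cygwin) under
# JAVA_HOME once, so callers can export JAVA instead of re-deriving it.
resolve_java() {
  local java_home="$1"
  if [ -x "$java_home/bin/java" ]; then
    echo "$java_home/bin/java"
  elif [ -x "$java_home/bin/java.exe" ]; then
    echo "$java_home/bin/java.exe"
  else
    echo "ERROR: no executable java under $java_home/bin" >&2
    return 1
  fi
}
# Usage in drill-config.sh would then be roughly:
#   JAVA=$(resolve_java "$JAVA_HOME") || exit 1
#   export JAVA
```
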

14) drillbit.sh presumably takes extra arguments like this:
drillbit.sh -Dvar0=value0 --config /my/conf/dir start -Dvar1=value1 
-Dvar2=value2 -Dvar3=value3
The -D bit allows the user to override config variables at the command line. 
But, the scripts don't use the values.
A) drill-config.sh consumes --config /my/conf/dir after consuming the leading 
arguments:
while [ $# -gt 1 ]; do
  if [ "--config" = "$1" ]; then
    shift
    confdir=$1
    shift
    DRILL_CONF_DIR=$confdir
  else
    # Presume we are at end of options and break
    break
  fi
done
B) drill-bit.sh will discard the var1:
startStopStatus=$1 <-- grabs "start"
shift
command=drillbit
shift   <-- Consumes -Dvar1=value1
C) Remaining values passed back into drillbit.sh:
args=$@
nohup $thiscmd internal_start $command $args
D) Second invocation discards -Dvar2=value2 as described above.
E) Remaining values are passed to runbit:
"$DRILL_HOME"/bin/runbit  $command "$@" start
F) Where they again pass though drill-config. (Allowing us to do:
drillbit.sh --config /first/conf --config /second/conf
which is asking for trouble)
G) And, the remaining arguments are simply not used:
exec $JAVA -Dlog.path=$DRILLBIT_LOG_PATH 
-Dlog.query.path=$DRILLBIT_QUERY_LOG_PATH $DRILL_ALL_JAVA_OPTS -cp $CP 
org.apache.drill.exec.server.Drillbit
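The chain described in item 14 drops -D overrides at steps B and D. A sketch of parsing that preserves them (parse_args and the variable names are illustrative, not the real script interface):

```shell
# Hypothetical parser: accumulate -D JVM overrides and the --config value
# instead of discarding them with blind shifts; everything after the first
# non-option (e.g. "start") is left untouched for later stages.
parse_args() {
  JVM_OPTS=""
  CONF_DIR=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --config) shift; CONF_DIR="$1" ;;
      -D*)      JVM_OPTS="$JVM_OPTS $1" ;;
      *)        break ;;
    esac
    shift
  done
  REMAINING="$*"
}
```
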

15) The checking of command-line args in drillbit.sh is wrong:

# if no args specified, show usage
if [ $# -lt 1 ]; then
  echo $usage
  exit 1
fi
...
. "$bin"/drill-config.sh

But, note, that drill-config.sh handles:
drillbit.sh --config /conf/dir
Consuming 

[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228838#comment-15228838
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user vkorukanti commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58757405
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestJoinNullable.java ---
@@ -407,11 +342,94 @@ public void testMergeLOJNullableBothInputsOrderedDescNullsLastVsAscNullsLast() t
 + " ORDER BY 1 ASC NULLS LAST  ) t2 "
 + "USING ( key )",
 TEST_RES_PATH, TEST_RES_PATH);
-final int expectedRecordCount = 6;
+testHelper(query, 6, false, true);
+  }
+
+  @Test
+  public void withDistinctFromJoinConditionHashJoin() throws Exception {
+final String query = "SELECT * FROM " +
+"cp.`jsonInput/nullableOrdered1.json` t1 JOIN " +
+"cp.`jsonInput/nullableOrdered2.json` t2 " +
+"ON t1.key IS NOT DISTINCT FROM t2.key AND t1.data is NOT null";
+nullEqualJoinHelper(query);
+  }
+
+  @Test
+  public void withDistinctFromJoinConditionMergeJoin() throws Exception {
+try {
+  test("alter session set `planner.enable_hashjoin` = false");
+  final String query = "SELECT * FROM " +
+  "cp.`jsonInput/nullableOrdered1.json` t1 JOIN " +
+  "cp.`jsonInput/nullableOrdered2.json` t2 " +
+  "ON t1.key IS NOT DISTINCT FROM t2.key";
+  nullEqualJoinHelper(query);
+} finally {
+  test("alter session set `planner.enable_hashjoin` = true");
+}
+  }
+
+  @Test
+  public void withNullEqualHashJoin() throws Exception {
+final String query = "SELECT * FROM " +
+"cp.`jsonInput/nullableOrdered1.json` t1 JOIN " +
+"cp.`jsonInput/nullableOrdered2.json` t2 " +
+"ON t1.key = t2.key OR (t1.key IS NULL AND t2.key IS NULL)";
+nullEqualJoinHelper(query);
+  }
 
-enableJoin(false, true);
-final int actualRecordCount = testSql(query);
-assertEquals("Number of output rows", expectedRecordCount, actualRecordCount);
+  @Test
+  public void withNullEqualMergeJoin() throws Exception {
+try {
+  test("alter session set `planner.enable_hashjoin` = false");
+  final String query = "SELECT * FROM " +
+  "cp.`jsonInput/nullableOrdered1.json` t1 JOIN " +
+  "cp.`jsonInput/nullableOrdered2.json` t2 " +
+  "ON t1.key = t2.key OR (t1.key IS NULL AND t2.key IS NULL)";
+  nullEqualJoinHelper(query);
+} finally {
+  test("alter session set `planner.enable_hashjoin` = true");
+}
+  }
+
+  public void nullEqualJoinHelper(final String query) throws Exception {
+testBuilder()
+.sqlQuery(query)
+.unOrdered()
+.baselineColumns("key", "data", "data0", "key0")
+.baselineValues(null, "L_null_1", "R_null_1", null)
+.baselineValues(null, "L_null_2", "R_null_1", null)
+.baselineValues("A", "L_A_1", "R_A_1", "A")
+.baselineValues("A", "L_A_2", "R_A_1", "A")
+.baselineValues(null, "L_null_1", "R_null_2", null)
+.baselineValues(null, "L_null_2", "R_null_2", null)
+.baselineValues(null, "L_null_1", "R_null_3", null)
+.baselineValues(null, "L_null_2", "R_null_3", null)
+.go();
   }
 
+  @Test
+  public void withNullEqualAdditionFilter() throws Exception {
--- End diff --

Sure. I will update the patch with new tests.


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> 

[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228833#comment-15228833
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58757060
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestJoinNullable.java ---
@@ -407,11 +342,94 @@ public void testMergeLOJNullableBothInputsOrderedDescNullsLastVsAscNullsLast() t
 + " ORDER BY 1 ASC NULLS LAST  ) t2 "
 + "USING ( key )",
 TEST_RES_PATH, TEST_RES_PATH);
-final int expectedRecordCount = 6;
+testHelper(query, 6, false, true);
+  }
+
+  @Test
+  public void withDistinctFromJoinConditionHashJoin() throws Exception {
+final String query = "SELECT * FROM " +
+"cp.`jsonInput/nullableOrdered1.json` t1 JOIN " +
+"cp.`jsonInput/nullableOrdered2.json` t2 " +
+"ON t1.key IS NOT DISTINCT FROM t2.key AND t1.data is NOT null";
+nullEqualJoinHelper(query);
+  }
+
+  @Test
+  public void withDistinctFromJoinConditionMergeJoin() throws Exception {
+try {
+  test("alter session set `planner.enable_hashjoin` = false");
+  final String query = "SELECT * FROM " +
+  "cp.`jsonInput/nullableOrdered1.json` t1 JOIN " +
+  "cp.`jsonInput/nullableOrdered2.json` t2 " +
+  "ON t1.key IS NOT DISTINCT FROM t2.key";
+  nullEqualJoinHelper(query);
+} finally {
+  test("alter session set `planner.enable_hashjoin` = true");
+}
+  }
+
+  @Test
+  public void withNullEqualHashJoin() throws Exception {
+final String query = "SELECT * FROM " +
+"cp.`jsonInput/nullableOrdered1.json` t1 JOIN " +
+"cp.`jsonInput/nullableOrdered2.json` t2 " +
+"ON t1.key = t2.key OR (t1.key IS NULL AND t2.key IS NULL)";
+nullEqualJoinHelper(query);
+  }
 
-enableJoin(false, true);
-final int actualRecordCount = testSql(query);
-assertEquals("Number of output rows", expectedRecordCount, actualRecordCount);
+  @Test
+  public void withNullEqualMergeJoin() throws Exception {
+try {
+  test("alter session set `planner.enable_hashjoin` = false");
+  final String query = "SELECT * FROM " +
+  "cp.`jsonInput/nullableOrdered1.json` t1 JOIN " +
+  "cp.`jsonInput/nullableOrdered2.json` t2 " +
+  "ON t1.key = t2.key OR (t1.key IS NULL AND t2.key IS NULL)";
+  nullEqualJoinHelper(query);
+} finally {
+  test("alter session set `planner.enable_hashjoin` = true");
+}
+  }
+
+  public void nullEqualJoinHelper(final String query) throws Exception {
+testBuilder()
+.sqlQuery(query)
+.unOrdered()
+.baselineColumns("key", "data", "data0", "key0")
+.baselineValues(null, "L_null_1", "R_null_1", null)
+.baselineValues(null, "L_null_2", "R_null_1", null)
+.baselineValues("A", "L_A_1", "R_A_1", "A")
+.baselineValues("A", "L_A_2", "R_A_1", "A")
+.baselineValues(null, "L_null_1", "R_null_2", null)
+.baselineValues(null, "L_null_2", "R_null_2", null)
+.baselineValues(null, "L_null_1", "R_null_3", null)
+.baselineValues(null, "L_null_2", "R_null_3", null)
+.go();
   }
 
+  @Test
+  public void withNullEqualAdditionFilter() throws Exception {
--- End diff --

Could you also do a similar test with the join condition in the WHERE clause 
instead of the ON clause? i.e. something like: SELECT * FROM t1, t2 WHERE t1.a 
= t2.a OR (t1.a is null and t2.a is null)
For such cases, Calcite filter pushdown into the join needs to be applied 
first. 


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` 

[jira] [Commented] (DRILL-4575) alias not working on field.

2016-04-06 Thread Hugo Bellomusto (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228817#comment-15228817
 ] 

Hugo Bellomusto commented on DRILL-4575:


It sounds different: in DRILL-4572 the error happens when using functions.
Here, I use a function to make it work.


> alias not working on field.
> ---
>
> Key: DRILL-4575
> URL: https://issues.apache.org/jira/browse/DRILL-4575
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.6.0
> Environment: Apache drill 1.6.0
> java 1.7.0_40
>Reporter: Hugo Bellomusto
>
> {code:sql}
> create table dfs.tmp.a_field as
> select 'hello' field  from (VALUES(1));
> select field   my_field   from dfs.tmp.a_field;
> {code}
> The result is:
> ||field||
> |hello|
> When it should be:
> ||my_field||
> |hello|
> {noformat:title=physical plan}
> 00-00Screen : rowType = RecordType(ANY field): rowcount = 1.0, cumulative 
> cost = {1.1 rows, 1.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1635
> 00-01  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=hdfs://10.70.168.69:8020/tmp/a_field]], 
> selectionRoot=hdfs://10.70.168.69:8020/tmp/a_field, numFiles=1, 
> usedMetadataFile=false, columns=[`field`]]]) : rowType = RecordType(ANY 
> field): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 1634
> {noformat}
> But, this works well:
> {code:sql}
> select concat(field, ' world')  my_field from dfs.tmp.a_field;
> {code}
> returns:
> ||my_field||
> |hello world|
> Additional info:
> {code:sql}
> select * from sys.options where name like '%parquet%' or string_val like 
> '%parquet%';
> {code}
> ||name||string_val|
> |store.format|parquet|
> |store.parquet.block-size| |
> |store.parquet.compression|snappy|
> |store.parquet.dictionary.page-size| |
> |store.parquet.enable_dictionary_encoding| |
> |store.parquet.page-size| |
> |store.parquet.use_new_reader| |
> |store.parquet.vector_fill_check_threshold| |
> |store.parquet.vector_fill_threshold| |



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-1170) YARN support for Drill

2016-04-06 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228797#comment-15228797
 ] 

Jacques Nadeau commented on DRILL-1170:
---

Hey Paul & Billie, if the Slider community co-implemented this with the Drill 
folk, it would probably allow Slider to support more use cases and bring us to 
a shared approach rather than two separate codebases. Do you think that anyone 
from the Slider community would be able to spend substantial time against this 
to address the Drill needs? 

> YARN support for Drill
> --
>
> Key: DRILL-1170
> URL: https://issues.apache.org/jira/browse/DRILL-1170
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Neeraja
>Assignee: Paul Rogers
> Fix For: Future
>
>
> This is a tracking item to make Drill work with YARN.
> Below are few requirements/needs to consider.
> - Drill should run as a YARN-based application, side by side with other YARN 
> enabled applications (on same nodes or different nodes). Both memory and CPU 
> resources of Drill should be controlled in this mechanism.
> - As a YARN enabled application, Drill resource consumption should be 
> adaptive to the load on the cluster. For example: when there is no load on 
> Drill, Drill should consume no resources on the cluster. As the load on 
> Drill increases, resources permitting, usage should grow proportionally.
> - Low latency is a key requirement for Apache Drill along with support for 
> multiple users (concurrency in 100s-1000s). This should be supported when run 
> as YARN application as well.





[jira] [Commented] (DRILL-4576) Add StoragePlugin API to register materialization into planner

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228790#comment-15228790
 ] 

ASF GitHub Bot commented on DRILL-4576:
---

Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/466#discussion_r58754467
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerCallback.java 
---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner;
+
+import java.util.Collection;
+
+import org.apache.calcite.plan.RelOptPlanner;
+
+/**
+ * A callback that StoragePlugins can initialize to allow further 
configuration
+ * of the Planner at initialization time. Examples could be to allow 
adding lattices,
+ * materializations or additional traits to the planner that will be used 
in
+ * planning.
+ */
+public abstract class PlannerCallback {
+
+  /**
+   * Method that will be called before a planner is used to further 
configure the planner.
+   * @param planner The planner to be configured.
+   */
+  public abstract void initializePlanner(RelOptPlanner planner);
+
+
+  public static PlannerCallback merge(Collection<PlannerCallback> callbacks) {
+    return new PlannerCallbackCollection(callbacks);
+  }
+
+  private static class PlannerCallbackCollection extends PlannerCallback {
+    private Collection<PlannerCallback> callbacks;
+
+    private PlannerCallbackCollection(Collection<PlannerCallback> callbacks) {
+      this.callbacks = callbacks;
--- End diff --

should an immutable copy be used instead of the caller's collection?


> Add StoragePlugin API to register materialization into planner
> --
>
> Key: DRILL-4576
> URL: https://issues.apache.org/jira/browse/DRILL-4576
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Laurent Goujon
>Assignee: Jacques Nadeau
>
> There's currently no good way to register materializations into the Drill 
> planner. Calcite's MaterializationService.instance() would be the way to go, 
> but the registration happens in 
> {{org.apache.calcite.prepare.Prepare.PreparedResult#prepareSql()}}, which is 
> not called by Drill.





[jira] [Commented] (DRILL-4132) Ability to submit simple type of physical plan directly to EndPoint DrillBit for execution

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228789#comment-15228789
 ] 

ASF GitHub Bot commented on DRILL-4132:
---

Github user yufeldman commented on a diff in the pull request:

https://github.com/apache/drill/pull/368#discussion_r58754414
  
--- Diff: 
protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java ---
@@ -133,6 +133,10 @@ private RpcChannel(int index, int value) {
  * PHYSICAL = 3;
  */
 PHYSICAL(2, 3),
+/**
+ * EXECUTIONAL = 4;
+ */
+EXECUTIONAL(3, 4),
--- End diff --

sure again :). 


> Ability to submit simple type of physical plan directly to EndPoint DrillBit 
> for execution
> --
>
> Key: DRILL-4132
> URL: https://issues.apache.org/jira/browse/DRILL-4132
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Flow, Execution - RPC, Query Planning & 
> Optimization
>Reporter: Yuliya Feldman
>Assignee: Yuliya Feldman
>
> Today Drill query execution is optimistic and stateful (at least due to data 
> exchanges): if any stage of query execution fails, the whole query fails. 
> If a query is just a simple scan, filter pushdown, and project, where no data 
> exchange happens between DrillBits, there is no need to fail the whole query 
> when one DrillBit fails, as minor fragments running on that DrillBit can be 
> rerun on another DrillBit. There are probably multiple ways to achieve this. 
> This JIRA is to open discussion on: 
> 1. agreement that we need to support the above use case 
> 2. means of achieving it.





[jira] [Commented] (DRILL-4132) Ability to submit simple type of physical plan directly to EndPoint DrillBit for execution

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228779#comment-15228779
 ] 

ASF GitHub Bot commented on DRILL-4132:
---

Github user yufeldman commented on a diff in the pull request:

https://github.com/apache/drill/pull/368#discussion_r58753681
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/SimpleParallelizerMultiPlans.java
 ---
@@ -0,0 +1,222 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.fragment;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.util.DrillStringUtils;
+import org.apache.drill.exec.ops.QueryContext;
+import org.apache.drill.exec.physical.base.Exchange;
+import org.apache.drill.exec.physical.base.FragmentRoot;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.planner.PhysicalPlanReader;
+import 
org.apache.drill.exec.planner.fragment.Materializer.IndexedFragmentNode;
+import org.apache.drill.exec.proto.BitControl.PlanFragment;
+import org.apache.drill.exec.proto.BitControl.QueryContextInformation;
+import org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint;
+import org.apache.drill.exec.proto.ExecProtos.FragmentHandle;
+import org.apache.drill.exec.proto.UserBitShared.QueryId;
+import org.apache.drill.exec.rpc.user.UserSession;
+import org.apache.drill.exec.server.options.OptionList;
+import org.apache.drill.exec.work.QueryWorkUnit;
+import org.apache.drill.exec.work.foreman.ForemanSetupException;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+/**
+ * SimpleParallelizerMultiPlans class is an extension to SimpleParallelizer
+ * to help with getting PlanFragments for split plan.
+ * Split plan is essentially ability to create multiple physical plans 
from a single logical plan
+ * to be able to run them separately.
+ * Moving functionality specific to splitting the plan to this class
+ * allows not to pollute parent class with non-authentic functionality
+ *
+ */
+public class SimpleParallelizerMultiPlans extends SimpleParallelizer {
+
+  public SimpleParallelizerMultiPlans(QueryContext context) {
+super(context);
+  }
+
+  /**
+   * Create multiple physical plans from original query planning, it will 
allow execute them eventually independently
+   * @param options
+   * @param foremanNode
+   * @param queryId
+   * @param activeEndpoints
+   * @param reader
+   * @param rootFragment
+   * @param session
+   * @param queryContextInfo
+   * @return
+   * @throws ExecutionSetupException
+   */
+  public List<QueryWorkUnit> getSplitFragments(OptionList options, DrillbitEndpoint foremanNode, QueryId queryId,
+      Collection<DrillbitEndpoint> activeEndpoints, PhysicalPlanReader reader, Fragment rootFragment,
+      UserSession session, QueryContextInformation queryContextInfo) throws ExecutionSetupException {
+
+final PlanningSet planningSet = getFragmentsHelper(activeEndpoints, 
rootFragment);
+
+return generateWorkUnits(
+options, foremanNode, queryId, reader, rootFragment, planningSet, 
session, queryContextInfo);
+  }
+
+  /**
+   * Split plan into multiple plans based on parallelization
+   * Ideally it is applicable only to plans with two major fragments: 
Screen and UnionExchange
+   * But there could be cases where we can remove even multiple exchanges 
like in case of "order by"
+   * End goal is to get single major fragment: Screen with chain that ends 
up with a single minor fragment
+   * from Leaf Exchange. This way each plan can run independently without 
any exchange involvement
+  

[jira] [Commented] (DRILL-4576) Add StoragePlugin API to register materialization into planner

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228777#comment-15228777
 ] 

ASF GitHub Bot commented on DRILL-4576:
---

Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/466#discussion_r58753568
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerCallback.java 
---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner;
+
+import java.util.Collection;
+
+import org.apache.calcite.plan.RelOptPlanner;
+
+/**
+ * A callback that StoragePlugins can initialize to allow further 
configuration
+ * of the Planner at initialization time. Examples could be to allow 
adding lattices,
+ * materializations or additional traits to the planner that will be used 
in
+ * planning.
+ */
+public abstract class PlannerCallback {
+
+  /**
+   * Method that will be called before a planner is used to further 
configure the planner.
+   * @param planner The planner to be configured.
+   */
+  public abstract void initializePlanner(RelOptPlanner planner);
+
+
+  public static PlannerCallback merge(Collection<PlannerCallback> callbacks) {
+    return new PlannerCallbackCollection(callbacks);
+  }
+
+  private static class PlannerCallbackCollection extends PlannerCallback {
+    private Collection<PlannerCallback> callbacks;
--- End diff --

final


> Add StoragePlugin API to register materialization into planner
> --
>
> Key: DRILL-4576
> URL: https://issues.apache.org/jira/browse/DRILL-4576
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Laurent Goujon
>Assignee: Jacques Nadeau
>
> There's currently no good way to register materializations into the Drill 
> planner. Calcite's MaterializationService.instance() would be the way to go, 
> but the registration happens in 
> {{org.apache.calcite.prepare.Prepare.PreparedResult#prepareSql()}}, which is 
> not called by Drill.





[jira] [Commented] (DRILL-4577) Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228774#comment-15228774
 ] 

ASF GitHub Bot commented on DRILL-4577:
---

Github user vkorukanti commented on a diff in the pull request:

https://github.com/apache/drill/pull/461#discussion_r58753354
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java
 ---
@@ -72,4 +80,76 @@ public String getTypeName() {
 return HiveStoragePluginConfig.NAME;
   }
 
+  @Override
+  public List<Pair<String, ? extends Table>> getTablesByNames(final List<String> tableNames) {
+    final String schemaName = getName();
+    final List<Pair<String, ? extends Table>> tableNameToTable = Lists.newArrayList();
+    List<Table> tables;
+    // Retries once if the first call to fetch the metadata fails
+    synchronized(mClient) {
+      final List<String> tableNamesWithAuth = Lists.newArrayList();
+      for(String tableName : tableNames) {
+        try {
+          if(mClient.tableExists(schemaName, tableName)) {
--- End diff --

Here you are making a RPC call for every table. I thought for perf reasons 
we wanted to avoid the RPC call per table and instead use 
```getTableObjectsByName``` to get all tables data in one RPC call. How does 
this patch improve the perf?


> Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in
> ---
>
> Key: DRILL-4577
> URL: https://issues.apache.org/jira/browse/DRILL-4577
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.7.0
>
>
> A query such as 
> {code}
> select * from INFORMATION_SCHEMA.`TABLES` 
> {code}
> is converted into calls to fetch all tables from storage plugins. 
> When users have Hive, the calls to the Hive metadata store would be: 
> 1) get_table
> 2) get_partitions
> However, the information regarding partitions is not used in this type of 
> query. Besides, a more efficient way to fetch tables is to use the 
> get_multi_table call.





[jira] [Commented] (DRILL-1170) YARN support for Drill

2016-04-06 Thread Matt Pollock (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228760#comment-15228760
 ] 

Matt Pollock commented on DRILL-1170:
-

Any progress update? My organization won't support use of Drill until this is 
done.

> YARN support for Drill
> --
>
> Key: DRILL-1170
> URL: https://issues.apache.org/jira/browse/DRILL-1170
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Neeraja
>Assignee: Paul Rogers
> Fix For: Future
>
>
> This is a tracking item to make Drill work with YARN.
> Below are few requirements/needs to consider.
> - Drill should run as a YARN-based application, side by side with other YARN 
> enabled applications (on same nodes or different nodes). Both memory and CPU 
> resources of Drill should be controlled in this mechanism.
> - As a YARN enabled application, Drill resource consumption should be 
> adaptive to the load on the cluster. For example: when there is no load on 
> Drill, Drill should consume no resources on the cluster. As the load on 
> Drill increases, resources permitting, usage should grow proportionally.
> - Low latency is a key requirement for Apache Drill along with support for 
> multiple users (concurrency in 100s-1000s). This should be supported when run 
> as YARN application as well.





[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228757#comment-15228757
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user vkorukanti commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58751970
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/ComparisonFunctions.java ---
@@ -215,6 +192,36 @@ public void eval() {
 }
   }
 
+  <#-- IS_DISTINCT_FROM function -->
+  @FunctionTemplate(names = {"is_distinct_from", "is distinct from" },
--- End diff --

I added tests for each category of template code path (primitive type, 
decimal type and interval type) in TestIsDistinctFromFunctions.java


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that it is an equality 
> condition which also allows null=null. We should add a planning rewrite rule 
> and an execution join option to allow null equality so that we don't treat 
> this as a cartesian join.
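The ON clause in the generated query is a null-safe equality test, which is exactly what SQL's `IS NOT DISTINCT FROM` expresses. A minimal Java sketch of the semantics (an illustration only, not Drill's generated code):

```java
import java.util.Objects;

public class NullSafeEquality {
    // IS NOT DISTINCT FROM: true when both values are equal, including
    // the case where both are null; false when exactly one is null.
    static boolean isNotDistinctFrom(Object a, Object b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        // Mirrors the expanded ON clause: (a = b) OR (a IS NULL AND b IS NULL)
        System.out.println(isNotDistinctFrom("SF", "SF"));
        System.out.println(isNotDistinctFrom(null, null));
        System.out.println(isNotDistinctFrom("SF", null));
    }
}
```

Because each column pair is compared with this null-tolerant predicate, a planner that recognizes the pattern can still treat the condition as an equi-join key instead of falling back to a cartesian join.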



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4577) Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228751#comment-15228751
 ] 

ASF GitHub Bot commented on DRILL-4577:
---

Github user hsuanyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/461#discussion_r58751518
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java
 ---
@@ -72,4 +80,56 @@ public String getTypeName() {
 return HiveStoragePluginConfig.NAME;
   }
 
+  @Override
+  public void visitTables(final RecordGenerator recordGenerator, final 
String schemaPath) {
+    final List<String> tableNames = Lists.newArrayList(getTableNames());
+    List<Table> tables;
+// Retries once if the first call to fetch the metadata fails
+synchronized(mClient) {
+  try {
+tables = mClient.getTableObjectsByName(getName(), tableNames);
--- End diff --

@vkorukanti  Thanks for pointing this out. Regardless of the permission, 
getTableObjectsByName will return the requested tables. Thus, as in [1], I used 
mClient.tableExists() to check the permission.


[1]https://github.com/apache/drill/pull/461/files#diff-bb5d8a385888df1dacc85fc011acd94bR93
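The "retries once" behavior noted in the diff comment can be sketched generically; this is a hypothetical helper to illustrate the pattern, not the patch's actual metastore-client code:

```java
import java.util.concurrent.Callable;

public class RetryOnce {
    // Runs the call; on failure, retries exactly once before giving up.
    static <T> T retryOnce(Callable<T> call) throws Exception {
        try {
            return call.call();
        } catch (Exception first) {
            return call.call(); // second (and last) attempt
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate a transient failure on the first attempt.
        int[] attempts = {0};
        String result = retryOnce(() -> {
            if (attempts[0]++ == 0) throw new RuntimeException("transient");
            return "ok";
        });
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

In the patch the second attempt would additionally be preceded by reconnecting the metastore client, with the whole sequence guarded by the `synchronized(mClient)` block shown in the diff.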


> Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in
> ---
>
> Key: DRILL-4577
> URL: https://issues.apache.org/jira/browse/DRILL-4577
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.7.0
>
>
> A query such as 
> {code}
> select * from INFORMATION_SCHEMA.`TABLES` 
> {code}
> is converted into calls that fetch all tables from the storage plugins. 
> When users have Hive, the calls to the Hive metadata storage are: 
> 1) get_table
> 2) get_partitions
> However, the information regarding partitions is not used in this type of 
> query. Besides, a more efficient way to fetch the tables is to use the 
> get_multi_table call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4576) Add StoragePlugin API to register materialization into planner

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228750#comment-15228750
 ] 

ASF GitHub Bot commented on DRILL-4576:
---

Github user laurentgo commented on the pull request:

https://github.com/apache/drill/pull/466#issuecomment-206487918
  
Patch overall looks good to me (except maybe the abstract class vs 
interface stuff)


> Add StoragePlugin API to register materialization into planner
> --
>
> Key: DRILL-4576
> URL: https://issues.apache.org/jira/browse/DRILL-4576
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Laurent Goujon
>Assignee: Jacques Nadeau
>
> There's currently no good way to register materializations into the Drill 
> planner. Calcite's MaterializationService.instance() would be the way to go, 
> but the registration happens in 
> {{org.apache.calcite.prepare.Prepare.PreparedResult#prepareSql()}}, which is 
> not called by Drill.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4576) Add StoragePlugin API to register materialization into planner

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228747#comment-15228747
 ] 

ASF GitHub Bot commented on DRILL-4576:
---

Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/466#discussion_r58751431
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerCallback.java 
---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner;
+
+import java.util.Collection;
+
+import org.apache.calcite.plan.RelOptPlanner;
+
+/**
+ * A callback that StoragePlugins can initialize to allow further 
configuration
+ * of the Planner at initialization time. Examples could be to allow 
adding lattices,
+ * materializations or additional traits to the planner that will be used 
in
+ * planning.
+ */
+public abstract class PlannerCallback {
+
+  /**
+   * Method that will be called before a planner is used to further 
configure the planner.
+   * @param planner The planner to be configured.
+   */
+  public abstract void initializePlanner(RelOptPlanner planner);
--- End diff --

Really minor thing, but the name sounds strange compared to what the 
function is supposed to do. What about `onInitialization(RelOptPlanner 
planner)` or simply `apply(RelOptPlanner planner)`?


> Add StoragePlugin API to register materialization into planner
> --
>
> Key: DRILL-4576
> URL: https://issues.apache.org/jira/browse/DRILL-4576
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Laurent Goujon
>Assignee: Jacques Nadeau
>
> There's currently no good way to register materializations into the Drill 
> planner. Calcite's MaterializationService.instance() would be the way to go, 
> but the registration happens in 
> {{org.apache.calcite.prepare.Prepare.PreparedResult#prepareSql()}}, which is 
> not called by Drill.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228746#comment-15228746
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58751332
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/ComparisonFunctions.java ---
@@ -215,6 +192,36 @@ public void eval() {
 }
   }
 
+  <#-- IS_DISTINCT_FROM function -->
+  @FunctionTemplate(names = {"is_distinct_from", "is distinct from" },
--- End diff --

I am not opposed to having a native implementation of IS [NOT] DISTINCT 
FROM; clearly the generated code is more compact. However, adding these new 
functions means we would need proper functional test coverage for the various 
data types. Any thoughts regarding that?


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that it is an equality 
> condition which also allows null=null. We should add a planning rewrite rule 
> and an execution join option to allow null equality so that we don't treat 
> this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4576) Add StoragePlugin API to register materialization into planner

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228717#comment-15228717
 ] 

ASF GitHub Bot commented on DRILL-4576:
---

Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/466#discussion_r58749059
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerCallback.java 
---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner;
+
+import java.util.Collection;
+
+import org.apache.calcite.plan.RelOptPlanner;
+
+/**
+ * A callback that StoragePlugins can initialize to allow further 
configuration
+ * of the Planner at initialization time. Examples could be to allow 
adding lattices,
+ * materializations or additional traits to the planner that will be used 
in
+ * planning.
+ */
+public abstract class PlannerCallback {
+
+  /**
+   * Method that will be called before a planner is used to further 
configure the planner.
+   * @param planner The planner to be configured.
+   */
+  public abstract void initializePlanner(RelOptPlanner planner);
+
+
+  public static PlannerCallback merge(Collection<PlannerCallback> callbacks){
--- End diff --

Pure style comment (feel free to ignore): you sometimes put a space before 
a brace, sometimes not. I personally prefer a space, as it feels more 
readable and is pretty standard across the project, but my point is about 
consistency.
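The `merge(Collection<PlannerCallback>)` combinator under review can be sketched in isolation. This uses `Consumer` and a stand-in `Planner` type so the example is self-contained; Calcite's `RelOptPlanner` and the real `PlannerCallback` class are not assumed:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.function.Consumer;

public class MergeCallbacks {
    // Stand-in for RelOptPlanner so the sketch compiles on its own.
    static class Planner {
        final List<String> rules = new ArrayList<>();
    }

    // Merges many callbacks into a single one that applies each in order,
    // mirroring the composite shape of PlannerCallback.merge(...).
    static Consumer<Planner> merge(Collection<Consumer<Planner>> callbacks) {
        return planner -> callbacks.forEach(cb -> cb.accept(planner));
    }

    public static void main(String[] args) {
        Consumer<Planner> merged = merge(Arrays.asList(
                p -> p.rules.add("lattices"),
                p -> p.rules.add("materializations")));
        Planner planner = new Planner();
        merged.accept(planner);
        System.out.println(planner.rules);
    }
}
```

The same composite shape works whether `PlannerCallback` stays an abstract class or becomes an interface, which is part of why the interface question above is mostly a style choice.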


> Add StoragePlugin API to register materialization into planner
> --
>
> Key: DRILL-4576
> URL: https://issues.apache.org/jira/browse/DRILL-4576
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Laurent Goujon
>Assignee: Jacques Nadeau
>
> There's currently no good way to register materializations into the Drill 
> planner. Calcite's MaterializationService.instance() would be the way to go, 
> but the registration happens in 
> {{org.apache.calcite.prepare.Prepare.PreparedResult#prepareSql()}}, which is 
> not called by Drill.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4576) Add StoragePlugin API to register materialization into planner

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228716#comment-15228716
 ] 

ASF GitHub Bot commented on DRILL-4576:
---

Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/466#discussion_r58749026
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerCallback.java 
---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner;
+
+import java.util.Collection;
+
+import org.apache.calcite.plan.RelOptPlanner;
+
+/**
+ * A callback that StoragePlugins can initialize to allow further 
configuration
+ * of the Planner at initialization time. Examples could be to allow 
adding lattices,
+ * materializations or additional traits to the planner that will be used 
in
+ * planning.
+ */
+public abstract class PlannerCallback {
--- End diff --

why not an interface?


> Add StoragePlugin API to register materialization into planner
> --
>
> Key: DRILL-4576
> URL: https://issues.apache.org/jira/browse/DRILL-4576
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Laurent Goujon
>Assignee: Jacques Nadeau
>
> There's currently no good way to register materializations into the Drill 
> planner. Calcite's MaterializationService.instance() would be the way to go, 
> but the registration happens in 
> {{org.apache.calcite.prepare.Prepare.PreparedResult#prepareSql()}}, which is 
> not called by Drill.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4530) Improve metadata cache performance for queries with single partition

2016-04-06 Thread John Omernik (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228689#comment-15228689
 ] 

John Omernik commented on DRILL-4530:
-

Let me add a big +1 to using protobuf for the cache. We could even include a 
simple jar with Drill to decode the protobuf to JSON for human 
reading/troubleshooting. If you consider how many times a human would read the 
metadata cache vs. how many times Drill will do it without human eyes, JSON 
does not provide any appreciable advantage over protobuf, especially if we 
include a jar we can use to read any protobuf file as JSON when needed. 

> Improve metadata cache performance for queries with single partition 
> -
>
> Key: DRILL-4530
> URL: https://issues.apache.org/jira/browse/DRILL-4530
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.7.0
>
>
> Consider two types of queries which are run with Parquet metadata caching: 
> {noformat}
> query 1:
> SELECT col FROM  `A/B/C`;
> query 2:
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C';
> {noformat}
> For a certain dataset, the query1 elapsed time is 1 sec whereas query2 
> elapsed time is 9 sec even though both are accessing the same amount of data. 
>  The user expectation is that they should perform roughly the same.  The main 
> difference comes from reading the bigger metadata cache file at the root 
> level 'A' for query2 and then applying the partitioning filter.  query1 reads 
> a much smaller metadata cache file at the subdirectory level. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4530) Improve metadata cache performance for queries with single partition

2016-04-06 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228674#comment-15228674
 ] 

Aman Sinha commented on DRILL-4530:
---

Yes, indeed the storage format of the metadata cache has been discussed a few 
times and various options are on the table (I believe [~parthc] has done some 
analysis of the options).  Thanks for the experimentation using protobuf.  The 
loading time improvements are quite impressive.  The advantages of JSON 
(simple, human readable etc.) are outweighed by the performance tradeoffs.  In 
any new option we consider, we must keep in mind the fast incremental refresh 
scenario - this feature is highly requested by all users who are using metadata 
cache. 

> Improve metadata cache performance for queries with single partition 
> -
>
> Key: DRILL-4530
> URL: https://issues.apache.org/jira/browse/DRILL-4530
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.7.0
>
>
> Consider two types of queries which are run with Parquet metadata caching: 
> {noformat}
> query 1:
> SELECT col FROM  `A/B/C`;
> query 2:
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C';
> {noformat}
> For a certain dataset, the query1 elapsed time is 1 sec whereas query2 
> elapsed time is 9 sec even though both are accessing the same amount of data. 
>  The user expectation is that they should perform roughly the same.  The main 
> difference comes from reading the bigger metadata cache file at the root 
> level 'A' for query2 and then applying the partitioning filter.  query1 reads 
> a much smaller metadata cache file at the subdirectory level. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228651#comment-15228651
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user vkorukanti commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58742916
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/ComparisonFunctions.java ---
@@ -215,6 +192,36 @@ public void eval() {
 }
   }
 
+  <#-- IS_DISTINCT_FROM function -->
+  @FunctionTemplate(names = {"is_distinct_from", "is distinct from" },
--- End diff --

If adding new functions is a concern, I can make 
```RelOptUtil#splitJoinCondition``` identify rewritten ```IS NOT DISTINCT 
FROM``` expressions as well.


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that it is an equality 
> condition which also allows null=null. We should add a planning rewrite rule 
> and an execution join option to allow null equality so that we don't treat 
> this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228646#comment-15228646
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user vkorukanti commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58742546
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
 ---
@@ -169,4 +176,223 @@ private static boolean containIdentity(List exps,
 }
 return true;
   }
+
+  /**
+   * Copied from {@link RelOptUtil#splitJoinCondition(RelNode, RelNode, 
RexNode, List, List)}. Modified to rewrite
--- End diff --

I will follow up with a JIRA on the Calcite project to see if we can push this 
change to Calcite. 

The function ```RelOptUtil#splitJoinCondition``` in its current form seems to 
have a limitation: it returns the left and right join key indices, but doesn't 
indicate whether the condition is ```EQUAL``` or ```IS NOT DISTINCT FROM``` 
(it adds the key pair if either of these functions appears in the comparison). 


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that it is an equality 
> condition which also allows null=null. We should add a planning rewrite rule 
> and an execution join option to allow null equality so that we don't treat 
> this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228628#comment-15228628
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user vkorukanti commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58741416
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/ComparisonFunctions.java ---
@@ -215,6 +192,36 @@ public void eval() {
 }
   }
 
+  <#-- IS_DISTINCT_FROM function -->
+  @FunctionTemplate(names = {"is_distinct_from", "is distinct from" },
--- End diff --

I am not sure if there is a way to differentiate between the function in a 
join condition vs. the function in a project expression; I don't see any 
context info in the DrillConvertletTable.get() method call. Also, the 
generated code in the rewritten case is excessive. For the following query:
```SELECT INT_col is not distinct from BIGINT_col as col, 
int_distinct_result FROM cp.`functions/distinct_from.json```

Without rewrite: 
https://gist.github.com/vkorukanti/e981058f985ed24e6c4ef6b47d670e0f
With rewrite: 
https://gist.github.com/vkorukanti/d80aa2ba40c65c9215c38ed18b20a685

Sizes may differ after scalar replacement is done, but it is still too much 
code for a simple ```is not distinct from``` function. 




> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that it is an equality 
> condition which also allows null=null. We should add a planning rewrite rule 
> and an execution join option to allow null equality so that we don't treat 
> this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4571) Add link to local Drill logs from the web UI

2016-04-06 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228626#comment-15228626
 ] 

Arina Ielchiieva commented on DRILL-4571:
-

Splitting this Jira into two. This Jira will deliver the ability to view local 
logs from the Web UI.
https://issues.apache.org/jira/browse/DRILL-4585 will add the ability to view 
logs from all drillbits.

> Add link to local Drill logs from the web UI
> 
>
> Key: DRILL-4571
> URL: https://issues.apache.org/jira/browse/DRILL-4571
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Attachments: display_log.JPG, log_list.JPG
>
>
> Now we have link to the profile from the web UI.
> It will be handy for the users to have the link to local logs as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4571) Add link to local Drill logs from the web UI

2016-04-06 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4571:

Summary: Add link to local Drill logs from the web UI  (was: Add link to 
the Drill log from the web UI)

> Add link to local Drill logs from the web UI
> 
>
> Key: DRILL-4571
> URL: https://issues.apache.org/jira/browse/DRILL-4571
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Attachments: display_log.JPG, log_list.JPG
>
>
> Now we have link to the profile from the web UI.
> It will be handy for the users to have the link to the log as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4571) Add link to local Drill logs from the web UI

2016-04-06 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4571:

Description: 
Now we have link to the profile from the web UI.
It will be handy for the users to have the link to local logs as well.

  was:
Now we have link to the profile from the web UI.
It will be handy for the users to have the link to the log as well.


> Add link to local Drill logs from the web UI
> 
>
> Key: DRILL-4571
> URL: https://issues.apache.org/jira/browse/DRILL-4571
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Attachments: display_log.JPG, log_list.JPG
>
>
> Now we have link to the profile from the web UI.
> It will be handy for the users to have the link to local logs as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4585) Add ability to view logs from all drillbits in Web UI

2016-04-06 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-4585:
---

 Summary: Add ability to view logs from all drillbits in Web UI
 Key: DRILL-4585
 URL: https://issues.apache.org/jira/browse/DRILL-4585
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Arina Ielchiieva
 Fix For: Future


Currently we can only view logs in the Web UI from the local drillbit. It 
would be nice if we could see logs from all active drillbits.





[jira] [Commented] (DRILL-4584) JDBC/ODBC Client IP in Drill audit logs

2016-04-06 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228592#comment-15228592
 ] 

Vitalii Diravka commented on DRILL-4584:


Is this the IP address of the client machine running the Drill Web Console, 
Drill shell, or JDBC/ODBC client? Or is it the IP address of the foreman node? 
If it is the foreman's address, which is better to show: the hostname, the IP 
address, or ip:port?
!https://drill.apache.org/docs/img/query-flow-client.png!
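Whichever address is chosen, a server can read the peer address directly from the accepted socket. A minimal `java.net` sketch of where an audit log could obtain it (class and method names are illustrative, not Drill's actual RPC code):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ClientAddressDemo {

    // Format a peer address as "ip:port" the way an audit-log entry might.
    static String format(InetSocketAddress remote) {
        return remote.getAddress().getHostAddress() + ":" + remote.getPort();
    }

    public static void main(String[] args) throws IOException {
        // Loopback client/server pair: the server reads the peer address from
        // the accepted socket; this is the address an audit log would record.
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort());
             Socket accepted = server.accept()) {
            InetSocketAddress remote =
                    (InetSocketAddress) accepted.getRemoteSocketAddress();
            System.out.println(format(remote)); // 127.0.0.1:<ephemeral port>
        }
    }
}
```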

> JDBC/ODBC Client IP in Drill audit logs
> ---
>
> Key: DRILL-4584
> URL: https://issues.apache.org/jira/browse/DRILL-4584
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - JDBC, Client - ODBC
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.7.0
>
>
> Currently the Drill audit logs (sqlline_queries.json and drillbit_queries.json) 
> record the username of the client that fired the query. It would also be 
> useful to record the client IP from which the query was fired.





[jira] [Created] (DRILL-4584) JDBC/ODBC Client IP in Drill audit logs

2016-04-06 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-4584:
--

 Summary: JDBC/ODBC Client IP in Drill audit logs
 Key: DRILL-4584
 URL: https://issues.apache.org/jira/browse/DRILL-4584
 Project: Apache Drill
  Issue Type: Improvement
  Components: Client - JDBC, Client - ODBC
Reporter: Vitalii Diravka
Assignee: Vitalii Diravka
Priority: Minor
 Fix For: 1.7.0


Currently the Drill audit logs (sqlline_queries.json and drillbit_queries.json) 
record the username of the client that fired the query. It would also be useful 
to record the client IP from which the query was fired.





[jira] [Assigned] (DRILL-3842) JVM dies if drill attempts to read too many files in the directory that blow up heap

2016-04-06 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim reassigned DRILL-3842:
---

Assignee: Deneche A. Hakim

> JVM dies if drill attempts to read too many files in the directory that blow 
> up heap 
> -
>
> Key: DRILL-3842
> URL: https://issues.apache.org/jira/browse/DRILL-3842
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Victoria Markman
>Assignee: Deneche A. Hakim
>Priority: Critical
>
> Run {{select count(*) from t1}} where the t1 directory consists of 1.9 million 
> small parquet files. The outcome: the drillbit is dead and out of the working set.
> 1. Client never got response back from the server
> 2. drillbit.log
> {code}
> 2015-09-25 17:56:55,935 [29fa756f-894d-0340-3661-b925bff0fe11:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Took 47999 ms to get file statuses
> 2015-09-25 18:43:19,871 [BitServer-4] INFO  
> o.a.d.exec.rpc.control.ControlServer - RPC connection /10.10.88.135:31011 
> <--> /10.10.88.135:51675 (control server) timed out.  Timeout was set to 300 
> seconds. Closing connection.
> 2015-09-25 18:50:06,026 [BitServer-3] INFO  
> o.a.d.exec.rpc.control.ControlClient - Channel closed /10.10.88.135:51675 
> <--> /10.10.88.135:31011.
> 2015-09-25 18:50:06,032 [UserServer-1] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.10.88.135:31010 <--> /10.10.88.133:51612 (user client).  
> Closing connection.
> java.lang.OutOfMemoryError: Java heap space
> {code}
> drillbit.out
> {code}
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "main-SendThread(atsqa4-133.qa.lab:5181)"
> Exception in thread "WorkManager.StatusThread" java.lang.OutOfMemoryError: 
> Java heap space
> 2015-09-25 18:53:52
> Full thread dump OpenJDK 64-Bit Server VM (24.65-b04 mixed mode):
> {code}
> jstack
> {code}
> [Fri Sep 25 18:53:29 ] # jstack 63205 
> 63205: Unable to open socket file: target process not responding or HotSpot 
> VM not loaded
> The -F option can be used when the target process is not responding
> {code}
> jstack -F
> {code}
> Attaching to process ID 63205, please wait...
> Debugger attached successfully.
> Server compiler detected.
> JVM version is 24.65-b04
> java.lang.RuntimeException: Unable to deduce type of thread from address 
> 0x04093800 (expected type JavaThread, CompilerThread, ServiceThread, 
> JvmtiAgentThread, or SurrogateLockerThread)
>   at 
> sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:162)
>   at sun.jvm.hotspot.runtime.Threads.first(Threads.java:150)
>   at 
> sun.jvm.hotspot.runtime.DeadlockDetector.createThreadTable(DeadlockDetector.java:149)
>   at 
> sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:56)
>   at 
> sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39)
>   at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:52)
>   at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
>   at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
>   at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
>   at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
>   at sun.tools.jstack.JStack.main(JStack.java:102)
> Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for 
> type of address 0x04093800
>   at 
> sun.jvm.hotspot.runtime.InstanceConstructor.newWrongTypeException(InstanceConstructor.java:62)
>   at 
> sun.jvm.hotspot.runtime.VirtualConstructor.instantiateWrapperFor(VirtualConstructor.java:80)
>   at 
> sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:158)
>   ... 15 more
> Exception in thread "main" java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
>   at sun.tools.jstack.JStack.main(JStack.java:102)
> Caused by: java.lang.RuntimeException: Unable to deduce type of 

[jira] [Commented] (DRILL-3842) JVM dies if drill attempts to read too many files in the directory that blow up heap

2016-04-06 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228573#comment-15228573
 ] 

Deneche A. Hakim commented on DRILL-3842:
-

Although I was able to reproduce the issue on 1.2.0, it's no longer occurring 
in the latest master. It still takes more than 30 minutes to plan the query, and 
the heap usage grows to 4GB on the foreman node, but the query succeeds.
I suspect reading the parquet metadata cache for all 2M files is the cause; I 
will investigate whether this is indeed the case.

> JVM dies if drill attempts to read too many files in the directory that blow 
> up heap 
> -
>
> Key: DRILL-3842
> URL: https://issues.apache.org/jira/browse/DRILL-3842
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Victoria Markman
>Priority: Critical
>
> Run {{select count(*) from t1}} where the t1 directory consists of 1.9 million 
> small parquet files. The outcome: the drillbit is dead and out of the working set.
> 1. Client never got response back from the server
> 2. drillbit.log
> {code}
> 2015-09-25 17:56:55,935 [29fa756f-894d-0340-3661-b925bff0fe11:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Took 47999 ms to get file statuses
> 2015-09-25 18:43:19,871 [BitServer-4] INFO  
> o.a.d.exec.rpc.control.ControlServer - RPC connection /10.10.88.135:31011 
> <--> /10.10.88.135:51675 (control server) timed out.  Timeout was set to 300 
> seconds. Closing connection.
> 2015-09-25 18:50:06,026 [BitServer-3] INFO  
> o.a.d.exec.rpc.control.ControlClient - Channel closed /10.10.88.135:51675 
> <--> /10.10.88.135:31011.
> 2015-09-25 18:50:06,032 [UserServer-1] ERROR 
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  
> Connection: /10.10.88.135:31010 <--> /10.10.88.133:51612 (user client).  
> Closing connection.
> java.lang.OutOfMemoryError: Java heap space
> {code}
> drillbit.out
> {code}
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "main-SendThread(atsqa4-133.qa.lab:5181)"
> Exception in thread "WorkManager.StatusThread" java.lang.OutOfMemoryError: 
> Java heap space
> 2015-09-25 18:53:52
> Full thread dump OpenJDK 64-Bit Server VM (24.65-b04 mixed mode):
> {code}
> jstack
> {code}
> [Fri Sep 25 18:53:29 ] # jstack 63205 
> 63205: Unable to open socket file: target process not responding or HotSpot 
> VM not loaded
> The -F option can be used when the target process is not responding
> {code}
> jstack -F
> {code}
> Attaching to process ID 63205, please wait...
> Debugger attached successfully.
> Server compiler detected.
> JVM version is 24.65-b04
> java.lang.RuntimeException: Unable to deduce type of thread from address 
> 0x04093800 (expected type JavaThread, CompilerThread, ServiceThread, 
> JvmtiAgentThread, or SurrogateLockerThread)
>   at 
> sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:162)
>   at sun.jvm.hotspot.runtime.Threads.first(Threads.java:150)
>   at 
> sun.jvm.hotspot.runtime.DeadlockDetector.createThreadTable(DeadlockDetector.java:149)
>   at 
> sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:56)
>   at 
> sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39)
>   at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:52)
>   at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
>   at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
>   at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
>   at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
>   at sun.tools.jstack.JStack.main(JStack.java:102)
> Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for 
> type of address 0x04093800
>   at 
> sun.jvm.hotspot.runtime.InstanceConstructor.newWrongTypeException(InstanceConstructor.java:62)
>   at 
> sun.jvm.hotspot.runtime.VirtualConstructor.instantiateWrapperFor(VirtualConstructor.java:80)
>   at 
> sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:158)
>   ... 15 more
> Exception in thread "main" java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> 

[jira] [Commented] (DRILL-4139) Exception while trying to prune partition. java.lang.UnsupportedOperationException: Unsupported type: BIT

2016-04-06 Thread Johannes Zillmann (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228516#comment-15228516
 ] 

Johannes Zillmann commented on DRILL-4139:
--

Having the same issue with drill-1.6.0!

> Exception while trying to prune partition. 
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> -
>
> Key: DRILL-4139
> URL: https://issues.apache.org/jira/browse/DRILL-4139
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.3.0
> Environment: 4 node cluster on CentOS
>Reporter: Khurram Faraaz
>Assignee: Aman Sinha
>
> Exception while trying to prune partition.
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> is seen in drillbit.log after Functional run on 4 node cluster.
> Drill 1.3.0 sys.version => d61bb83a8
> {code}
> 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2
> 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 0 ms
> 2015-11-27 03:12:19,810 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] WARN  
> o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune 
> partition.
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.populatePruningVector(ParquetGroupScan.java:479)
>  ~[drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.ParquetPartitionDescriptor.populatePartitionVectors(ParquetPartitionDescriptor.java:96)
>  ~[drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:235)
>  ~[drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
>  [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808)
>  [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) 
> [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:303) 
> [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.logicalPlanningVolcanoAndLopt(DefaultSqlHandler.java:545)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:213)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:248)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:184)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:905) 
> [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:244) 
> [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> {code}





[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228503#comment-15228503
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58730151
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/ComparisonFunctions.java ---
@@ -215,6 +192,36 @@ public void eval() {
 }
   }
 
+  <#-- IS_DISTINCT_FROM function -->
+  @FunctionTemplate(names = {"is_distinct_from", "is distinct from" },
--- End diff --

@vkorukanti, I want to clarify: if the query only had a join condition 
with IS_NOT_DISTINCT_FROM, I would think it should work with just your 
convertlet changes, since both HashJoin and MergeJoin handle this type of join 
condition. Is the reason you had to implement the full comparator codegen to 
handle more general types of comparisons, e.g. in the SELECT list if I say 
'SELECT a IS NOT DISTINCT FROM b'? Suppose we had a convertlet that only 
preserved the IS (NOT) DISTINCT FROM join condition, and defaulted to the 
Calcite rewrite using the CASE expression; then we would not have to implement 
the full comparator.
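For reference, the two predicate semantics at issue can be sketched outside Drill's codegen (plain-Java stand-ins for the SQL operators; this is not the generated comparator code):

```java
import java.util.Objects;

public class NullEquality {

    // SQL join-predicate "a = b": NULL on either side means the rows do not match.
    static boolean plainEquals(Object a, Object b) {
        return a != null && b != null && a.equals(b);
    }

    // "a IS NOT DISTINCT FROM b": like equality, except two NULLs compare equal.
    // Semantically the same as the rewrite (a = b) OR (a IS NULL AND b IS NULL).
    static boolean isNotDistinctFrom(Object a, Object b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(plainEquals(null, null));        // false
        System.out.println(isNotDistinctFrom(null, null));  // true
    }
}
```

Under plain equality the NULL-NULL case never matches, which is why the Tableau-style OR condition above defeats equi-join detection unless it is recognized as null-safe equality.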


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.





[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228484#comment-15228484
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58728331
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
 ---
@@ -169,4 +176,223 @@ private static boolean containIdentity(List exps,
 }
 return true;
   }
+
+  /**
+   * Copied from {@link RelOptUtil#splitJoinCondition(RelNode, RelNode, 
RexNode, List, List)}. Modified to rewrite
--- End diff --

Agree that we ideally should leverage the Calcite code..especially since 
this method is pretty heavily used and modified periodically so keeping Drill's 
version of this method in sync will be difficult. 


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.





[jira] [Commented] (DRILL-3894) Directory functions (MaxDir, MinDir ..) should have optional filename parameter

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227995#comment-15227995
 ] 

ASF GitHub Bot commented on DRILL-3894:
---

GitHub user vdiravka opened a pull request:

https://github.com/apache/drill/pull/467

DRILL-3894: Upgrade functions MaxDir, MinDir... Optional filename parameter

Functions MaxDir, MinDir, iMaxDir, iMinDir with one (schema) parameter were 
added.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vdiravka/drill DRILL-3894

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/467.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #467


commit 966d76a06f82dcb265849b90bcff8ce8a770f4ec
Author: Vitalii Diravka 
Date:   2016-04-05T15:07:29Z

DRILL-3894: Upgrade functions MaxDir, MinDir... Optional filename parameter
- added functions MaxDir, MinDir, iMaxDir, iMinDir with one (schema) 
parameter.




> Directory functions (MaxDir, MinDir ..) should have optional filename 
> parameter
> ---
>
> Key: DRILL-3894
> URL: https://issues.apache.org/jira/browse/DRILL-3894
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.2.0
>Reporter: Neeraja
>Assignee: Vitalii Diravka
>
> https://drill.apache.org/docs/query-directory-functions/
> The directory functions documented above should provide ability to have 
> second parameter(file name) as optional.





[jira] [Commented] (DRILL-4543) Advertise Drill-bit ports, status, capabilities in ZooKeeper

2016-04-06 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227857#comment-15227857
 ] 

Paul Rogers commented on DRILL-4543:


As it turns out, the Drill startup scripts have a number of bugs that prevent 
passing of -Dname=value system properties on the command line. See DRILL-4581.

> Advertise Drill-bit ports, status, capabilities in ZooKeeper
> 
>
> Key: DRILL-4543
> URL: https://issues.apache.org/jira/browse/DRILL-4543
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components:  Server
>Reporter: Paul Rogers
> Fix For: 2.0.0
>
>
> Today Drill uses ZooKeeper (ZK) to advertise the existence of a Drill-bit, 
> providing the host name/IP Address of the Drill-bit and the ports used, 
> encoded in Protobuf format. All other information (status, CPUs, memory) are 
> assumed to be the same across all Drill-bits in the cluster as specified in 
> the Drill config file. (Amended to reflect 1.6 behavior.)
> Moving forward, as Drill becomes more sophisticated, Drill should advertise 
> the specifics of each Drill-bit so that one Drill bit can differ from another.
> For example, when running on YARN, we need a way for Drill to gracefully shut 
> down. Advertising a status of Ready or Unavailable will help. Ready is the 
> normal state. Unavailable means the Drill-bit will finish in-flight queries, 
> but won't accept new ones. (The actual status is a separate enhancement.)
> In a YARN cluster, Drill should take advantage of machines with more memory, 
> but live with machines with less. (Perhaps some are newer, some are older or 
> more heavily loaded.) Drill should use ZK to identify its available memory 
> and CPUs so that the planner can use them. (Use of the info is a separate 
> enhancement.)
> There may be times when two drill bits run on a single machine. If so, they 
> must use separate ports. So, each Drill-bit should advertise its ports in ZK.
> For backward compatibility, the information is optional; if not present, the 
> receiver should assume the information defaults to that in the config file.





[jira] [Updated] (DRILL-4581) Various problems in the Drill startup scripts

2016-04-06 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-4581:
---
Summary: Various problems in the Drill startup scripts  (was: Various 
inconsistencies in the Drill startup scripts)

> Various problems in the Drill startup scripts
> -
>
> Key: DRILL-4581
> URL: https://issues.apache.org/jira/browse/DRILL-4581
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.6.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Noticed the following in drillbit.sh:
> 1) Comment: DRILL_LOG_DIR: where log files are stored. PWD by default.
> Code: DRILL_LOG_DIR=/var/log/drill or, if it does not exist, $DRILL_HOME/log
> 2) Comment: DRILL_PID_DIR: where the pid files are stored. /tmp by default.
> Code: DRILL_PID_DIR=$DRILL_HOME
> 3) Redundant checking of JAVA_HOME. drillbit.sh sources drill-config.sh which 
> checks JAVA_HOME. Later, drillbit.sh checks it again. The second check is 
> both unnecessary and prints a less informative message than the 
> drill-config.sh check. Suggestion: Remove the JAVA_HOME check in drillbit.sh.
> 4) Though drill-config.sh carefully checks JAVA_HOME, it does not export the 
> JAVA_HOME variable. Perhaps this is why drillbit.sh repeats the check? 
> Recommended: export JAVA_HOME from drill-config.sh.
> 5) Both drillbit.sh and the sourced drill-config.sh check DRILL_LOG_DIR and 
> set the default value. Drill-config.sh defaults to /var/log/drill, or if that 
> fails, to $DRILL_HOME/log. Drillbit.sh just sets /var/log/drill and does not 
> handle the case where that directory is not writable. Suggested: remove the 
> check in drillbit.sh.
> 6) Drill-config.sh checks the writability of the DRILL_LOG_DIR by touching 
> sqlline.log, but does not delete that file, leaving a bogus, empty client log 
> file on the drillbit server. Recommendation: use bash commands instead.
> 7) The implementation of the above check is a bit awkward. It has a fallback 
> case with somewhat awkward logic. Clean this up.
> 8) drillbit.sh, but not drill-config.sh, attempts to create /var/log/drill if 
> it does not exist. Recommended: decide on a single choice, implement it in 
> drill-config.sh.
> 9) drill-config.sh checks if $DRILL_CONF_DIR is a directory. If not, defaults 
> it to $DRILL_HOME/conf. This can lead to subtle errors. If I use
> drillbit.sh --config /misspelled/path
> where I mistype the path, I won't get an error, I get the default config, 
> which may not at all be what I want to run. Recommendation: if the value of 
> DRILL_CONF_DRILL is passed into the script (as a variable or via --config), 
> then that directory must exist. Else, use the default.
> 10) drill-config.sh exports, but may not set, HADOOP_HOME. This may be left 
> over from the original Hadoop script that the Drill script was based upon. 
> Recomendation: export only in the case that HADOOP_HOME is set for cygwin.
> 11) Drill-config.sh checks JAVA_HOME and prints a big, bold error message to 
> stderr if JAVA_HOME is not set. Then, it checks the Java version and prints a 
> different message (to stdout) if the version is wrong. Recommendation: use 
> the same format (and stderr) for both.
> 12) Similarly, other Java checks later in the script produce messages to 
> stdout, not stderr.
> 13) Drill-config.sh searches $JAVA_HOME to find java/java.exe and verifies 
> that it is executable. The script then throws away what we just found. Then, 
> drill-bit.sh tries to recreate this information as:
> JAVA=$JAVA_HOME/bin/java
> This is wrong in two ways: 1) it ignores the actual java location and assumes 
> it, and 2) it does not handle the java.exe case that drill-config.sh 
> carefully worked out.
> Recommendation: export JAVA from drill-config.sh and remove the above line 
> from drillbit.sh.
> 14) drillbit.sh presumably takes extra arguments like this:
> drillbit.sh -Dvar0=value0 --config /my/conf/dir start -Dvar1=value1 
> -Dvar2=value2 -Dvar3=value3
> The -D bit allows the user to override config variables at the command line. 
> But, the scripts don't use the values.
> A) drill-config.sh consumes --config /my/conf/dir after consuming the leading 
> arguments:
> while [ $# -gt 1 ]; do
>   if [ "--config" = "$1" ]; then
> shift
> confdir=$1
> shift
> DRILL_CONF_DIR=$confdir
>   else
> # Presume we are at end of options and break
> break
>   fi
> done
> B) drill-bit.sh will discard the var1:
> startStopStatus=$1 <-- grabs "start"
> shift
> command=drillbit
> shift   <-- Consumes -Dvar1=value1
> C) Remaining values passed back into drillbit.sh:
> args=$@
> nohup $thiscmd internal_start $command $args
> D) Second invocation discards -Dvar2=value2 as described above.
> E) Remaining values are passed to 

[jira] [Commented] (DRILL-4530) Improve metadata cache performance for queries with single partition

2016-04-06 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227790#comment-15227790
 ] 

Deneche A. Hakim commented on DRILL-4530:
-

I did an experiment where I *hacked* Drill to use protobuf instead of json for 
the metadata cache. For a customer case with a parquet table with 3 levels of 
directories and 395250 files, the protobuf cache was 87% smaller than json and 
loaded 83% faster.
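The size gap is plausible on its face: JSON repeats every field name as text in each entry, while a protobuf-style encoding stores tagged binary values. A toy comparison of one row-group entry (illustrative encodings, not Drill's actual cache schema):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class CacheSizeDemo {

    // JSON-style entry: field names are repeated as text in every record.
    static byte[] asJson(String path, long start, long length) {
        String s = "{\"path\":\"" + path + "\",\"start\":" + start
                + ",\"length\":" + length + "}";
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Binary entry: length-prefixed path plus two fixed-width longs. Protobuf
    // does better still (field tags plus varints), but the effect is similar.
    static byte[] asBinary(String path, long start, long length) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeUTF(path);
            out.writeLong(start);
            out.writeLong(length);
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String path = "/data/t1/0_0_0.parquet";
        System.out.println("json=" + asJson(path, 4, 1048576).length
                + " binary=" + asBinary(path, 4, 1048576).length);
    }
}
```

With millions of entries, the per-entry overhead of the textual field names and number formatting compounds into the multi-gigabyte difference reported above.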

> Improve metadata cache performance for queries with single partition 
> -
>
> Key: DRILL-4530
> URL: https://issues.apache.org/jira/browse/DRILL-4530
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.7.0
>
>
> Consider two types of queries which are run with Parquet metadata caching: 
> {noformat}
> query 1:
> SELECT col FROM  `A/B/C`;
> query 2:
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C';
> {noformat}
> For a certain dataset, the query1 elapsed time is 1 sec whereas query2 
> elapsed time is 9 sec even though both are accessing the same amount of data. 
>  The user expectation is that they should perform roughly the same.  The main 
> difference comes from reading the bigger metadata cache file at the root 
> level 'A' for query2 and then applying the partitioning filter.  query1 reads 
> a much smaller metadata cache file at the subdirectory level. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (DRILL-4530) Improve metadata cache performance for queries with single partition

2016-04-06 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227790#comment-15227790
 ] 

Deneche A. Hakim edited comment on DRILL-4530 at 4/6/16 6:07 AM:
-

I did an experiment where I *hacked* Drill to use protobuf instead of json for 
the metadata cache. For a customer case with a parquet table with 3 levels of 
directories and 395250 files, the protobuf cache was 87% smaller than json and 
loaded 83% faster.


was (Author: adeneche):
I did an experiment where I *hacked* Drill to use protobuf instead of json for 
the metadata cache and for a customer case with a parquet table with 3 levels 
of directories and 395250 files, the protobuf cache was 87% smaller than json 
and loaded 83% faster.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)