[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted
[ https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229729#comment-15229729 ] ASF GitHub Bot commented on DRILL-3714: --- Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/463#discussion_r58821834 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java --- @@ -84,7 +115,7 @@ public void operationComplete(ChannelFuture future) throws Exception { if (!future.isSuccess()) { removeFromMap(coordinationId); if (future.channel().isActive()) { - throw new RpcException("Future failed") ; + throw new RpcException("Future failed"); --- End diff -- Since the future did not succeed, should this `setException(future.cause())`? There would be no outcome for the `handler` otherwise, right? > Query runs out of memory and remains in CANCELLATION_REQUESTED state until > drillbit is restarted > > > Key: DRILL-3714 > URL: https://issues.apache.org/jira/browse/DRILL-3714 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.2.0 >Reporter: Victoria Markman >Assignee: Jacques Nadeau >Priority: Critical > Fix For: 1.7.0 > > Attachments: Screen Shot 2015-08-26 at 10.36.33 AM.png, drillbit.log, > jstack.txt, query_profile_2a2210a7-7a78-c774-d54c-c863d0b77bb0.json > > > This is a variation of DRILL-3705 with the difference of drill behavior when > hitting OOM condition. > Query runs out of memory during execution and remains in > "CANCELLATION_REQUESTED" state until drillbit is bounced. > Client (sqlline in this case) never gets a response from the server. > Reproduction details: > Single node drillbit installation. 
> DRILL_MAX_DIRECT_MEMORY="8G" > DRILL_HEAP="4G" > Run this query on TPCDS SF100 data set > {code} > SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) AS > TotalSpend FROM store_sales ss WHERE ss.ss_store_sk IS NOT NULL ORDER BY 1 > LIMIT 10; > {code} > drillbit.log > {code} > 2015-08-26 16:54:58,469 [2a2210a7-7a78-c774-d54c-c863d0b77bb0:frag:3:22] INFO > o.a.d.e.w.f.FragmentStatusReporter - > 2a2210a7-7a78-c774-d54c-c863d0b77bb0:3:22: State to report: RUNNING > 2015-08-26 16:55:50,498 [BitServer-5] WARN > o.a.drill.exec.rpc.data.DataServer - Message of mode REQUEST of rpc type 3 > took longer than 500ms. Actual duration was 2569ms. > 2015-08-26 16:56:31,086 [BitServer-5] ERROR > o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. > Connection: /10.10.88.133:31012 <--> /10.10.88.133:54554 (data server). > Closing connection. > io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct > buffer memory > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233) > ~[netty-codec-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618) > [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na] > at > io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) > [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na] > at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) > [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na] > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > [netty-common-4.0.27.Final.jar:4.0.27.Final] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71] > Caused by:
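The reviewer's suggestion — deliver `future.cause()` to the pending outcome via `setException(...)` instead of throwing — can be sketched as follows. This is a hypothetical, Netty-free reduction of the `operationComplete(...)` listener in the diff: the pending-request map and the `RpcOutcome` callback are stood in by a plain `HashMap` and a one-method interface; only the failure-routing idea comes from the comment above.

```java
import java.util.HashMap;
import java.util.Map;

public class FailedFutureSketch {
    // Minimal stand-in for Drill's RpcOutcome (hypothetical name).
    interface Outcome { void setException(Throwable t); }

    static final Map<Integer, Outcome> pending = new HashMap<>();

    // On completion: if the write failed, remove the pending entry and
    // deliver the failure cause to it, instead of throwing on the
    // event-loop thread where no handler is listening.
    static void operationComplete(int coordinationId, boolean success, Throwable cause) {
        if (!success) {
            Outcome o = pending.remove(coordinationId);
            if (o != null) {
                o.setException(cause);   // the waiting handler now gets an outcome
            }
        }
    }

    public static void main(String[] args) {
        Throwable[] seen = new Throwable[1];
        pending.put(7, t -> seen[0] = t);
        operationComplete(7, false, new RuntimeException("send failed"));
        if (seen[0] == null || !pending.isEmpty()) throw new AssertionError("outcome not delivered");
        System.out.println("delivered: " + seen[0].getMessage());
    }
}
```

Throwing from the listener instead surfaces the error only on the event-loop thread and leaves the waiting handler with no outcome at all, which is exactly the gap the comment points out.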
[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted
[ https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229728#comment-15229728 ] ASF GitHub Bot commented on DRILL-3714: --- Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/463#discussion_r58821796 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java --- @@ -20,51 +20,82 @@ import io.netty.buffer.ByteBuf; import io.netty.channel.ChannelFuture; -import java.util.Map; -import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicInteger; import org.apache.drill.common.exceptions.UserRemoteException; import org.apache.drill.exec.proto.UserBitShared.DrillPBError; +import com.carrotsearch.hppc.IntObjectHashMap; +import com.carrotsearch.hppc.procedures.IntObjectProcedure; +import com.google.common.base.Preconditions; + /** - * Manages the creation of rpc futures for a particular socket. + * Manages the creation of rpc futures for a particular socket <--> socket + * connection. Generally speaking, there will be two threads working with this + * class (the socket thread and the Request generating thread). Synchronization + * is simple with the map being the only thing that is protected. Everything + * else works via Atomic variables. */ -public class CoordinationQueue { - static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(CoordinationQueue.class); +class RequestIdMap { + static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RequestIdMap.class); + + private final AtomicInteger value = new AtomicInteger(); + private final AtomicBoolean acceptMessage = new AtomicBoolean(true); - private final PositiveAtomicInteger circularInt = new PositiveAtomicInteger(); - private final Mapmap; + /** Access to map must be protected. 
**/ + private final IntObjectHashMap map; - public CoordinationQueue(int segmentSize, int segmentCount) { -map = new ConcurrentHashMap (segmentSize, 0.75f, segmentCount); + public RequestIdMap() { +map = new IntObjectHashMap (); } void channelClosed(Throwable ex) { +acceptMessage.set(false); if (ex != null) { - RpcException e; - if (ex instanceof RpcException) { -e = (RpcException) ex; - } else { -e = new RpcException(ex); + final RpcException e = RpcException.mapException(ex); + synchronized (map) { +map.forEach(new Closer(e)); +map.clear(); } - for (RpcOutcome f : map.values()) { -f.setException(e); +} + } + + private class Closer implements IntObjectProcedure { +final RpcException exception; + +public Closer(RpcException exception) { + this.exception = exception; +} + +@Override +public void apply(int key, RpcOutcome value) { + try{ +value.setException(exception); + }catch(Exception e){ +logger.warn("Failure while attempting to fail rpc response.", e); } } + } - public ChannelListenerWithCoordinationId get(RpcOutcomeListener handler, Class clazz, RemoteConnection connection) { -int i = circularInt.getNext(); + public ChannelListenerWithCoordinationId createNewRpcListener(RpcOutcomeListener handler, Class clazz, + RemoteConnection connection) { +int i = value.incrementAndGet(); RpcListener future = new RpcListener(handler, clazz, i, connection); -Object old = map.put(i, future); -if (old != null) { - throw new IllegalStateException( - "You attempted to reuse a coordination id when the previous coordination id has not been removed. This is likely rpc future callback memory leak."); +final Object old; +synchronized (map) { + Preconditions.checkArgument(acceptMessage.get(), --- End diff -- Make this check first statement in the method? 
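The reordering proposed above — make the acceptance check the first statement of the method — can be sketched like this. It is a simplified, hypothetical analogue of `createNewRpcListener`, with the Guava `Preconditions` call replaced by a plain check so the example is self-contained:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class RequestIdMapSketch {
    private final AtomicInteger counter = new AtomicInteger();
    private final AtomicBoolean open = new AtomicBoolean(true);
    private final Map<Integer, Object> map = new HashMap<>();

    // Check the "still accepting" flag first: no coordination id is burned
    // and no listener object is allocated when the connection is closed.
    public int register(Object listener) {
        if (!open.get()) {
            throw new IllegalStateException("Connection closed; no new requests accepted.");
        }
        int id = counter.incrementAndGet();
        synchronized (map) {
            Object old = map.put(id, listener);
            if (old != null) {
                throw new IllegalStateException("Coordination id " + id + " reused; likely a callback leak.");
            }
        }
        return id;
    }

    public void close() { open.set(false); }

    public static void main(String[] args) {
        RequestIdMapSketch ids = new RequestIdMapSketch();
        System.out.println(ids.register(new Object()));   // 1
        ids.close();
        try {
            ids.register(new Object());
            throw new AssertionError("expected rejection");
        } catch (IllegalStateException expected) {
            System.out.println("rejected after close");
        }
    }
}
```

Note the trade-off: outside the lock the check is only a fast-fail optimization, since the flag can flip between the check and the insert; a strict guarantee still needs the check (or a second one) inside the `synchronized` block, as in the quoted patch.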
[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted
[ https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229727#comment-15229727 ] ASF GitHub Bot commented on DRILL-3714: --- Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/463#discussion_r58821789 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java --- @@ -20,51 +20,82 @@ import io.netty.buffer.ByteBuf; import io.netty.channel.ChannelFuture; -import java.util.Map; -import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicInteger; import org.apache.drill.common.exceptions.UserRemoteException; import org.apache.drill.exec.proto.UserBitShared.DrillPBError; +import com.carrotsearch.hppc.IntObjectHashMap; +import com.carrotsearch.hppc.procedures.IntObjectProcedure; +import com.google.common.base.Preconditions; + /** - * Manages the creation of rpc futures for a particular socket. + * Manages the creation of rpc futures for a particular socket <--> socket + * connection. Generally speaking, there will be two threads working with this + * class (the socket thread and the Request generating thread). Synchronization + * is simple with the map being the only thing that is protected. Everything + * else works via Atomic variables. */ -public class CoordinationQueue { - static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(CoordinationQueue.class); +class RequestIdMap { + static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RequestIdMap.class); + + private final AtomicInteger value = new AtomicInteger(); + private final AtomicBoolean acceptMessage = new AtomicBoolean(true); - private final PositiveAtomicInteger circularInt = new PositiveAtomicInteger(); - private final Mapmap; + /** Access to map must be protected. 
**/ + private final IntObjectHashMap map; - public CoordinationQueue(int segmentSize, int segmentCount) { -map = new ConcurrentHashMap (segmentSize, 0.75f, segmentCount); + public RequestIdMap() { +map = new IntObjectHashMap (); } void channelClosed(Throwable ex) { +acceptMessage.set(false); if (ex != null) { - RpcException e; - if (ex instanceof RpcException) { -e = (RpcException) ex; - } else { -e = new RpcException(ex); + final RpcException e = RpcException.mapException(ex); + synchronized (map) { +map.forEach(new Closer(e)); +map.clear(); } - for (RpcOutcome f : map.values()) { -f.setException(e); +} + } + + private class Closer implements IntObjectProcedure { --- End diff -- Better class name? `SetExceptionProcedure`?
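The `channelClosed`/`Closer` flow quoted in the diff — fail every pending outcome under the map lock, tolerate a misbehaving callback, then clear the map — can be sketched with a plain `HashMap` standing in for HPPC's `IntObjectHashMap` (hypothetical names, not Drill's code):

```java
import java.util.HashMap;
import java.util.Map;

public class ChannelClosedSketch {
    // Minimal stand-in for RpcOutcome (hypothetical).
    interface Outcome { void setException(Exception e); }

    private final Map<Integer, Outcome> map = new HashMap<>();

    public void put(int id, Outcome o) { synchronized (map) { map.put(id, o); } }

    public boolean isEmpty() { synchronized (map) { return map.isEmpty(); } }

    // On close: under the lock, deliver the failure to every pending outcome,
    // logging (not propagating) a bad callback so the remaining entries still
    // get their exception, then clear the map so nothing completes twice.
    public void channelClosed(Exception cause) {
        synchronized (map) {
            for (Outcome o : map.values()) {
                try {
                    o.setException(cause);
                } catch (Exception e) {
                    System.err.println("Failure while attempting to fail rpc response: " + e);
                }
            }
            map.clear();
        }
    }

    public static void main(String[] args) {
        ChannelClosedSketch sketch = new ChannelClosedSketch();
        StringBuilder log = new StringBuilder();
        sketch.put(1, e -> { throw new RuntimeException("bad callback"); });
        sketch.put(2, e -> log.append("failed:").append(e.getMessage()));
        sketch.channelClosed(new Exception("channel closed"));
        System.out.println(log);              // the well-behaved outcome was notified
        System.out.println(sketch.isEmpty()); // true
    }
}
```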
[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted
[ https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229724#comment-15229724 ] ASF GitHub Bot commented on DRILL-3714: --- Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/463#discussion_r58821752 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java --- @@ -20,51 +20,82 @@ import io.netty.buffer.ByteBuf; import io.netty.channel.ChannelFuture; -import java.util.Map; -import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicInteger; import org.apache.drill.common.exceptions.UserRemoteException; import org.apache.drill.exec.proto.UserBitShared.DrillPBError; +import com.carrotsearch.hppc.IntObjectHashMap; +import com.carrotsearch.hppc.procedures.IntObjectProcedure; +import com.google.common.base.Preconditions; + /** - * Manages the creation of rpc futures for a particular socket. + * Manages the creation of rpc futures for a particular socket <--> socket + * connection. Generally speaking, there will be two threads working with this + * class (the socket thread and the Request generating thread). Synchronization + * is simple with the map being the only thing that is protected. Everything + * else works via Atomic variables. */ -public class CoordinationQueue { - static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(CoordinationQueue.class); +class RequestIdMap { + static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RequestIdMap.class); + + private final AtomicInteger value = new AtomicInteger(); --- End diff -- How about `coordinationIdCounter` and `isOpen`? 
[jira] [Commented] (DRILL-4573) Zero copy LIKE, REGEXP_MATCHES, SUBSTR
[ https://issues.apache.org/jira/browse/DRILL-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229500#comment-15229500 ] jean-claude commented on DRILL-4573: Please review > Zero copy LIKE, REGEXP_MATCHES, SUBSTR > -- > > Key: DRILL-4573 > URL: https://issues.apache.org/jira/browse/DRILL-4573 > Project: Apache Drill > Issue Type: Improvement >Reporter: jean-claude >Priority: Minor > Attachments: DRILL-4573.1.patch.txt > > > All the functions using the java.util.regex.Matcher are currently creating > Java string objects to pass into the matcher.reset(). > However this creates unnecessary copy of the bytes and a Java string object. > The matcher uses a CharSequence, so instead of making a copy we can create an > adapter from the DrillBuffer to the CharSequence interface. > Gains of 25% in execution speed are possible when going over VARCHAR of 36 > chars. The gain will be proportional to the size of the VARCHAR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
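The adapter idea described in the issue can be illustrated in miniature. This is not the attached patch — `DrillBuf` is replaced by a plain `byte[]` — but it shows the mechanism: `Pattern.matcher(CharSequence)` and `Matcher.reset(CharSequence)` accept any `CharSequence`, so wrapping the buffer avoids materializing a `String` copy. The one-byte-per-char mapping below is valid only for single-byte data such as ASCII/Latin-1:

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ByteSeq implements CharSequence {
    private final byte[] buf;
    private final int start, end;

    public ByteSeq(byte[] buf, int start, int end) {
        this.buf = buf; this.start = start; this.end = end;
    }

    @Override public int length() { return end - start; }

    // One byte == one char: correct for ASCII/Latin-1 data only.
    @Override public char charAt(int index) { return (char) (buf[start + index] & 0xFF); }

    @Override public CharSequence subSequence(int from, int to) {
        return new ByteSeq(buf, start + from, start + to);
    }

    @Override public String toString() {
        return new String(buf, start, end - start, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] data = "abc-123".getBytes(StandardCharsets.US_ASCII);
        Matcher m = Pattern.compile("[a-z]+-\\d+").matcher("");
        m.reset(new ByteSeq(data, 0, data.length));   // no String copy of the buffer
        System.out.println(m.matches());              // true
    }
}
```

Reusing one `Matcher` and calling `reset(...)` per value, as the comment describes, also avoids re-allocating matcher state for every row.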
[jira] [Commented] (DRILL-4573) Zero copy LIKE, REGEXP_MATCHES, SUBSTR
[ https://issues.apache.org/jira/browse/DRILL-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229499#comment-15229499 ] jean-claude commented on DRILL-4573: You can test the performance gain by creating a simple csv file with one column containing a UUID, like this: {code} for i in {1..100}; do uuidgen; done > /Users/jccote/test.csv {code} then query using drill: {code} select count(1) from dfs.`/Users/jccote/test.csv` where columns[0] like '0%'; {code} Run it multiple times to get a good estimate.
[jira] [Commented] (DRILL-4523) Disallow using loopback address in distributed mode
[ https://issues.apache.org/jira/browse/DRILL-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229459#comment-15229459 ] ASF GitHub Bot commented on DRILL-4523: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/445 > Disallow using loopback address in distributed mode > --- > > Key: DRILL-4523 > URL: https://issues.apache.org/jira/browse/DRILL-4523 > Project: Apache Drill > Issue Type: Improvement > Components: Server >Affects Versions: 1.6.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Fix For: 1.7.0 > > > If we enable debug for org.apache.drill.exec.coord.zk in logback.xml, we only > get the hostname and ports information. For example: > {code} > 2015-11-04 19:47:02,927 [ServiceCache-0] DEBUG > o.a.d.e.c.zk.ZKClusterCoordinator - Cache changed, updating. > 2015-11-04 19:47:02,932 [ServiceCache-0] DEBUG > o.a.d.e.c.zk.ZKClusterCoordinator - Active drillbit set changed. Now > includes 2 total bits. New active drillbits: > h3.poc.com:31010:31011:31012 > h2.poc.com:31010:31011:31012 > {code} > We need to know the IP address of each hostname to do further troubleshooting. > Imagine if any drillbit registers itself as "localhost.localdomain" in > zookeeper, we will never know where it comes from. Enabling IP address > tracking can help this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
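A validation of the kind DRILL-4523 asks for can be sketched with the JDK's `InetAddress` API. This is a hypothetical helper, not Drill's actual code: resolve the advertised host and refuse it in distributed mode if it is a loopback address, since a drillbit registered in ZooKeeper as a loopback host is unreachable from other nodes.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LoopbackCheck {
    // Reject loopback hosts (127.0.0.0/8, ::1) when running distributed.
    public static void validate(String host, boolean distributedMode) throws UnknownHostException {
        InetAddress addr = InetAddress.getByName(host);
        if (distributedMode && addr.isLoopbackAddress()) {
            throw new IllegalStateException(
                "Drillbit host " + host + " resolves to loopback address " + addr.getHostAddress()
                + "; loopback is not allowed in distributed mode.");
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        validate("127.0.0.1", false);   // fine in embedded mode
        try {
            validate("127.0.0.1", true);
            throw new AssertionError("expected rejection");
        } catch (IllegalStateException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```

`getByName` on a literal address performs no DNS lookup, so the check is cheap; for a hostname it resolves through the normal name service.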
[jira] [Commented] (DRILL-4588) Enable JMXReporter to Expose Metrics
[ https://issues.apache.org/jira/browse/DRILL-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229458#comment-15229458 ] ASF GitHub Bot commented on DRILL-4588: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/469 > Enable JMXReporter to Expose Metrics > > > Key: DRILL-4588 > URL: https://issues.apache.org/jira/browse/DRILL-4588 > Project: Apache Drill > Issue Type: Bug >Reporter: Sudheesh Katkam >Assignee: Sudheesh Katkam > > -There is a static initialization order issue that needs to be fixed.- > The code is commented out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
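Drill's metrics use the Dropwizard Metrics `JmxReporter`, which is a thin layer over the JDK's JMX machinery; the registration that such a reporter performs can be illustrated with the standard `javax.management` API alone (a standalone sketch, not Drill code — the MBean names below are made up):

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxMetricSketch {
    // Standard MBean pattern: the interface must be named <Impl>MBean.
    public interface CounterMBean { long getCount(); }

    public static class Counter implements CounterMBean {
        private volatile long count;
        public void inc() { count++; }
        @Override public long getCount() { return count; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("drill.sketch:type=Counter,name=queries");
        Counter counter = new Counter();
        server.registerMBean(counter, name);   // now visible in jconsole/jvisualvm

        counter.inc();
        // Read the attribute back through JMX, as a monitoring tool would.
        System.out.println(server.getAttribute(name, "Count"));
    }
}
```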
[jira] [Commented] (DRILL-4544) Improve error messages for REFRESH TABLE METADATA command
[ https://issues.apache.org/jira/browse/DRILL-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229461#comment-15229461 ] ASF GitHub Bot commented on DRILL-4544: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/448 > Improve error messages for REFRESH TABLE METADATA command > - > > Key: DRILL-4544 > URL: https://issues.apache.org/jira/browse/DRILL-4544 > Project: Apache Drill > Issue Type: Improvement > Components: Metadata >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Fix For: 1.7.0 > > > Improve the error messages thrown by REFRESH TABLE METADATA command: > In the first case below, the error is maprfs.abc doesn't exist. It should > throw a Object not found or workspace not found. It is currently throwing a > non helpful message; > 0: jdbc:drill:> refresh table metadata maprfs.abc.`my_table`; > + > oksummary > + > false Error: null > + > 1 row selected (0.355 seconds) > In the second case below, it says refresh table metadata is supported only > for single-directory based Parquet tables. But the command works for nested > multi-directory Parquet files. > 0: jdbc:drill:> refresh table metadata maprfs.vnaranammalpuram.`rfm_sales_vw`; > ---+ > oksummary > ---+ > false Table rfm_sales_vw does not support metadata refresh. Support is > currently limited to single-directory-based Parquet tables. > ---+ > 1 row selected (0.418 seconds) > 0: jdbc:drill:> -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3743) query hangs on sqlline once Drillbit on foreman node is killed
[ https://issues.apache.org/jira/browse/DRILL-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229460#comment-15229460 ] ASF GitHub Bot commented on DRILL-3743: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/460 > query hangs on sqlline once Drillbit on foreman node is killed > -- > > Key: DRILL-3743 > URL: https://issues.apache.org/jira/browse/DRILL-3743 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.2.0 > Environment: 4 node cluster CentOS >Reporter: Khurram Faraaz >Assignee: Sudheesh Katkam >Priority: Critical > Fix For: Future > > > sqlline/query hangs once Drillbit (on Foreman node) is killed. (kill -9 ) > query was issued from the Foreman node. The query returns many records, and > it is a long running query. > Steps to reproduce the problem. > set planner.slice_target=1 > 1. clush -g khurram service mapr-warden stop > 2. clush -g khurram service mapr-warden start > 3. ./sqlline -u "jdbc:drill:schema=dfs.tmp" > 0: jdbc:drill:schema=dfs.tmp> select * from `twoKeyJsn.json` limit 200; > 4. Immediately from another console do a jps and kill the Drillbit process > (in this case foreman) while the query is being run on sqlline. You will > notice that sqlline just hangs, we do not see any exceptions or errors being > reported on sqlline prompt or in drillbit.log or drillbit.out > I do see this Exception in sqlline.log on the node from where sqlline was > started > {code} > 2015-09-04 18:45:12,069 [Client-1] INFO o.a.d.e.rpc.user.QueryResultHandler > - User Error Occurred > org.apache.drill.common.exceptions.UserException: CONNECTION ERROR: > Connection /10.10.100.201:53425 <--> /10.10.100.201:31010 (user client) > closed unexpectedly. 
> [Error Id: ec316cfd-c9a5-4905-98e3-da20cb799ba5 ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:524) > ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.rpc.user.QueryResultHandler$SubmissionListener$ChannelClosedListener.operationComplete(QueryResultHandler.java:298) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) > [netty-common-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845) > [netty-common-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873) > [netty-common-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) > [netty-common-4.0.27.Final.jar:4.0.27.Final] > at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254) > [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na] > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > [netty-common-4.0.27.Final.jar:4.0.27.Final] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] > 2015-09-04 18:45:12,069 [Client-1] INFO > o.a.d.j.i.DrillResultSetImpl$ResultsListener - [#7] Query failed: > org.apache.drill.common.exceptions.UserException: CONNECTION ERROR: > Connection /10.10.100.201:53425 <--> /10.10.100.201:31010 (user client) > closed unexpectedly. 
> [Error Id: ec316cfd-c9a5-4905-98e3-da20cb799ba5 ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:524) > ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.rpc.user.QueryResultHandler$SubmissionListener$ChannelClosedListener.operationComplete(QueryResultHandler.java:298) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) > [netty-common-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845) > [netty-common-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873) > [netty-common-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) > [netty-common-4.0.27.Final.jar:4.0.27.Final] > at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254) > [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na] > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) >
[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted
[ https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229455#comment-15229455 ] ASF GitHub Bot commented on DRILL-3714: --- Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/463#discussion_r58806647 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RpcBus.java --- @@ -158,22 +157,16 @@ public ChannelClosedHandler(C clientConnection, Channel channel) { @Override public void operationComplete(ChannelFuture future) throws Exception { - String msg; + final String msg; + if(local!=null) { msg = String.format("Channel closed %s <--> %s.", local, remote); }else{ msg = String.format("Channel closed %s <--> %s.", future.channel().localAddress(), future.channel().remoteAddress()); } - if (RpcBus.this.isClient()) { --- End diff -- `isClient` method is no longer used. Remove the method.
[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted
[ https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229451#comment-15229451 ] ASF GitHub Bot commented on DRILL-3714: --- Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/463#discussion_r58806614 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java --- @@ -20,51 +20,82 @@ import io.netty.buffer.ByteBuf; import io.netty.channel.ChannelFuture; -import java.util.Map; -import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicInteger; import org.apache.drill.common.exceptions.UserRemoteException; import org.apache.drill.exec.proto.UserBitShared.DrillPBError; +import com.carrotsearch.hppc.IntObjectHashMap; +import com.carrotsearch.hppc.procedures.IntObjectProcedure; +import com.google.common.base.Preconditions; + /** - * Manages the creation of rpc futures for a particular socket. + * Manages the creation of rpc futures for a particular socket <--> socket + * connection. Generally speaking, there will be two threads working with this + * class (the socket thread and the Request generating thread). Synchronization + * is simple with the map being the only thing that is protected. Everything + * else works via Atomic variables. */ -public class CoordinationQueue { - static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(CoordinationQueue.class); +class RequestIdMap { + static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RequestIdMap.class); + + private final AtomicInteger value = new AtomicInteger(); + private final AtomicBoolean acceptMessage = new AtomicBoolean(true); - private final PositiveAtomicInteger circularInt = new PositiveAtomicInteger(); --- End diff -- Remove PositiveAtomicInteger class.
[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted
[ https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229456#comment-15229456 ] ASF GitHub Bot commented on DRILL-3714: --- Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/463#discussion_r58806651 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RpcBus.java --- @@ -261,6 +251,7 @@ public void execute(Runnable command) { public InboundHandler(C connection) { super(); + Preconditions.checkNotNull(connection); --- End diff -- `this.connection = Preconditions.checkNotNull(connection);`
[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted
[ https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229454#comment-15229454 ] ASF GitHub Bot commented on DRILL-3714: --- Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/463#discussion_r58806644 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java --- @@ -20,51 +20,82 @@ import io.netty.buffer.ByteBuf; import io.netty.channel.ChannelFuture; -import java.util.Map; -import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicInteger; import org.apache.drill.common.exceptions.UserRemoteException; import org.apache.drill.exec.proto.UserBitShared.DrillPBError; +import com.carrotsearch.hppc.IntObjectHashMap; +import com.carrotsearch.hppc.procedures.IntObjectProcedure; +import com.google.common.base.Preconditions; + /** - * Manages the creation of rpc futures for a particular socket. + * Manages the creation of rpc futures for a particular socket <--> socket + * connection. Generally speaking, there will be two threads working with this + * class (the socket thread and the Request generating thread). Synchronization + * is simple with the map being the only thing that is protected. Everything + * else works via Atomic variables. */ -public class CoordinationQueue { - static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(CoordinationQueue.class); +class RequestIdMap { + static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RequestIdMap.class); + + private final AtomicInteger value = new AtomicInteger(); + private final AtomicBoolean acceptMessage = new AtomicBoolean(true); - private final PositiveAtomicInteger circularInt = new PositiveAtomicInteger(); - private final Map<Integer, RpcOutcome<?>> map; + /** Access to map must be protected. **/ + private final IntObjectHashMap<RpcOutcome<?>> map; - public CoordinationQueue(int segmentSize, int segmentCount) { -map = new ConcurrentHashMap<Integer, RpcOutcome<?>>(segmentSize, 0.75f, segmentCount); + public RequestIdMap() { +map = new IntObjectHashMap<RpcOutcome<?>>(); } void channelClosed(Throwable ex) { +acceptMessage.set(false); if (ex != null) { - RpcException e; - if (ex instanceof RpcException) { -e = (RpcException) ex; - } else { -e = new RpcException(ex); + final RpcException e = RpcException.mapException(ex); + synchronized (map) { +map.forEach(new Closer(e)); +map.clear(); } - for (RpcOutcome<?> f : map.values()) { -f.setException(e); +} + } + + private class Closer implements IntObjectProcedure<RpcOutcome<?>> { +final RpcException exception; + +public Closer(RpcException exception) { + this.exception = exception; +} + +@Override +public void apply(int key, RpcOutcome<?> value) { + try{ +value.setException(exception); + }catch(Exception e){ +logger.warn("Failure while attempting to fail rpc response.", e); } } + } - public <V> ChannelListenerWithCoordinationId get(RpcOutcomeListener<V> handler, Class<V> clazz, RemoteConnection connection) { -int i = circularInt.getNext(); + public <V> ChannelListenerWithCoordinationId createNewRpcListener(RpcOutcomeListener<V> handler, Class<V> clazz, + RemoteConnection connection) { +int i = value.incrementAndGet(); RpcListener<V> future = new RpcListener<V>(handler, clazz, i, connection); -Object old = map.put(i, future); -if (old != null) { - throw new IllegalStateException( - "You attempted to reuse a coordination id when the previous coordination id has not been removed. This is likely rpc future callback memory leak."); +final Object old; +synchronized (map) { + Preconditions.checkArgument(acceptMessage.get(), + "Attempted to send a message when connection is no longer valid."); + old = map.put(i, future); } +Preconditions.checkArgument(old == null, --- End diff -- Not required, since numbers are no longer reused?
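The RequestIdMap changes reviewed in the comments above follow a pattern that can be sketched independently of Drill: ids are handed out by an atomic counter and never reused, the map of outstanding listeners is the only synchronized structure, and closing the channel fails every outstanding listener exactly once. The following is a minimal, self-contained illustration of that pattern, not the actual Drill class; all names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of the request-id bookkeeping pattern under review. */
class RequestMapSketch {
    private final AtomicInteger nextId = new AtomicInteger();
    private final AtomicBoolean open = new AtomicBoolean(true);
    /** Access to map must be protected by synchronizing on it. */
    private final Map<Integer, java.util.function.Consumer<Throwable>> map = new HashMap<>();

    /** Registers a failure callback for a new request; returns its coordination id. */
    int register(java.util.function.Consumer<Throwable> onFailure) {
        // Ids are strictly increasing and never reused, so no collision check is needed.
        final int id = nextId.incrementAndGet();
        synchronized (map) {
            if (!open.get()) {
                throw new IllegalStateException("Attempted to send a message when connection is no longer valid.");
            }
            map.put(id, onFailure);
        }
        return id;
    }

    /** Called from the socket thread when the channel closes: fail all outstanding requests once. */
    void channelClosed(Throwable cause) {
        open.set(false);
        synchronized (map) {
            for (java.util.function.Consumer<Throwable> cb : map.values()) {
                try {
                    cb.accept(cause);
                } catch (Exception e) {
                    // Swallow, mirroring "Failure while attempting to fail rpc response."
                }
            }
            map.clear();
        }
    }

    int outstanding() {
        synchronized (map) {
            return map.size();
        }
    }
}
```

Because ids come from a monotonically increasing counter, the old "coordination id reuse" failure mode disappears, which is exactly the point raised in the review comment above.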
[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted
[ https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229450#comment-15229450 ] ASF GitHub Bot commented on DRILL-3714: --- Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/463#discussion_r58806611 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java --- @@ -20,51 +20,82 @@ import io.netty.buffer.ByteBuf; import io.netty.channel.ChannelFuture; -import java.util.Map; -import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicInteger; import org.apache.drill.common.exceptions.UserRemoteException; import org.apache.drill.exec.proto.UserBitShared.DrillPBError; +import com.carrotsearch.hppc.IntObjectHashMap; +import com.carrotsearch.hppc.procedures.IntObjectProcedure; +import com.google.common.base.Preconditions; + /** - * Manages the creation of rpc futures for a particular socket. + * Manages the creation of rpc futures for a particular socket <--> socket + * connection. Generally speaking, there will be two threads working with this + * class (the socket thread and the Request generating thread). Synchronization + * is simple with the map being the only thing that is protected. Everything + * else works via Atomic variables. 
*/ -public class CoordinationQueue { - static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(CoordinationQueue.class); +class RequestIdMap { + static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RequestIdMap.class); --- End diff -- private
[jira] [Commented] (DRILL-3714) Query runs out of memory and remains in CANCELLATION_REQUESTED state until drillbit is restarted
[ https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229453#comment-15229453 ] ASF GitHub Bot commented on DRILL-3714: --- Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/463#discussion_r58806623 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/RequestIdMap.java --- @@ -20,51 +20,82 @@ import io.netty.buffer.ByteBuf; import io.netty.channel.ChannelFuture; -import java.util.Map; -import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicInteger; import org.apache.drill.common.exceptions.UserRemoteException; import org.apache.drill.exec.proto.UserBitShared.DrillPBError; +import com.carrotsearch.hppc.IntObjectHashMap; +import com.carrotsearch.hppc.procedures.IntObjectProcedure; +import com.google.common.base.Preconditions; + /** - * Manages the creation of rpc futures for a particular socket. + * Manages the creation of rpc futures for a particular socket <--> socket + * connection. Generally speaking, there will be two threads working with this + * class (the socket thread and the Request generating thread). Synchronization + * is simple with the map being the only thing that is protected. Everything + * else works via Atomic variables. */ -public class CoordinationQueue { - static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(CoordinationQueue.class); +class RequestIdMap { + static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RequestIdMap.class); + + private final AtomicInteger value = new AtomicInteger(); + private final AtomicBoolean acceptMessage = new AtomicBoolean(true); - private final PositiveAtomicInteger circularInt = new PositiveAtomicInteger(); - private final Map<Integer, RpcOutcome<?>> map; + /** Access to map must be protected. **/ + private final IntObjectHashMap<RpcOutcome<?>> map; - public CoordinationQueue(int segmentSize, int segmentCount) { -map = new ConcurrentHashMap<Integer, RpcOutcome<?>>(segmentSize, 0.75f, segmentCount); + public RequestIdMap() { +map = new IntObjectHashMap<RpcOutcome<?>>(); } void channelClosed(Throwable ex) { +acceptMessage.set(false); if (ex != null) { - RpcException e; - if (ex instanceof RpcException) { -e = (RpcException) ex; - } else { -e = new RpcException(ex); + final RpcException e = RpcException.mapException(ex); + synchronized (map) { +map.forEach(new Closer(e)); +map.clear(); } - for (RpcOutcome<?> f : map.values()) { -f.setException(e); +} + } + + private class Closer implements IntObjectProcedure<RpcOutcome<?>> { +final RpcException exception; + +public Closer(RpcException exception) { + this.exception = exception; +} + +@Override +public void apply(int key, RpcOutcome<?> value) { + try{ --- End diff -- Inconsistent spacing here and below.
[jira] [Updated] (DRILL-4573) Zero copy LIKE, REGEXP_MATCHES, SUBSTR
[ https://issues.apache.org/jira/browse/DRILL-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jean-claude updated DRILL-4573: --- Attachment: DRILL-4573.1.patch.txt > Zero copy LIKE, REGEXP_MATCHES, SUBSTR > -- > > Key: DRILL-4573 > URL: https://issues.apache.org/jira/browse/DRILL-4573 > Project: Apache Drill > Issue Type: Improvement >Reporter: jean-claude >Priority: Minor > Attachments: DRILL-4573.1.patch.txt > > > All the functions using the java.util.regex.Matcher are currently creating > Java string objects to pass into the matcher.reset(). > However, this creates an unnecessary copy of the bytes and a Java string object. > The matcher uses a CharSequence, so, instead of making a copy, we can create an > adapter from the DrillBuffer to the CharSequence interface. > Gains of 25% in execution speed are possible when going over VARCHAR of 36 > chars. The gain will be proportional to the size of the VARCHAR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
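The idea in the report above can be sketched with a small adapter: `java.util.regex.Matcher` accepts any `CharSequence`, so a view over a value's bytes avoids materializing a `String` per row. This is an illustrative stand-in using a plain `byte[]` rather than Drill's buffer classes, and it handles single-byte encodings only; the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** A CharSequence view over bytes, so Matcher needs no String copy per value. */
final class BytesCharSequence implements CharSequence {
    private final byte[] bytes;
    private final int start, end; // half-open range [start, end)

    BytesCharSequence(byte[] bytes, int start, int end) {
        this.bytes = bytes;
        this.start = start;
        this.end = end;
    }

    @Override public int length() { return end - start; }

    // Correct for single-byte encodings (ASCII/Latin-1) only; real UTF-8
    // handling would need decoding, which is why the gain is data-dependent.
    @Override public char charAt(int index) { return (char) (bytes[start + index] & 0xFF); }

    @Override public CharSequence subSequence(int from, int to) {
        return new BytesCharSequence(bytes, start + from, start + to);
    }

    @Override public String toString() { return new String(bytes, start, end - start); }
}
```

A matcher can then be reset against the view directly, e.g. `Pattern.compile("wor.d").matcher(new BytesCharSequence(rowBytes, 0, rowBytes.length)).find()`, with no intermediate `String` allocation per row.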
[jira] [Commented] (DRILL-4577) Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in
[ https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229433#comment-15229433 ] ASF GitHub Bot commented on DRILL-4577: --- Github user hsuanyi commented on a diff in the pull request: https://github.com/apache/drill/pull/461#discussion_r58805242 --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java --- @@ -72,4 +80,76 @@ public String getTypeName() { return HiveStoragePluginConfig.NAME; } + @Override + public List> getTablesByNames(final List tableNames) { +final String schemaName = getName(); +final List > tableNameToTable = Lists.newArrayList(); List tables; +// Retries once if the first call to fetch the metadata fails +synchronized(mClient) { + final List tableNamesWithAuth = Lists.newArrayList(); + for(String tableName : tableNames) { +try { + if(mClient.tableExists(schemaName, tableName)) { --- End diff -- I did some tests here. When there are many tables, the improvement by optimizing for the second objective is not significant enough. However, the objective of this issue would make sense only when there are many tables. I think I still need to figure out a solution. > Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in > --- > > Key: DRILL-4577 > URL: https://issues.apache.org/jira/browse/DRILL-4577 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive >Reporter: Sean Hsuan-Yi Chu >Assignee: Sean Hsuan-Yi Chu > Fix For: 1.7.0 > > > A query such as > {code} > select * from INFORMATION_SCHEMA.`TABLES` > {code} > is converted into calls to fetch all tables from storage plugins. > When users have Hive, the calls to the Hive metadata storage would be: > 1) get_table > 2) get_partitions > However, the information regarding partitions is not used in this type of > query. Besides, a more efficient way to fetch tables is to use the > get_multi_table call. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4588) Enable JMXReporter to Expose Metrics
[ https://issues.apache.org/jira/browse/DRILL-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229372#comment-15229372 ] ASF GitHub Bot commented on DRILL-4588: --- Github user parthchandra commented on the pull request: https://github.com/apache/drill/pull/469#issuecomment-206624211 +1. > Enable JMXReporter to Expose Metrics > > > Key: DRILL-4588 > URL: https://issues.apache.org/jira/browse/DRILL-4588 > Project: Apache Drill > Issue Type: Bug >Reporter: Sudheesh Katkam >Assignee: Sudheesh Katkam > > -There is a static initialization order issue that needs to be fixed.- > The code is commented out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4588) Enable JMXReporter to Expose Metrics
[ https://issues.apache.org/jira/browse/DRILL-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sudheesh Katkam updated DRILL-4588: --- Description: -There is a static initialization order issue that needs to be fixed.- The code is commented out. was:There is a static initialization order issue that needs to be fixed. > Enable JMXReporter to Expose Metrics > > > Key: DRILL-4588 > URL: https://issues.apache.org/jira/browse/DRILL-4588 > Project: Apache Drill > Issue Type: Bug >Reporter: Sudheesh Katkam >Assignee: Sudheesh Katkam > > -There is a static initialization order issue that needs to be fixed.- > The code is commented out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4588) Enable JMXReporter to Expose Metrics
[ https://issues.apache.org/jira/browse/DRILL-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229364#comment-15229364 ] ASF GitHub Bot commented on DRILL-4588: --- GitHub user sudheeshkatkam opened a pull request: https://github.com/apache/drill/pull/469 DRILL-4588: Enable JMX reporting @parthchandra please review. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sudheeshkatkam/drill DRILL-4588 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/469.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #469 commit 4500cc9075c72622972e81939551ada2dfdca0a5 Author: Sudheesh Katkam Date: 2016-04-06T23:41:52Z DRILL-4588: Enable JMX reporting > Enable JMXReporter to Expose Metrics > > > Key: DRILL-4588 > URL: https://issues.apache.org/jira/browse/DRILL-4588 > Project: Apache Drill > Issue Type: Bug >Reporter: Sudheesh Katkam >Assignee: Sudheesh Katkam > > There is a static initialization order issue that needs to be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4589) Reduce planning time for file system partition pruning by reducing filter evaluation overhead
[ https://issues.apache.org/jira/browse/DRILL-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229300#comment-15229300 ] ASF GitHub Bot commented on DRILL-4589: --- GitHub user jinfengni opened a pull request: https://github.com/apache/drill/pull/468 DRILL-4589: Reduce planning time for file system partition pruning by… … reducing filter evaluation overhead You can merge this pull request into a Git repository by running: $ git pull https://github.com/jinfengni/incubator-drill DRILL-4589 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/468.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #468 commit e207a926e65cd788700229de3ae47cf4e876 Author: Jinfeng Ni Date: 2016-02-25T18:13:43Z DRILL-4589: Reduce planning time for file system partition pruning by reducing filter evaluation overhead > Reduce planning time for file system partition pruning by reducing filter > evaluation overhead > - > > Key: DRILL-4589 > URL: https://issues.apache.org/jira/browse/DRILL-4589 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: Jinfeng Ni >Assignee: Jinfeng Ni > > When Drill is used to query hundreds of thousands, or even millions of files > organized into multi-level directories, users typically will provide a > partition filter like: dir0 = something and dir1 = something2 and .. . > For such queries, we saw the query planning time could be unacceptably long, > due to three main overheads: 1) to expand and get the list of files, 2) to > evaluate the partition filter, 3) to get the metadata, in the case of parquet > files for which metadata cache file is not available. > DRILL-2517 targets at the 3rd part of overhead. As a follow-up work after > DRILL-2517, we plan to reduce the filter evaluation overhead. 
For now, the > partition filter evaluation is applied to file level. In many cases, we saw > that the number of leaf subdirectories is significantly lower than that of > files. Since all the files under the same leaf > subdirectory share the same > directory metadata, we should apply the filter evaluation at the leaf > subdirectory. By doing that, we could reduce the CPU overhead to evaluate the > filter, and the memory overhead as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
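The optimization described above, evaluating the partition filter once per leaf directory instead of once per file, can be sketched independently of Drill's planner. The predicate and file layout below are hypothetical, and this stands in for the real implementation only to show why the number of evaluations drops from the file count to the leaf-directory count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class LeafDirPruneSketch {
    /**
     * Groups files by their parent (leaf) directory and evaluates the partition
     * predicate once per directory, since every file under the same leaf shares
     * the same dir0/dir1 metadata.
     */
    static List<String> prune(List<String> files, Predicate<String> dirPredicate) {
        final Map<String, List<String>> byDir = new LinkedHashMap<>();
        for (String f : files) {
            final String dir = f.substring(0, f.lastIndexOf('/'));
            byDir.computeIfAbsent(dir, d -> new ArrayList<>()).add(f);
        }
        final List<String> kept = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : byDir.entrySet()) {
            if (dirPredicate.test(e.getKey())) { // one evaluation per leaf directory
                kept.addAll(e.getValue());
            }
        }
        return kept;
    }
}
```

With the layout from the benchmark above (100 leaf directories holding 115k files), the predicate runs 100 times instead of 115,000, which is the source of the reported planning-time reduction.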
[jira] [Commented] (DRILL-4589) Reduce planning time for file system partition pruning by reducing filter evaluation overhead
[ https://issues.apache.org/jira/browse/DRILL-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229302#comment-15229302 ] ASF GitHub Bot commented on DRILL-4589: --- Github user jinfengni commented on the pull request: https://github.com/apache/drill/pull/468#issuecomment-206611168 @amansinha100 , could you please review this PR? thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4589) Reduce planning time for file system partition pruning by reducing filter evaluation overhead
[ https://issues.apache.org/jira/browse/DRILL-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229298#comment-15229298 ] Jinfeng Ni commented on DRILL-4589: --- I have a patch for this JIRA. Using the same dataset as the comparison done in DRILL-2517 (115k parquet files in total, organized into 25 directories (1990, 1991, ...), each with four subdirectories (Q1, Q2, Q3, Q4)), here is the query planning time measured on a Mac laptop. {code} explain plan for select * from dfs.`/drill/testdata/tpch-sf10/lineitem115k` where dir0 = '1990' and dir1 = 'Q1'; {code} Without the patch (on today's master branch): {code} 1 row selected (8.084 seconds) {code} With the patch: {code} 1 row selected (4.306 seconds) {code} If the partition filter contains a complex expression, the improvement is even larger. For this query, the improvement is 24.951 seconds vs. 4.393 seconds: {code} explain plan for select * from dfs.`/drill/testdata/tpch-sf10/lineitem115k` where concat(substr(dir0, 1, 4), substr(dir1, 1, 2)) = '1990Q1'; {code} > Reduce planning time for file system partition pruning by reducing filter > evaluation overhead > - > > Key: DRILL-4589 > URL: https://issues.apache.org/jira/browse/DRILL-4589 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: Jinfeng Ni >Assignee: Jinfeng Ni > > When Drill is used to query hundreds of thousands, or even millions, of files > organized into multi-level directories, users typically provide a > partition filter like: dir0 = something and dir1 = something2 and so on. > For such queries, we saw that the query planning time could be unacceptably long, > due to three main overheads: 1) expanding and getting the list of files, 2) > evaluating the partition filter, 3) getting the metadata, in the case of parquet > files for which no metadata cache file is available. > DRILL-2517 targets the 3rd overhead.
As follow-up work to > DRILL-2517, we plan to reduce the filter evaluation overhead. For now, the > partition filter evaluation is applied at the file level. In many cases, we saw > that the number of leaf subdirectories is significantly lower than the number of > files. Since all the files under the same leaf subdirectory share the same > directory metadata, we should apply the filter evaluation at the leaf > subdirectory. By doing that, we could reduce the CPU overhead of evaluating the > filter, and the memory overhead as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
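[Editor's note] The leaf-subdirectory idea described above can be sketched as follows. This is an illustrative example, not Drill's planner code: the class `LeafDirPruning`, the method `pruneByLeafDir`, and the path layout are all hypothetical. The point is that the (possibly expensive) filter runs once per leaf directory instead of once per file.

```java
import java.util.*;
import java.util.function.Predicate;

// Sketch only: evaluate a partition filter once per leaf directory rather
// than once per file. All names and paths here are illustrative.
public class LeafDirPruning {

  public static List<String> pruneByLeafDir(List<String> files,
                                            Predicate<String> dirFilter) {
    // Group files by their leaf (parent) directory.
    Map<String, List<String>> byDir = new LinkedHashMap<>();
    for (String f : files) {
      int slash = f.lastIndexOf('/');
      String dir = slash >= 0 ? f.substring(0, slash) : "";
      byDir.computeIfAbsent(dir, d -> new ArrayList<>()).add(f);
    }
    // Evaluate the filter once per directory; every file under a kept
    // directory is kept, since all share the same directory metadata.
    List<String> kept = new ArrayList<>();
    for (Map.Entry<String, List<String>> e : byDir.entrySet()) {
      if (dirFilter.test(e.getKey())) {
        kept.addAll(e.getValue());
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> files = Arrays.asList(
        "/data/1990/Q1/a.parquet", "/data/1990/Q1/b.parquet",
        "/data/1990/Q2/c.parquet", "/data/1991/Q1/d.parquet");
    // A predicate standing in for: dir0 = '1990' AND dir1 = 'Q1'.
    System.out.println(pruneByLeafDir(files, d -> d.endsWith("/1990/Q1")));
  }
}
```

With the 115k-file dataset from the benchmark above (25 directories with 4 subdirectories each), this shape of loop evaluates the filter 100 times instead of 115,000 times.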
[jira] [Commented] (DRILL-1170) YARN support for Drill
[ https://issues.apache.org/jira/browse/DRILL-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229251#comment-15229251 ] Josh Elser commented on DRILL-1170: --- "substantial time" is definitely hard to sign up for, but I'd be happy to try to help out where/when at all possible. :) > YARN support for Drill > -- > > Key: DRILL-1170 > URL: https://issues.apache.org/jira/browse/DRILL-1170 > Project: Apache Drill > Issue Type: New Feature >Reporter: Neeraja >Assignee: Paul Rogers > Fix For: Future > > > This is a tracking item to make Drill work with YARN. > Below are a few requirements/needs to consider. > - Drill should run as a YARN-based application, side by side with other YARN-enabled > applications (on the same nodes or different nodes). Both memory and CPU > resources of Drill should be controlled by this mechanism. > - As a YARN-enabled application, Drill's resource consumption should be > adaptive to the load on the cluster. For example: when there is no load on > Drill, Drill should consume no resources on the cluster. As the load on > Drill increases, resources permitting, usage should grow proportionally. > - Low latency is a key requirement for Apache Drill, along with support for > multiple users (concurrency in the 100s-1000s). This should be supported when run > as a YARN application as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4237) Skew in hash distribution
[ https://issues.apache.org/jira/browse/DRILL-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229160#comment-15229160 ] ASF GitHub Bot commented on DRILL-4237: --- Github user chunhui-shi commented on the pull request: https://github.com/apache/drill/pull/430#issuecomment-206578634 @jacques-n The email response was not pushed here, so I am copying the sent email below: Thanks for pointing to openHFT. Yes, I went through multiple Java implementations, including this one. The reason I decided to use smhasher as the source of truth is that the smhasher implementation includes comprehensive tests covering the attributes that measure the goodness of a non-cryptographic hash function. These attributes are subtle and may show up as a problem only for certain input lengths. So when I looked at these implementations, I first checked what tests they had done. Since these Java implementations have no such tests (covering multiple attributes and input lengths) to prove the hash functions are correct or good, I decided to start from the smhasher implementation and use the results generated by smhasher to verify any other (including Drill's) implementation. > Skew in hash distribution > - > > Key: DRILL-4237 > URL: https://issues.apache.org/jira/browse/DRILL-4237 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.4.0 >Reporter: Aman Sinha >Assignee: Chunhui Shi > > Apparently, the fix in DRILL-4119 did not fully resolve the data skew issue. > It worked fine on the smaller sample of the data set but on another sample of > the same data set, it still produces skewed values - see the hash > values below, which are all odd numbers.
> {noformat} > 0: jdbc:drill:zk=local> select columns[0], hash32(columns[0]) from `test.csv` > limit 10; > +---+--+ > | EXPR$0 | EXPR$1 | > +---+--+ > | f71aaddec3316ae18d43cb1467e88a41 | 1506011089 | > | 3f3a13bb45618542b5ac9d9536704d3a | 1105719049 | > | 6935afd0c693c67bba482cedb7a2919b | -18137557 | > | ca2a938d6d7e57bda40501578f98c2a8 | -1372666789 | > | fab7f08402c8836563b0a5c94dbf0aec | -1930778239 | > | 9eb4620dcb68a84d17209da279236431 | -970026001 | > | 16eed4a4e801b98550b4ff504242961e | 356133757 | > | a46f7935fea578ce61d8dd45bfbc2b3d | -94010449 | > | 7fdf5344536080c15deb2b5a2975a2b7 | -141361507 | > | b82560a06e2e51b461c9fe134a8211bd | -375376717 | > +---+--+ > {noformat} > This indicates an underlying issue with the XXHash64 Java implementation, > which is Drill's implementation of the C version. One of the key differences, > as pointed out by [~jnadeau], was the use of unsigned int64 in the C version > compared to the Java version, which uses (signed) long. I created an XXHash > version using com.google.common.primitives.UnsignedLong. However, > UnsignedLong does not have the bit-wise operations needed for XXHash, > such as rotateLeft(), XOR, etc. One could write wrappers for these, but at > this point the question is: should we think of an alternative hash function? > The alternative approach could be the murmur hash we were using earlier for > numeric data types, and the Mahout version of the hash function for string > types > (https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/HashHelper.java#L28). > As a test, I reverted to this function and was getting good hash > distribution for the test data. > I could not find any performance comparisons of our perf tests (TPC-H or DS) > with the original and newer (XXHash) hash functions. If performance is > comparable, should we revert to the original function?
> As an aside, I would like to remove the hash64 versions of the functions > since these are not used anywhere. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
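[Editor's note] A minimal illustration of the signed-vs-unsigned pitfall discussed above. This is not Drill's actual hash code; it only demonstrates the general rule for porting a C uint64 algorithm such as XXHash64 to Java: multiplication, addition, XOR, and rotation are bit-identical for signed and unsigned 64-bit values, so the one operation that needs care is the right shift.

```java
// Sketch only: why Java's signed long usually suffices for a C uint64 hash.
// '>>' is an arithmetic shift (sign-extends), while '>>>' is a logical shift
// (zero-fills) and matches C's unsigned '>>'. Using '>>' on a negative long
// silently changes the hash bits, which is a classic source of skew in ports.
public class UnsignedShiftDemo {
  public static void main(String[] args) {
    long h = 0x9E3779B185EBCA87L;  // top bit set, so negative as a Java long

    long arithmetic = h >> 33;     // sign-extends: wrong for uint64 semantics
    long logical    = h >>> 33;    // zero-fills: matches the C behavior
    System.out.println(arithmetic == logical);  // false for negative inputs

    // Rotation needs no unsigned type at all: same bits as C's rotl64.
    long rotated = Long.rotateLeft(h, 31);
    System.out.println(rotated == Long.rotateRight(h, 33));  // true

    // How C code would print the same 64 bits as an unsigned value.
    System.out.println(Long.toUnsignedString(h));
  }
}
```

Under this view, a wrapper type like UnsignedLong is unnecessary: replacing every unsigned `>>` with `>>>` is the only translation the shift-using steps of the algorithm require.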
[jira] [Assigned] (DRILL-4589) Reduce planning time for file system partition pruning by reducing filter evaluation overhead
[ https://issues.apache.org/jira/browse/DRILL-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni reassigned DRILL-4589: - Assignee: Jinfeng Ni > Reduce planning time for file system partition pruning by reducing filter > evaluation overhead > - > > Key: DRILL-4589 > URL: https://issues.apache.org/jira/browse/DRILL-4589 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: Jinfeng Ni >Assignee: Jinfeng Ni > > When Drill is used to query hundreds of thousands, or even millions, of files > organized into multi-level directories, users typically provide a > partition filter like: dir0 = something and dir1 = something2 and so on. > For such queries, we saw that the query planning time could be unacceptably long, > due to three main overheads: 1) expanding and getting the list of files, 2) > evaluating the partition filter, 3) getting the metadata, in the case of parquet > files for which no metadata cache file is available. > DRILL-2517 targets the 3rd overhead. As follow-up work to > DRILL-2517, we plan to reduce the filter evaluation overhead. For now, the > partition filter evaluation is applied at the file level. In many cases, we saw > that the number of leaf subdirectories is significantly lower than the number of > files. Since all the files under the same leaf subdirectory share the same > directory metadata, we should apply the filter evaluation at the leaf > subdirectory. By doing that, we could reduce the CPU overhead of evaluating the > filter, and the memory overhead as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1170) YARN support for Drill
[ https://issues.apache.org/jira/browse/DRILL-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229120#comment-15229120 ] Matt Pollock commented on DRILL-1170: - Thanks much. > YARN support for Drill > -- > > Key: DRILL-1170 > URL: https://issues.apache.org/jira/browse/DRILL-1170 > Project: Apache Drill > Issue Type: New Feature >Reporter: Neeraja >Assignee: Paul Rogers > Fix For: Future > > > This is a tracking item to make Drill work with YARN. > Below are a few requirements/needs to consider. > - Drill should run as a YARN-based application, side by side with other YARN-enabled > applications (on the same nodes or different nodes). Both memory and CPU > resources of Drill should be controlled by this mechanism. > - As a YARN-enabled application, Drill's resource consumption should be > adaptive to the load on the cluster. For example: when there is no load on > Drill, Drill should consume no resources on the cluster. As the load on > Drill increases, resources permitting, usage should grow proportionally. > - Low latency is a key requirement for Apache Drill, along with support for > multiple users (concurrency in the 100s-1000s). This should be supported when run > as a YARN application as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4581) Various problems in the Drill startup scripts
[ https://issues.apache.org/jira/browse/DRILL-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-4581: --- Description: Noticed the following in drillbit.sh: 1) Comment: DRILL_LOG_DIR: Where log files are stored. PWD by default. Code: DRILL_LOG_DIR=/var/log/drill or, if it does not exist, $DRILL_HOME/log 2) Comment: DRILL_PID_DIR: Where the pid files are stored. /tmp by default. Code: DRILL_PID_DIR=$DRILL_HOME 3) Redundant checking of JAVA_HOME. drillbit.sh sources drill-config.sh, which checks JAVA_HOME. Later, drillbit.sh checks it again. The second check is both unnecessary and prints a less informative message than the drill-config.sh check. Suggestion: Remove the JAVA_HOME check in drillbit.sh. 4) Though drill-config.sh carefully checks JAVA_HOME, it does not export the JAVA_HOME variable. Perhaps this is why drillbit.sh repeats the check? Recommended: export JAVA_HOME from drill-config.sh. 5) Both drillbit.sh and the sourced drill-config.sh check DRILL_LOG_DIR and set the default value. Drill-config.sh defaults to /var/log/drill, or if that fails, to $DRILL_HOME/log. Drillbit.sh just sets /var/log/drill and does not handle the case where that directory is not writable. Suggested: remove the check in drillbit.sh. 6) Drill-config.sh checks the writability of the DRILL_LOG_DIR by touching sqlline.log, but does not delete that file, leaving a bogus, empty client log file on the drillbit server. Recommendation: use bash commands instead. 7) The implementation of the above check is a bit awkward. It has a fallback case with somewhat awkward logic. Clean this up. 8) drillbit.sh, but not drill-config.sh, attempts to create /var/log/drill if it does not exist. Recommended: decide on a single choice, implement it in drill-config.sh. 9) drill-config.sh checks if $DRILL_CONF_DIR is a directory. If not, it defaults it to $DRILL_HOME/conf. This can lead to subtle errors.
If I use drillbit.sh --config /misspelled/path where I mistype the path, I won't get an error; I get the default config, which may not at all be what I want to run. Recommendation: if the value of DRILL_CONF_DIR is passed into the script (as a variable or via --config), then that directory must exist. Else, use the default. 10) drill-config.sh exports, but may not set, HADOOP_HOME. This may be left over from the original Hadoop script that the Drill script was based upon. Recommendation: export only in the case that HADOOP_HOME is set for cygwin. 11) Drill-config.sh checks JAVA_HOME and prints a big, bold error message to stderr if JAVA_HOME is not set. Then, it checks the Java version and prints a different message (to stdout) if the version is wrong. Recommendation: use the same format (and stderr) for both. 12) Similarly, other Java checks later in the script produce messages to stdout, not stderr. 13) Drill-config.sh searches $JAVA_HOME to find java/java.exe and verifies that it is executable. The script then throws away what it just found. Then, drillbit.sh tries to recreate this information as: JAVA=$JAVA_HOME/bin/java This is wrong in two ways: 1) it ignores the actual java location and assumes it, and 2) it does not handle the java.exe case that drill-config.sh carefully worked out. Recommendation: export JAVA from drill-config.sh and remove the above line from drillbit.sh. 14) drillbit.sh presumably takes extra arguments like this: drillbit.sh -Dvar0=value0 --config /my/conf/dir start -Dvar1=value1 -Dvar2=value2 -Dvar3=value3 The -D bit allows the user to override config variables at the command line. But the scripts don't use the values.
A) drill-config.sh consumes --config /my/conf/dir after consuming the leading arguments: while [ $# -gt 1 ]; do if [ "--config" = "$1" ]; then shift confdir=$1 shift DRILL_CONF_DIR=$confdir else # Presume we are at end of options and break break fi done B) drillbit.sh will discard -Dvar1=value1: startStopStatus=$1 <-- grabs "start" shift command=drillbit shift <-- Consumes -Dvar1=value1 C) Remaining values passed back into drillbit.sh: args=$@ nohup $thiscmd internal_start $command $args D) Second invocation discards -Dvar2=value2 as described above. E) Remaining values are passed to runbit: "$DRILL_HOME"/bin/runbit $command "$@" start F) Where they again pass through drill-config.sh. (Allowing us to do: drillbit.sh --config /first/conf --config /second/conf which is asking for trouble) G) And, the remaining arguments are simply not used: exec $JAVA -Dlog.path=$DRILLBIT_LOG_PATH -Dlog.query.path=$DRILLBIT_QUERY_LOG_PATH $DRILL_ALL_JAVA_OPTS -cp $CP org.apache.drill.exec.server.Drillbit 15) The checking of command-line args in drillbit.sh is wrong: # if no args specified, show usage if [ $# -lt 1 ]; then echo $usage exit 1 fi ... . "$bin"/drill-config.sh But, note, that drill-config.sh handles: drillbit.sh --config /conf/dir Consuming
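[Editor's note] A sketch of the fail-fast `--config` handling recommended in item 9, reusing the option-consuming loop quoted above. `parse_config` is a hypothetical helper for illustration, not the actual drill-config.sh: an explicitly requested config directory that does not exist is an error rather than a silent fallback to the default.

```shell
#!/usr/bin/env bash
# Sketch only: validate an explicit --config directory instead of silently
# falling back to $DRILL_HOME/conf when the user mistypes the path.
parse_config() {
  local conf=""
  while [ $# -gt 1 ]; do
    if [ "--config" = "$1" ]; then
      shift
      conf="$1"
      shift
    else
      break   # first non-option argument ends option parsing
    fi
  done
  if [ -n "$conf" ] && [ ! -d "$conf" ]; then
    # Fail fast on a mistyped path rather than running with the default.
    echo "Drill config dir not found: $conf" >&2
    return 1
  fi
  echo "${conf:-$DRILL_HOME/conf}"
}

DRILL_HOME=/opt/drill
parse_config --config /tmp start          # an existing dir is accepted
parse_config start                        # no --config: default is used
```

The only behavioral change from the quoted loop is the `[ ! -d "$conf" ]` check: a missing default is still fine, but a missing *explicit* directory aborts startup.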
[jira] [Commented] (DRILL-4589) Reduce planning time for file system partition pruning by reducing filter evaluation overhead
[ https://issues.apache.org/jira/browse/DRILL-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229107#comment-15229107 ] Jinfeng Ni commented on DRILL-4589: --- This is related to DRILL-3759, which targets multi-phase partition pruning. Both of them aim to improve the efficiency of partition pruning in Drill's query planner. > Reduce planning time for file system partition pruning by reducing filter > evaluation overhead > - > > Key: DRILL-4589 > URL: https://issues.apache.org/jira/browse/DRILL-4589 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: Jinfeng Ni > > When Drill is used to query hundreds of thousands, or even millions, of files > organized into multi-level directories, users typically provide a > partition filter like: dir0 = something and dir1 = something2 and so on. > For such queries, we saw that the query planning time could be unacceptably long, > due to three main overheads: 1) expanding and getting the list of files, 2) > evaluating the partition filter, 3) getting the metadata, in the case of parquet > files for which no metadata cache file is available. > DRILL-2517 targets the 3rd overhead. As follow-up work to > DRILL-2517, we plan to reduce the filter evaluation overhead. For now, the > partition filter evaluation is applied at the file level. In many cases, we saw > that the number of leaf subdirectories is significantly lower than the number of > files. Since all the files under the same leaf subdirectory share the same > directory metadata, we should apply the filter evaluation at the leaf > subdirectory. By doing that, we could reduce the CPU overhead of evaluating the > filter, and the memory overhead as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4589) Reduce planning time for file system partition pruning by reducing filter evaluation overhead
Jinfeng Ni created DRILL-4589: - Summary: Reduce planning time for file system partition pruning by reducing filter evaluation overhead Key: DRILL-4589 URL: https://issues.apache.org/jira/browse/DRILL-4589 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: Jinfeng Ni When Drill is used to query hundreds of thousands, or even millions, of files organized into multi-level directories, users typically provide a partition filter like: dir0 = something and dir1 = something2 and so on. For such queries, we saw that the query planning time could be unacceptably long, due to three main overheads: 1) expanding and getting the list of files, 2) evaluating the partition filter, 3) getting the metadata, in the case of parquet files for which no metadata cache file is available. DRILL-2517 targets the 3rd overhead. As follow-up work to DRILL-2517, we plan to reduce the filter evaluation overhead. For now, the partition filter evaluation is applied at the file level. In many cases, we saw that the number of leaf subdirectories is significantly lower than the number of files. Since all the files under the same leaf subdirectory share the same directory metadata, we should apply the filter evaluation at the leaf subdirectory. By doing that, we could reduce the CPU overhead of evaluating the filter, and the memory overhead as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1170) YARN support for Drill
[ https://issues.apache.org/jira/browse/DRILL-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229083#comment-15229083 ] Paul Rogers commented on DRILL-1170: Good progress is being made. Our tentative goal is the Drill 1.8 release for an initial integration. The goal is: YARN support in Drill 1.8 enables admins to migrate their existing Drill cluster to run under YARN. The admin simply identifies the nodes on which Drill should run, identifies the required container sizes, and brings up the Drill cluster under YARN. YARN manages resource allocations for Drill alongside those of other YARN applications. Drill-on-YARN monitors Drill-bits and automatically restarts any that fail. We'll have "experimental" support for starting/stopping Drill-bits. Starting bits is easy. Stopping is a bit of a challenge because we lack DRILL-2656. > YARN support for Drill > -- > > Key: DRILL-1170 > URL: https://issues.apache.org/jira/browse/DRILL-1170 > Project: Apache Drill > Issue Type: New Feature >Reporter: Neeraja >Assignee: Paul Rogers > Fix For: Future > > > This is a tracking item to make Drill work with YARN. > Below are a few requirements/needs to consider. > - Drill should run as a YARN-based application, side by side with other YARN-enabled > applications (on the same nodes or different nodes). Both memory and CPU > resources of Drill should be controlled by this mechanism. > - As a YARN-enabled application, Drill's resource consumption should be > adaptive to the load on the cluster. For example: when there is no load on > Drill, Drill should consume no resources on the cluster. As the load on > Drill increases, resources permitting, usage should grow proportionally. > - Low latency is a key requirement for Apache Drill, along with support for > multiple users (concurrency in the 100s-1000s). This should be supported when run > as a YARN application as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1170) YARN support for Drill
[ https://issues.apache.org/jira/browse/DRILL-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229076#comment-15229076 ] Paul Rogers commented on DRILL-1170: Worth a discussion. Is Slider still the "go to" option, or has effort shifted to Twill? As it turns out, the actual YARN integration was not a big effort. Rather, most of the effort is around modifying Drill itself to play well with YARN, and implementing the management aspects unique to YARN. > YARN support for Drill > -- > > Key: DRILL-1170 > URL: https://issues.apache.org/jira/browse/DRILL-1170 > Project: Apache Drill > Issue Type: New Feature >Reporter: Neeraja >Assignee: Paul Rogers > Fix For: Future > > > This is a tracking item to make Drill work with YARN. > Below are a few requirements/needs to consider. > - Drill should run as a YARN-based application, side by side with other YARN-enabled > applications (on the same nodes or different nodes). Both memory and CPU > resources of Drill should be controlled by this mechanism. > - As a YARN-enabled application, Drill's resource consumption should be > adaptive to the load on the cluster. For example: when there is no load on > Drill, Drill should consume no resources on the cluster. As the load on > Drill increases, resources permitting, usage should grow proportionally. > - Low latency is a key requirement for Apache Drill, along with support for > multiple users (concurrency in the 100s-1000s). This should be supported when run > as a YARN application as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (DRILL-4587) Document Drillbit launch options
[ https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228902#comment-15228902 ] Paul Rogers edited comment on DRILL-4587 at 4/6/16 8:47 PM: User-settable environment variables: DRILL_HOME: Drill home (defaults based on the location of the drillbit.sh script.) DRILL_CONF_DIR: Alternate drill configuration directory that contains the drill-override.conf and drill-env.sh files. Default is $DRILL_HOME/conf DRILL_LOG_DIR: Where log files are stored. Default is /var/log/drill if that exists, else $DRILL_HOME/log DRILL_PID_DIR: The directory where Drill stores its Process ID (pid) file. $DRILL_HOME by default. DRILL_IDENT_STRING: A string representing this instance of drillbit. $USER by default. DRILL_NICENESS: The scheduling priority for daemons. Defaults to 0. DRILL_STOP_TIMEOUT: Used when stopping the Drill-bit. Grace period, in seconds, after which the script forcibly kills the server if it has not stopped. Default 120 seconds. JAVA_HOME: The java implementation to use. If not set, looks for java on the command path and uses that location. DRILL_CLASSPATH: Extra Java CLASSPATH entries for custom code. DRILL_CLASSPATH_PREFIX: Extra Java CLASSPATH entries that should be prefixed to the system classpath. HADOOP_HOME: Hadoop home. HBASE_HOME: HBase home. DRILL_JAVA_OPTS: Optional JVM arguments, such as system property overrides, used by both the drillbit and client. DRILLBIT_JAVA_OPTS: Optional JVM arguments specifically for the drillbit. SERVER_GC_OPTS: Garbage collection options, including debug options. Provides special syntax: in an option of the form -Xloggc:, the log path is replaced with the actual path to the Drill log directory. was (Author: paul-rogers): User-settable environment variables: DRILL_HOME: Drill home (defaults based on the location of the drillbit.sh script.) DRILL_CONF_DIR: Alternate drill configuration directory that contains the drill-override.conf and drill-env.sh files.
Default is $DRILL_HOME/conf DRILL_LOG_DIR: Where log files are stored. Default is /var/log/drill if that exists, else $DRILL_HOME/log DRILL_PID_DIR: The directory where Drill stores its Process ID (pid) file. $DRILL_HOME by default. DRILL_IDENT_STRING: A string representing this instance of drillbit. $USER by default. DRILL_NICENESS: The scheduling priority for daemons. Defaults to 0. DRILL_STOP_TIMEOUT: Used when stopping the Drill-bit. Grace period, in seconds, after which the script forcibly kills the server if it has not stopped. Default 120 seconds. JAVA_HOME: The java implementation to use. If not set, looks for java on the command path and uses that location. DRILL_CLASSPATH: Extra Java CLASSPATH entries for custom code. DRILL_CLASSPATH_PREFIX: Extra Java CLASSPATH entries that should be prefixed to the system classpath. HADOOP_HOME: Hadoop home. HBASE_HOME: HBase home. DRILL_JAVA_OPTS: Optional JVM arguments, such as system property overrides, used by both the drillbit and client. DRILLBIT_JAVA_OPTS: Optional JVM arguments specifically for the drillbit. SERVER_GC_OPTS: todo > Document Drillbit launch options > > > Key: DRILL-4587 > URL: https://issues.apache.org/jira/browse/DRILL-4587 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Reporter: Paul Rogers >Assignee: Bridget Bevens > > Drill provides the drillbit.sh script to launch Drill. When Drill is run in > production environments, or when managed by a tool such as Mesos or YARN, > customers have many ways to customize the launch options. We should > document this information as below. > The user can configure Drill launch in one of four ways, depending on their > needs. > 1. Using the properties in drill-override.conf. Sets only startup and runtime > properties. All drillbits should use a copy of the file so that properties > set here apply to all drillbits and to client applications. > 2. By setting environment variables prior to launching Drill. See the list > below.
Use this to customize properties per drill-bit, such as for setting > port numbers. This option is useful when launching Drill from a tool such as > Mesos or YARN. > 3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the > list below. This script is intended to be unique to each node and is another > way to customize properties for this one node. > 4. In Drill 1.7 and later, the administrator can set Drill configuration > options directly on the launch command as shown below. This option is also > useful when launching Drill from a
[jira] [Updated] (DRILL-4587) Document Drillbit launch options
[ https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-4587: --- Description: Drill provides the drillbit.sh script to launch Drill. When Drill is run in production environments, or when managed by a tool such as Mesos or YARN, customers have many ways to customize the launch options. We should document this information as below. The user can configure Drill launch in one of four ways, depending on their needs. 1. Using the properties in drill-override.conf. Sets only startup and runtime properties. All drillbits should use a copy of the file so that properties set here apply to all drillbits and to client applications. 2. By setting environment variables prior to launching Drill. See the list below. Use this to customize properties per drillbit, such as for setting port numbers. This option is useful when launching Drill from a tool such as Mesos or YARN. 3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the list below. This script is intended to be unique to each node and is another way to customize properties for this one node. 4. In Drill 1.7 and later, the administrator can set Drill configuration options directly on the launch command as shown below. This option is also useful when launching Drill from a tool such as YARN or Mesos. Options are of the form: $ drillbit.sh start -Dvariable=value For example, to control the HTTP port: $ drillbit.sh start -Ddrill.exec.http.port=8099 Properties are of three types. 1. Launch-only properties: those that can be set only through environment variables (such as JAVA_HOME). 2. Drill startup properties, which can be set in the locations detailed below. 3. Drill runtime properties, which are set in drill-override.conf and also via SQL. Drill startup properties can be set in a number of locations. Those listed later take precedence over those listed earlier. 1.
Drill-override.conf as identified by DRILL_CONF_DIR or its default. 2. Set in the environment using DRILL_JAVA_OPTS or DRILL_DRILLBIT_JAVA_OPTS. 3. Set in drill-env.sh using the above two variables. 4. Set on the drillbit.sh command line as explained above. (Drill 1.7 and later.) You can see the actual set of properties used (from items 2-3 above) by using the "debug" command: $ drillbit.sh debug was: Drill provides the drillbit.sh script to launch Drill. When Drill is run in production environments, or when managed by a tool such as Mesos or YARN, customers have many ways to customize the launch options. We should document this information as below. The user can configure Drill launch in one of four ways, depending on their needs. 1. Using the properties in drill-override.conf. Sets only startup and runtime properties. All drillbits should use a copy of the file so that properties set here apply to all drillbits and to client applications. 2. By setting environment variables prior to launching Drill. See the list below. Use this to customize properties per drillbit, such as for setting port numbers. This option is useful when launching Drill from a tool such as Mesos or YARN. 3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the list below. This script is intended to be unique to each node and is another way to customize properties for this one node. 4. In Drill 1.7 and later, the administrator can set Drill configuration options directly on the launch command as shown below. This option is also useful when launching Drill from a tool such as YARN or Mesos.
Options are of the form: drillbit.sh start -Dvariable=value For example, to control the HTTP port: drillbit.sh start -Ddrill.exec.http.port=8099 > Document Drillbit launch options > > > Key: DRILL-4587 > URL: https://issues.apache.org/jira/browse/DRILL-4587 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Reporter: Paul Rogers >Assignee: Bridget Bevens > > Drill provides the drillbit.sh script to launch Drill. When Drill is run in > production environments, or when managed by a tool such as Mesos or YARN, > customers have many options to customize the launch options. We should > document this information as below. > The user can configure Drill launch in one of four ways, depending on their > needs. > 1. Using the properties in drill-override.conf. Sets only startup and runtime > properties. All drillbits should use a copy of the file so that properties > set here apply to all drill bits and to client applications. > 2. By setting environment variables prior to launching Drill. See the list > below. Use this to customize properties per drill-bit, such as for setting > port numbers. This option is useful when launching Drill from a tool such as > Mesos or YARN. >
[jira] [Updated] (DRILL-4587) Document Drillbit launch options
[ https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-4587:
---
Description:
Drill provides the drillbit.sh script to launch Drill. When Drill is run in production environments, or when managed by a tool such as Mesos or YARN, customers have many options to customize the launch options. We should document this information as below.
The user can configure Drill launch in one of four ways, depending on their needs.
1. Using the properties in drill-override.conf. Sets only startup and runtime properties. All drillbits should use a copy of the file so that properties set here apply to all drillbits and to client applications.
2. By setting environment variables prior to launching Drill. See the list below. Use this to customize properties per drillbit, such as for setting port numbers. This option is useful when launching Drill from a tool such as Mesos or YARN.
3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the list below. This script is intended to be unique to each node and is another way to customize properties for this one node.
4. In Drill 1.7 and later, the administrator can set Drill configuration options directly on the launch command as shown below. This option is also useful when launching Drill from a tool such as YARN or Mesos.
Options are of the form:
$ drillbit.sh start -Dvariable=value
For example, to control the HTTP port:
$ drillbit.sh start -Ddrill.exec.http.port=8099
Properties are of three types.
1. Launch-only properties: those that can be set only through environment variables (such as JAVA_HOME).
2. Drill startup properties, which can be set in the locations detailed below.
3. Drill runtime properties, which are set in drill-override.conf and also via SQL.
Drill startup properties can be set in a number of locations. Those listed later take precedence over those listed earlier.
1. Drill-override.conf as identified by DRILL_CONF_DIR or its default.
2. Set in the environment using DRILL_JAVA_OPTS or DRILL_DRILLBIT_JAVA_OPTS.
3. Set in drill-env.sh using the above two variables.
4. Set on the drillbit.sh command line as explained above. (Drill 1.7 and later.)
You can see the actual set of properties used (from items 2-3 above) by using the "debug" command (Drill 1.7 or later):
$ drillbit.sh debug
was:
Drill provides the drillbit.sh script to launch Drill. When Drill is run in production environments, or when managed by a tool such as Mesos or YARN, customers have many options to customize the launch options. We should document this information as below.
The user can configure Drill launch in one of four ways, depending on their needs.
1. Using the properties in drill-override.conf. Sets only startup and runtime properties. All drillbits should use a copy of the file so that properties set here apply to all drillbits and to client applications.
2. By setting environment variables prior to launching Drill. See the list below. Use this to customize properties per drillbit, such as for setting port numbers. This option is useful when launching Drill from a tool such as Mesos or YARN.
3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the list below. This script is intended to be unique to each node and is another way to customize properties for this one node.
4. In Drill 1.7 and later, the administrator can set Drill configuration options directly on the launch command as shown below. This option is also useful when launching Drill from a tool such as YARN or Mesos.
Options are of the form:
$ drillbit.sh start -Dvariable=value
For example, to control the HTTP port:
$ drillbit.sh start -Ddrill.exec.http.port=8099
Properties are of three types.
1. Launch-only properties: those that can be set only through environment variables (such as JAVA_HOME).
2. Drill startup properties, which can be set in the locations detailed below.
3. Drill runtime properties, which are set in drill-override.conf and also via SQL.
Drill startup properties can be set in a number of locations. Those listed later take precedence over those listed earlier.
1. Drill-override.conf as identified by DRILL_CONF_DIR or its default.
2. Set in the environment using DRILL_JAVA_OPTS or DRILL_DRILLBIT_JAVA_OPTS.
3. Set in drill-env.sh using the above two variables.
4. Set on the drillbit.sh command line as explained above. (Drill 1.7 and later.)
You can see the actual set of properties used (from items 2-3 above) by using the "debug" command:
$ drillbit.sh debug
> Document Drillbit launch options
>
> Key: DRILL-4587
> URL: https://issues.apache.org/jira/browse/DRILL-4587
> Project: Apache Drill
> Issue Type: Improvement
> Components: Documentation
> Reporter: Paul Rogers
>
[jira] [Assigned] (DRILL-4541) Make sure query planner does not generate operators with mixed convention trait
[ https://issues.apache.org/jira/browse/DRILL-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni reassigned DRILL-4541:
---
Assignee: Jinfeng Ni
> Make sure query planner does not generate operators with mixed convention trait
> ---
>
> Key: DRILL-4541
> URL: https://issues.apache.org/jira/browse/DRILL-4541
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
>
> Per the discussion [1] in the PR of DRILL-4531, we should fix the query planner rules used in Drill planning so that they do not generate Rels with a mixed convention trait. For instance, a LogicalFilter should only have a child with NONE convention; it should not have a child with LOGICAL convention.
> The mixed Rels will cause the planner either to hang (as reported in DRILL-4531 and DRILL-3257) or to do wasted work by firing rules against the mixed Rels.
> I think the reason that we have such mixed Rels is that we have different kinds of rules used in a single Volcano planning phase:
> 1) Rules that match the base classes Filter/Project, etc. only.
> 2) Rules that match LogicalFilter/LogicalProject, etc.
> 3) Rules that match DrillFilter/DrillProject, etc.
> 4) Rules that use the copy() method to generate a new Rel.
> 5) Rules that use a RelFactory to generate a new Rel.
> 6) Convert rules, which convert from Calcite logical (NONE/Enumerable) to Drill logical (LOGICAL).
> For instance, ProjectMergeRule, which matches the base Project yet uses the default RelFactory, will match both LogicalProject and DrillProject, but produce a LogicalProject as the outcome. That causes the mixed Rels.
> Two things we may consider to fix this:
> 1) Separate the convert rules from the other transformation rules. Apply the convert rules first, then have all the transformation rules match DrillLogical only.
> 2) Every rule that Drill uses, except for the convert rules, should assert that its input and output have the same convention.
> [1] https://github.com/apache/drill/pull/444 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (DRILL-4587) Document Drillbit launch options
[ https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228902#comment-15228902 ] Paul Rogers edited comment on DRILL-4587 at 4/6/16 8:38 PM:
User-settable environment variables:
DRILL_HOME - Drill home (defaults based on the location of the drillbit.sh script.)
DRILL_CONF_DIR - Alternate Drill configuration directory that contains the drill-override.conf and drill-env.sh files. Default is $DRILL_HOME/conf.
DRILL_LOG_DIR - Where log files are stored. Default is /var/log/drill if that exists, else $DRILL_HOME/log.
DRILL_PID_DIR - The directory where Drill stores its process ID (pid) file. $DRILL_HOME by default.
DRILL_IDENT_STRING - A string representing this instance of drillbit. $USER by default.
DRILL_NICENESS - The scheduling priority for daemons. Defaults to 0.
DRILL_STOP_TIMEOUT - Used when stopping the drillbit. Grace period, in seconds, after which the script forcibly kills the server if it has not stopped. Default 120 seconds.
JAVA_HOME - The Java implementation to use. If not set, looks for java on the command PATH and uses that location.
DRILL_CLASSPATH - Extra Java CLASSPATH entries for custom code.
DRILL_CLASSPATH_PREFIX - Extra Java CLASSPATH entries that should be prefixed to the system classpath.
HADOOP_HOME - Hadoop home
HBASE_HOME - HBase home
DRILL_JAVA_OPTS - Optional JVM arguments, such as system property overrides, used by both the drillbit and client.
DRILLBIT_JAVA_OPTS - Optional JVM arguments specifically for the drillbit.
SERVER_GC_OPTS - todo
was (Author: paul-rogers):
User-settable environment variables:
DRILL_CONF_DIR - Alternate Drill conf dir. Default is $DRILL_HOME/conf.
DRILL_LOG_DIR - Where log files are stored. Default is /var/log/drill if that exists, else $DRILL_HOME/log.
DRILL_PID_DIR - Where the pid files are stored. /tmp by default.
DRILL_IDENT_STRING - A string representing this instance of drillbit. $USER by default.
DRILL_NICENESS - The scheduling priority for daemons. Defaults to 0.
DRILL_STOP_TIMEOUT - Time, in seconds, after which we kill -9 the server if it has not stopped. Default 120 seconds.
DRILL_HOME - Drill home (defaults based on this script's path.)
JAVA_HOME - The Java implementation to use.
DRILL_CLASSPATH - Extra Java CLASSPATH entries.
DRILL_CLASSPATH_PREFIX - Extra Java CLASSPATH entries that should be prefixed to the system classpath.
HADOOP_HOME - Hadoop home
HBASE_HOME - HBase home
LOG_OPTS - ??
DRILL_JAVA_OPTS - Optional JVM arguments, such as system property overrides, used by both the drillbit and client.
DRILLBIT_JAVA_OPTS - Optional JVM arguments specifically for the drillbit.
SERVER_GC_OPTS - todo
> Document Drillbit launch options
>
> Key: DRILL-4587
> URL: https://issues.apache.org/jira/browse/DRILL-4587
> Project: Apache Drill
> Issue Type: Improvement
> Components: Documentation
> Reporter: Paul Rogers
> Assignee: Bridget Bevens
>
> Drill provides the drillbit.sh script to launch Drill. When Drill is run in production environments, or when managed by a tool such as Mesos or YARN, customers have many options to customize the launch options. We should document this information as below.
> The user can configure Drill launch in one of four ways, depending on their needs.
> 1. Using the properties in drill-override.conf. Sets only startup and runtime properties. All drillbits should use a copy of the file so that properties set here apply to all drill bits and to client applications.
> 2. By setting environment variables prior to launching Drill. See the list below. Use this to customize properties per drill-bit, such as for setting port numbers. This option is useful when launching Drill from a tool such as Mesos or YARN.
> 3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the list below. This script is intended to be unique to each node and is another way to customize properties for this one node.
> 4. In Drill 1.7 and later, the administrator can set Drill configuration options directly on the launch command as shown below. This option is also useful when launching Drill from a tool such as YARN or Mesos. Options are of the form:
> drillbit.sh start -Dvariable=value
> For example, to control the HTTP port:
> drillbit.sh start -Ddrill.exec.http.port=8099
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
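The environment variables listed in this comment are the ones a per-node drill-env.sh would export. A minimal sketch of such a file, using only variable names from the list above; the values are illustrative assumptions, not Drill defaults:

```shell
#!/usr/bin/env bash
# Hypothetical drill-env.sh fragment. Variable names come from the list
# above; every value here is an example, not a shipped default.
export DRILL_LOG_DIR=/var/log/drill           # where log files are stored
export DRILL_PID_DIR=/var/run/drill           # where the pid file is written
export DRILL_IDENT_STRING="analytics-node-1"  # instance label (default: $USER)
export DRILL_STOP_TIMEOUT=120                 # grace period before forced kill

# JVM options: DRILL_JAVA_OPTS is shared by drillbit and client,
# DRILLBIT_JAVA_OPTS applies to the drillbit only.
export DRILL_JAVA_OPTS="-Ddrill.exec.http.port=8099"
export DRILLBIT_JAVA_OPTS="-XX:+UseG1GC"
```

Because drill-env.sh is per node, this is the natural place for node-specific settings such as port numbers, while cluster-wide properties belong in drill-override.conf.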
[jira] [Updated] (DRILL-4587) Document Drillbit launch options
[ https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob Rumsby updated DRILL-4587: -- Assignee: Bridget Bevens > Document Drillbit launch options > > > Key: DRILL-4587 > URL: https://issues.apache.org/jira/browse/DRILL-4587 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Reporter: Paul Rogers >Assignee: Bridget Bevens > > Drill provides the drillbit.sh script to launch Drill. When Drill is run in > production environments, or when managed by a tool such as Mesos or YARN, > customers have many options to customize the launch options. We should > document this information as below. > The user can configure Drill launch in one of four ways, depending on their > needs. > 1. Using the properties in drill-override.conf. Sets only startup and runtime > properties. All drillbits should use a copy of the file so that properties > set here apply to all drill bits and to client applications. > 2. By setting environment variables prior to launching Drill. See the list > below. Use this to customize properties per drill-bit, such as for setting > port numbers. This option is useful when launching Drill from a tool such as > Mesos or YARN. > 3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the > list below. This script is intended to be unique to each node and is another > way to customize properties for this one node. > 4. In Drill 1.7 and later, the administrator can set Drill configuration > options directly on the launch command as shown below. This option is also > useful when launching Drill from a tool such as YARN or Mesos. Options are of > the form: > drillbit.sh start -Dvariable=value > For example, to control the HTTP port: > drillbit.sh start -Ddrill.exec.http.port=8099 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (DRILL-4587) Document Drillbit launch options
[ https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228902#comment-15228902 ] Paul Rogers edited comment on DRILL-4587 at 4/6/16 8:31 PM: User-settable environment variables: DRILL_CONF_DIR Alternate drill conf dir. Default is $DRILL_HOME/conf. DRILL_LOG_DIR Where log files are stored. Default is /var/log/drill if that exists, else $DRILL_HOME/log DRILL_PID_DIR The pid files are stored. /tmp by default. DRILL_IDENT_STRING A string representing this instance of drillbit. $USER by default DRILL_NICENESS The scheduling priority for daemons. Defaults to 0. DRILL_STOP_TIMEOUT Time, in seconds, after which we kill -9 the server if it has not stopped. Default 120 seconds. DRILL_HOME Drill home (defaults based on this script's path.) JAVA_HOME The java implementation to use. DRILL_CLASSPATHExtra Java CLASSPATH entries. DRILL_CLASSPATH_PREFIX Extra Java CLASSPATH entries that should be prefixed to the system classpath. HADOOP_HOMEHadoop home HBASE_HOME HBase home LOG_OPTS?? DRILL_JAVA_OPTS Optional JVM arguments such as system property overides used by both the drillbit and client. DRILLBIT_JAVA_OPTS Optional JVM arguments specifically for the drillbit. SERVER_GC_OPTS todo was (Author: paul-rogers): User-settable environment variables: DRILL_CONF_DIR Alternate drill conf dir. Default is ${DRILL_HOME}/conf. DRILL_LOG_DIR Where log files are stored. Default is /var/log/drill if that exists, else $DRILL_HOME/log DRILL_PID_DIR The pid files are stored. /tmp by default. DRILL_IDENT_STRING A string representing this instance of drillbit. $USER by default DRILL_NICENESS The scheduling priority for daemons. Defaults to 0. DRILL_STOP_TIMEOUT Time, in seconds, after which we kill -9 the server if it has not stopped. Default 120 seconds. DRILL_HOME Drill home (defaults based on this script's path.) JAVA_HOME The java implementation to use. DRILL_CLASSPATHExtra Java CLASSPATH entries. 
DRILL_CLASSPATH_PREFIX Extra Java CLASSPATH entries that should be prefixed to the system classpath. HADOOP_HOMEHadoop home HBASE_HOME HBase home LOG_OPTS?? DRILL_JAVA_OPTS Optional JVM arguments such as system property overides used by both the drillbit and client. DRILLBIT_JAVA_OPTS Optional JVM arguments specifically for the drillbit. SERVER_GC_OPTS todo > Document Drillbit launch options > > > Key: DRILL-4587 > URL: https://issues.apache.org/jira/browse/DRILL-4587 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Reporter: Paul Rogers >Assignee: Bridget Bevens > > Drill provides the drillbit.sh script to launch Drill. When Drill is run in > production environments, or when managed by a tool such as Mesos or YARN, > customers have many options to customize the launch options. We should > document this information as below. > The user can configure Drill launch in one of four ways, depending on their > needs. > 1. Using the properties in drill-override.conf. Sets only startup and runtime > properties. All drillbits should use a copy of the file so that properties > set here apply to all drill bits and to client applications. > 2. By setting environment variables prior to launching Drill. See the list > below. Use this to customize properties per drill-bit, such as for setting > port numbers. This option is useful when launching Drill from a tool such as > Mesos or YARN. > 3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the > list below. This script is intended to be unique to each node and is another > way to customize properties for this one node. > 4. In Drill 1.7 and later, the administrator can set Drill configuration > options directly on the launch command as shown below. This option is also > useful when launching Drill from a tool such as YARN or Mesos. 
Options are of > the form: > drillbit.sh start -Dvariable=value > For example, to control the HTTP port: > drillbit.sh start -Ddrill.exec.http.port=8099 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4587) Document Drillbit launch options
[ https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-4587: --- Description: Drill provides the drillbit.sh script to launch Drill. When Drill is run in production environments, or when managed by a tool such as Mesos or YARN, customers have many options to customize the launch options. We should document this information as below. The user can configure Drill launch in one of four ways, depending on their needs. 1. Using the properties in drill-override.conf. Sets only startup and runtime properties. All drillbits should use a copy of the file so that properties set here apply to all drill bits and to client applications. 2. By setting environment variables prior to launching Drill. See the list below. Use this to customize properties per drill-bit, such as for setting port numbers. This option is useful when launching Drill from a tool such as Mesos or YARN. 3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the list below. This script is intended to be unique to each node and is another way to customize properties for this one node. 4. In Drill 1.7 and later, the administrator can set Drill configuration options directly on the launch command as shown below. This option is also useful when launching Drill from a tool such as YARN or Mesos. Options are of the form: drillbit.sh start -Dvariable=value For example, to control the HTTP port: drillbit.sh start -Ddrill.exec.http.port=8099 was: Drill provides the drillbit.sh script to launch Drill. When Drill is run in production environments, or when managed by a tool such as Mesos or YARN, customers have many options to customize the launch options. We should document this information as below. The user can configure Drill launch in one of two ways, depending on version. $DRILL_HOME/conf/drill-env.sh allows the user to set environment variables that control the Drill launch. 
See the comment below for the list of these variables.
In Drill 1.7 and later, the administrator can set Drill configuration options directly on the launch command line in the form:
drillbit.sh start -Dvariable=value
For example, to control the HTTP port:
drillbit.sh start -Ddrill.exec.http.port=8099
> Document Drillbit launch options
>
> Key: DRILL-4587
> URL: https://issues.apache.org/jira/browse/DRILL-4587
> Project: Apache Drill
> Issue Type: Improvement
> Components: Documentation
> Reporter: Paul Rogers
>
> Drill provides the drillbit.sh script to launch Drill. When Drill is run in production environments, or when managed by a tool such as Mesos or YARN, customers have many options to customize the launch options. We should document this information as below.
> The user can configure Drill launch in one of four ways, depending on their needs.
> 1. Using the properties in drill-override.conf. Sets only startup and runtime properties. All drillbits should use a copy of the file so that properties set here apply to all drill bits and to client applications.
> 2. By setting environment variables prior to launching Drill. See the list below. Use this to customize properties per drill-bit, such as for setting port numbers. This option is useful when launching Drill from a tool such as Mesos or YARN.
> 3. By setting environment variables in $DRILL_HOME/conf/drill-env.sh. See the list below. This script is intended to be unique to each node and is another way to customize properties for this one node.
> 4. In Drill 1.7 and later, the administrator can set Drill configuration options directly on the launch command as shown below. This option is also useful when launching Drill from a tool such as YARN or Mesos. Options are of the form:
> drillbit.sh start -Dvariable=value
> For example, to control the HTTP port:
> drillbit.sh start -Ddrill.exec.http.port=8099
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
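The 1.7-style command-line form described above passes each option as a -Dvariable=value pair. As a sketch, the following hypothetical helper (not part of drillbit.sh) shows how such a pair decomposes into the variable and value that drillbit.sh forwards to the JVM:

```shell
#!/usr/bin/env bash
# In practice one would run, per the thread above:
#   drillbit.sh start -Ddrill.exec.http.port=8099
# and inspect the effective configuration (Drill 1.7+) with:
#   drillbit.sh debug
# parse_override is a hypothetical illustration of how a -D pair splits.
parse_override() {
  local kv="${1#-D}"               # strip the leading -D
  printf '%s %s\n' "${kv%%=*}" "${kv#*=}"  # variable, then value
}
parse_override "-Ddrill.exec.http.port=8099"
# prints: drill.exec.http.port 8099
```

Because command-line options take precedence over drill-override.conf and drill-env.sh, this form suits launchers such as YARN or Mesos that template the start command per node.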
[jira] [Assigned] (DRILL-4588) Enable JMXReporter to Expose Metrics
[ https://issues.apache.org/jira/browse/DRILL-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sudheesh Katkam reassigned DRILL-4588: -- Assignee: Sudheesh Katkam > Enable JMXReporter to Expose Metrics > > > Key: DRILL-4588 > URL: https://issues.apache.org/jira/browse/DRILL-4588 > Project: Apache Drill > Issue Type: Bug >Reporter: Sudheesh Katkam >Assignee: Sudheesh Katkam > > There is a static initialization order issue that needs to be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4588) Enable JMXReporter to Expose Metrics
Sudheesh Katkam created DRILL-4588: -- Summary: Enable JMXReporter to Expose Metrics Key: DRILL-4588 URL: https://issues.apache.org/jira/browse/DRILL-4588 Project: Apache Drill Issue Type: Bug Reporter: Sudheesh Katkam There is a static initialization order issue that needs to be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4587) Document Drillbit launch options
[ https://issues.apache.org/jira/browse/DRILL-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228902#comment-15228902 ] Paul Rogers commented on DRILL-4587:
User-settable environment variables:
DRILL_CONF_DIR - Alternate Drill conf dir. Default is ${DRILL_HOME}/conf.
DRILL_LOG_DIR - Where log files are stored. Default is /var/log/drill if that exists, else $DRILL_HOME/log.
DRILL_PID_DIR - Where the pid files are stored. /tmp by default.
DRILL_IDENT_STRING - A string representing this instance of drillbit. $USER by default.
DRILL_NICENESS - The scheduling priority for daemons. Defaults to 0.
DRILL_STOP_TIMEOUT - Time, in seconds, after which we kill -9 the server if it has not stopped. Default 120 seconds.
DRILL_HOME - Drill home (defaults based on this script's path.)
JAVA_HOME - The Java implementation to use.
DRILL_CLASSPATH - Extra Java CLASSPATH entries.
DRILL_CLASSPATH_PREFIX - Extra Java CLASSPATH entries that should be prefixed to the system classpath.
HADOOP_HOME - Hadoop home
HBASE_HOME - HBase home
LOG_OPTS - ??
DRILL_JAVA_OPTS - Optional JVM arguments, such as system property overrides, used by both the drillbit and client.
DRILLBIT_JAVA_OPTS - Optional JVM arguments specifically for the drillbit.
SERVER_GC_OPTS - todo
> Document Drillbit launch options
>
> Key: DRILL-4587
> URL: https://issues.apache.org/jira/browse/DRILL-4587
> Project: Apache Drill
> Issue Type: Improvement
> Components: Documentation
> Reporter: Paul Rogers
>
> Drill provides the drillbit.sh script to launch Drill. When Drill is run in production environments, or when managed by a tool such as Mesos or YARN, customers have many options to customize the launch options. We should document this information as below.
> The user can configure Drill launch in one of two ways, depending on version.
> $DRILL_HOME/conf/drill-env.sh allows the user to set environment variables that control the Drill launch. See the comment below for the list of these variables.
> In Drill 1.7 and later, the administrator can set Drill configuration options directly on the launch command line in the form:
> drillbit.sh start -Dvariable=value
> For example, to control the HTTP port:
> drillbit.sh start -Ddrill.exec.http.port=8099
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4587) Document Drillbit launch options
Paul Rogers created DRILL-4587: -- Summary: Document Drillbit launch options Key: DRILL-4587 URL: https://issues.apache.org/jira/browse/DRILL-4587 Project: Apache Drill Issue Type: Improvement Components: Documentation Reporter: Paul Rogers Drill provides the drillbit.sh script to launch Drill. When Drill is run in production environments, or when managed by a tool such as Mesos or YARN, customers have many ways to customize the launch. We should document this information as below. The user can configure Drill launch in one of two ways, depending on version. $DRILL_HOME/conf/drill-env.sh allows the user to set environment variables that control the Drill launch. See the comment below for the list of these variables. In Drill 1.7 and later, the administrator can set Drill configuration options directly on the launch command line in the form: drillbit.sh start -Dvariable=value For example, to set the HTTP port: drillbit.sh start -Ddrill.exec.http.port=8099 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins
[ https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228891#comment-15228891 ] ASF GitHub Bot commented on DRILL-4539: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/462#discussion_r58761862 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java --- @@ -169,4 +176,223 @@ private static boolean containIdentity(List exps, } return true; } + + /** + * Copied from {@link RelOptUtil#splitJoinCondition(RelNode, RelNode, RexNode, List, List)}. Modified to rewrite + * the null equal join condition using IS NOT DISTINCT FROM operator. + * + * Splits out the equi-join components of a join condition, and returns + * what's left. For example, given the condition + * + * L.A = R.X AND L.B = L.C AND (L.D = 5 OR L.E = + * R.Y) + * + * returns + * + * + * leftKeys = {A} + * rightKeys = {X} + * rest = L.B = L.C AND (L.D = 5 OR L.E = R.Y) + * + * + * @param left left input to join + * @param right right input to join + * @param condition join condition + * @param leftKeys The ordinals of the fields from the left input which are + * equi-join keys + * @param rightKeys The ordinals of the fields from the right input which + * are equi-join keys + * @param joinOps List of equi-join operators (EQUALS or IS NOT DISTINCT FROM) used to join the left and right keys. 
+ * @return remaining join filters that are not equijoins; may return a + * {@link RexLiteral} true, but never null + */ + public static RexNode splitJoinCondition( + RelNode left, + RelNode right, + RexNode condition, + List<Integer> leftKeys, + List<Integer> rightKeys, + List<SqlOperator> joinOps) { +final List<RexNode> nonEquiList = new ArrayList<>(); + +splitJoinCondition( +left.getRowType().getFieldCount(), +condition, +leftKeys, +rightKeys, +joinOps, +nonEquiList); + +return RexUtil.composeConjunction( +left.getCluster().getRexBuilder(), nonEquiList, false); + } + + /** + * Copied from {@link RelOptUtil#splitJoinCondition(int, RexNode, List, List, List)}. Modified to rewrite the null + * equal join condition using IS NOT DISTINCT FROM operator. + */ + private static void splitJoinCondition( --- End diff -- Rest looks good to me. +1. > Add support for Null Equality Joins > --- > > Key: DRILL-4539 > URL: https://issues.apache.org/jira/browse/DRILL-4539 > Project: Apache Drill > Issue Type: Improvement >Reporter: Jacques Nadeau >Assignee: Venki Korukanti > > Tableau frequently generates queries similar to this: > {code} > SELECT `t0`.`city` AS `city`, > `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`, > `t0`.`state` AS `state`, > `t0`.`sum_stars_ok` AS `sum_stars_ok` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > SUM(`business`.`stars`) AS `sum_stars_ok` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state` > ) `t0` > INNER JOIN ( > SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`, > `t1`.`city` AS `city`, > `t1`.`state` AS `state` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > `business`.`business_id` AS `business_id`, > SUM(`business`.`stars`) AS `X_measure__A` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state`, > `business`.`business_id` > ) `t1` > GROUP BY `t1`.`city`, > `t1`.`state` > ) `t2` ON (((`t0`.`city` = `t2`.`city`) 
OR ((`t0`.`city` IS NULL) AND > (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` > IS NULL) AND (`t2`.`state` IS NULL > {code} > If you look at the join condition, you'll note that the join condition is an > equality condition which also allows null=null. We should add a planning > rewrite rule and execution join option to allow null equality so that we > don't treat this as a cartesian join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
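The review above hinges on SQL three-valued logic: plain `=` yields UNKNOWN when either operand is NULL, so `a = b` never matches NULL rows, while IS NOT DISTINCT FROM treats two NULLs as equal and never yields UNKNOWN. A toy shell sketch of the two predicates, modeling NULL as the sentinel string "NULL" (an illustration of the semantics only, not Drill code):

```shell
# SQL equality: UNKNOWN if either operand is NULL.
sql_eq() {
  if [ "$1" = "NULL" ] || [ "$2" = "NULL" ]; then echo unknown
  elif [ "$1" = "$2" ]; then echo true
  else echo false
  fi
}

# IS NOT DISTINCT FROM: two NULLs compare equal; result is never UNKNOWN.
is_not_distinct_from() {
  if [ "$1" = "NULL" ] && [ "$2" = "NULL" ]; then echo true
  elif [ "$1" = "NULL" ] || [ "$2" = "NULL" ]; then echo false
  elif [ "$1" = "$2" ]; then echo true
  else echo false
  fi
}
```

Because `a = b OR (a IS NULL AND b IS NULL)` computes exactly is_not_distinct_from, a planner can rewrite the Tableau-style predicate into a single equi-join key instead of falling back to a cartesian join.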
[jira] [Updated] (DRILL-4564) Document start-up properties hierarchy
[ https://issues.apache.org/jira/browse/DRILL-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-4564: --- Summary: Document start-up properties hierarchy (was: Add documentation detail regarding start-up properties hierarchy) > Document start-up properties hierarchy > -- > > Key: DRILL-4564 > URL: https://issues.apache.org/jira/browse/DRILL-4564 > Project: Apache Drill > Issue Type: Improvement > Components: Documentation >Reporter: Paul Rogers >Assignee: Bridget Bevens >Priority: Minor > > We’ve been having a lively discussion about config options. We might want to > summarize the discussion in DRILL-4543. Current text: > At the core of the file hierarchy is drill-default.conf. This file is > overridden by one or more drill-module.conf files, which are overridden by > the drill-override.conf file that you define. > Possible revision: > At the bottom of the hierarchy are the default files that Drill itself > provides. The first is drill-default.conf. This file is overridden by one or > more drill-module.conf files provided by Drill’s internal modules. These are > overridden by the drill-override.conf file that you define. Finally, you can > provide overrides on each Drill-bit using system properties of the form > -Dname=value passed on the command line: > ./drillbit.sh start -Dname=value -- This message was sent by Atlassian JIRA (v6.3.4#6332)
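The layering described above can be sketched as a toy shell function in which each later layer overrides the earlier one and a command-line -D wins over all files. Function name and values here are hypothetical, chosen only to show the precedence order:

```shell
# Toy model of the start-up option hierarchy for a single option:
# drill-default.conf < drill-module.conf < drill-override.conf < -Dname=value
effective_http_port() {
  port=8047                        # drill-default.conf
  port=8047                        # drill-module.conf (no change in this example)
  port=8088                        # drill-override.conf
  for arg in "$@"; do              # command-line system properties win last
    case "$arg" in
      -Ddrill.exec.http.port=*) port=${arg#*=} ;;
    esac
  done
  echo "$port"
}
```

With no arguments the override file's value survives; passing `-Ddrill.exec.http.port=8099` on the command line replaces it, mirroring `./drillbit.sh start -Dname=value`.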
[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins
[ https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228889#comment-15228889 ] ASF GitHub Bot commented on DRILL-4539: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/462#discussion_r58761755 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java --- @@ -169,4 +176,223 @@ private static boolean containIdentity(List exps, } return true; } + + /** + * Copied from {@link RelOptUtil#splitJoinCondition(RelNode, RelNode, RexNode, List, List)}. Modified to rewrite + * the null equal join condition using IS NOT DISTINCT FROM operator. + * + * Splits out the equi-join components of a join condition, and returns + * what's left. For example, given the condition + * + * L.A = R.X AND L.B = L.C AND (L.D = 5 OR L.E = + * R.Y) + * + * returns + * + * + * leftKeys = {A} + * rightKeys = {X} + * rest = L.B = L.C AND (L.D = 5 OR L.E = R.Y) + * + * + * @param left left input to join + * @param right right input to join + * @param condition join condition + * @param leftKeys The ordinals of the fields from the left input which are + * equi-join keys + * @param rightKeys The ordinals of the fields from the right input which + * are equi-join keys + * @param joinOps List of equi-join operators (EQUALS or IS NOT DISTINCT FROM) used to join the left and right keys. 
+ * @return remaining join filters that are not equijoins; may return a + * {@link RexLiteral} true, but never null + */ + public static RexNode splitJoinCondition( + RelNode left, + RelNode right, + RexNode condition, + List leftKeys, + List rightKeys, + List joinOps) { +final List nonEquiList = new ArrayList<>(); + +splitJoinCondition( +left.getRowType().getFieldCount(), +condition, +leftKeys, +rightKeys, +joinOps, +nonEquiList); + +return RexUtil.composeConjunction( +left.getCluster().getRexBuilder(), nonEquiList, false); + } + + /** + * Copied from {@link RelOptUtil#splitJoinCondition(int, RexNode, List, List, List)}. Modified to rewrite the null + * equal join condition using IS NOT DISTINCT FROM operator. + */ + private static void splitJoinCondition( --- End diff -- Can you confirm if this rewrite does *not* do the conversion if the join condition happens to involve columns coming from not just 2 tables but from 3 tables ? e.g if the user accidentally gives: SELECT * FROM t1, t2, t3 WHERE t1.a = t2.a OR (t1.b is null and t3.b is null) > Add support for Null Equality Joins > --- > > Key: DRILL-4539 > URL: https://issues.apache.org/jira/browse/DRILL-4539 > Project: Apache Drill > Issue Type: Improvement >Reporter: Jacques Nadeau >Assignee: Venki Korukanti > > Tableau frequently generates queries similar to this: > {code} > SELECT `t0`.`city` AS `city`, > `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`, > `t0`.`state` AS `state`, > `t0`.`sum_stars_ok` AS `sum_stars_ok` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > SUM(`business`.`stars`) AS `sum_stars_ok` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state` > ) `t0` > INNER JOIN ( > SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`, > `t1`.`city` AS `city`, > `t1`.`state` AS `state` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > `business`.`business_id` AS `business_id`, > 
SUM(`business`.`stars`) AS `X_measure__A` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state`, > `business`.`business_id` > ) `t1` > GROUP BY `t1`.`city`, > `t1`.`state` > ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND > (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` > IS NULL) AND (`t2`.`state` IS NULL > {code} > If you look at the join condition, you'll note that the join condition is an > equality condition which also allows null=null. We should add a planning > rewrite rule and execution join option to allow null equality so that we > don't treat this as a cartesian join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4577) Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in
[ https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228866#comment-15228866 ] ASF GitHub Bot commented on DRILL-4577: --- Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/461#discussion_r58759667 --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java --- @@ -72,4 +80,76 @@ public String getTypeName() { return HiveStoragePluginConfig.NAME; } + @Override + public List> getTablesByNames(final List tableNames) { +final String schemaName = getName(); +final List > tableNameToTable = Lists.newArrayList(); +List tables; +// Retries once if the first call to fetch the metadata fails +synchronized(mClient) { + final List tableNamesWithAuth = Lists.newArrayList(); + for(String tableName : tableNames) { +try { + if(mClient.tableExists(schemaName, tableName)) { --- End diff -- If fetching partitions is causing the major delay, then we can load them lazily only if we need them in DrillHiveTable. > Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in > --- > > Key: DRILL-4577 > URL: https://issues.apache.org/jira/browse/DRILL-4577 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive >Reporter: Sean Hsuan-Yi Chu >Assignee: Sean Hsuan-Yi Chu > Fix For: 1.7.0 > > > A query such as > {code} > select * from INFORMATION_SCHEMA.`TABLES` > {code} > is converted as calls to fetch all tables from storage plugins. > When users have Hive, the calls to hive metadata storage would be: > 1) get_table > 2) get_partitions > However, the information regarding partitions is not used in this type of > queries. Beside, a more efficient way is to fetch tables is to use > get_multi_table call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4581) Various problems in the Drill startup scripts
[ https://issues.apache.org/jira/browse/DRILL-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-4581: --- Description: Noticed the following in drillbit.sh: 1) Comment: DRILL_LOG_DIR: Where log files are stored. PWD by default. Code: DRILL_LOG_DIR=/var/log/drill or, if it does not exist, $DRILL_HOME/log 2) Comment: DRILL_PID_DIR: Where the pid files are stored. /tmp by default. Code: DRILL_PID_DIR=$DRILL_HOME 3) Redundant checking of JAVA_HOME. drillbit.sh sources drill-config.sh which checks JAVA_HOME. Later, drillbit.sh checks it again. The second check is both unnecessary and prints a less informative message than the drill-config.sh check. Suggestion: Remove the JAVA_HOME check in drillbit.sh. 4) Though drill-config.sh carefully checks JAVA_HOME, it does not export the JAVA_HOME variable. Perhaps this is why drillbit.sh repeats the check? Recommended: export JAVA_HOME from drill-config.sh. 5) Both drillbit.sh and the sourced drill-config.sh check DRILL_LOG_DIR and set the default value. Drill-config.sh defaults to /var/log/drill, or if that fails, to $DRILL_HOME/log. Drillbit.sh just sets /var/log/drill and does not handle the case where that directory is not writable. Suggested: remove the check in drillbit.sh. 6) Drill-config.sh checks the writability of the DRILL_LOG_DIR by touching sqlline.log, but does not delete that file, leaving a bogus, empty client log file on the drillbit server. Recommendation: use bash commands instead. 7) The implementation of the above check is a bit awkward. It has a fallback case with somewhat awkward logic. Clean this up. 8) drillbit.sh, but not drill-config.sh, attempts to create /var/log/drill if it does not exist. Recommended: decide on a single choice, implement it in drill-config.sh. 9) drill-config.sh checks if $DRILL_CONF_DIR is a directory. If not, defaults it to $DRILL_HOME/conf. This can lead to subtle errors. 
If I use drillbit.sh --config /misspelled/path where I mistype the path, I won't get an error, I get the default config, which may not at all be what I want to run. Recommendation: if the value of DRILL_CONF_DIR is passed into the script (as a variable or via --config), then that directory must exist. Else, use the default. 10) drill-config.sh exports, but may not set, HADOOP_HOME. This may be left over from the original Hadoop script that the Drill script was based upon. Recommendation: export only in the case that HADOOP_HOME is set for cygwin. 11) Drill-config.sh checks JAVA_HOME and prints a big, bold error message to stderr if JAVA_HOME is not set. Then, it checks the Java version and prints a different message (to stdout) if the version is wrong. Recommendation: use the same format (and stderr) for both. 12) Similarly, other Java checks later in the script produce messages to stdout, not stderr. 13) Drill-config.sh searches $JAVA_HOME to find java/java.exe and verifies that it is executable. The script then throws away what we just found. Then, drillbit.sh tries to recreate this information as: JAVA=$JAVA_HOME/bin/java This is wrong in two ways: 1) it ignores the actual java location and assumes it, and 2) it does not handle the java.exe case that drill-config.sh carefully worked out. Recommendation: export JAVA from drill-config.sh and remove the above line from drillbit.sh. 14) drillbit.sh presumably takes extra arguments like this: drillbit.sh -Dvar0=value0 --config /my/conf/dir start -Dvar1=value1 -Dvar2=value2 -Dvar3=value3 The -D bit allows the user to override config variables at the command line. But, the scripts don't use the values. 
A) drill-config.sh consumes --config /my/conf/dir after consuming the leading arguments: while [ $# -gt 1 ]; do if [ "--config" = "$1" ]; then shift confdir=$1 shift DRILL_CONF_DIR=$confdir else # Presume we are at end of options and break break fi done B) drillbit.sh will discard var1: startStopStatus=$1 <-- grabs "start" shift command=drillbit shift <-- Consumes -Dvar1=value1 C) Remaining values passed back into drillbit.sh: args=$@ nohup $thiscmd internal_start $command $args D) Second invocation discards -Dvar2=value2 as described above. E) Remaining values are passed to runbit: "$DRILL_HOME"/bin/runbit $command "$@" start F) Where they again pass through drill-config.sh. (Allowing us to do: drillbit.sh --config /first/conf --config /second/conf which is asking for trouble) G) And, the remaining arguments are simply not used: exec $JAVA -Dlog.path=$DRILLBIT_LOG_PATH -Dlog.query.path=$DRILLBIT_QUERY_LOG_PATH $DRILL_ALL_JAVA_OPTS -cp $CP org.apache.drill.exec.server.Drillbit 15) The checking of command-line args in drillbit.sh is wrong: # if no args specified, show usage if [ $# -lt 1 ]; then echo $usage exit 1 fi ... . "$bin"/drill-config.sh But, note, that drill-config.sh handles: drillbit.sh --config /conf/dir Consuming
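For point 14, a hedged sketch of an argument loop that consumes --config but carries every -Dname=value through to the eventual java invocation instead of dropping it. The function and variable names are hypothetical, not the actual script's:

```shell
# Parse drillbit.sh-style arguments: --config DIR, one command word, and any
# number of -Dname=value overrides, which are collected rather than discarded.
parse_args() {
  JAVA_ARGS=""
  COMMAND=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --config) shift; DRILL_CONF_DIR=$1 ;;
      -D*)      JAVA_ARGS="$JAVA_ARGS $1" ;;   # keep overrides for exec java
      *)        COMMAND=$1 ;;                  # start | stop | status
    esac
    shift
  done
}
```

Accumulated this way, $JAVA_ARGS can be appended to the final exec line so user overrides actually reach the JVM, and --config can appear before or after the command word.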
[jira] [Commented] (DRILL-4577) Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in
[ https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228850#comment-15228850 ] ASF GitHub Bot commented on DRILL-4577: --- Github user hsuanyi commented on a diff in the pull request: https://github.com/apache/drill/pull/461#discussion_r58758201 --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java --- @@ -72,4 +80,76 @@ public String getTypeName() { return HiveStoragePluginConfig.NAME; } + @Override + public List> getTablesByNames(final List tableNames) { +final String schemaName = getName(); +final List > tableNameToTable = Lists.newArrayList(); +List tables; +// Retries once if the first call to fetch the metadata fails +synchronized(mClient) { + final List tableNamesWithAuth = Lists.newArrayList(); + for(String tableName : tableNames) { +try { + if(mClient.tableExists(schemaName, tableName)) { --- End diff -- Of course, eliminating the first one is important too. I am still investigating that. > Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in > --- > > Key: DRILL-4577 > URL: https://issues.apache.org/jira/browse/DRILL-4577 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive >Reporter: Sean Hsuan-Yi Chu >Assignee: Sean Hsuan-Yi Chu > Fix For: 1.7.0 > > > A query such as > {code} > select * from INFORMATION_SCHEMA.`TABLES` > {code} > is converted as calls to fetch all tables from storage plugins. > When users have Hive, the calls to hive metadata storage would be: > 1) get_table > 2) get_partitions > However, the information regarding partitions is not used in this type of > queries. Beside, a more efficient way is to fetch tables is to use > get_multi_table call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4577) Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in
[ https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228846#comment-15228846 ] ASF GitHub Bot commented on DRILL-4577: --- Github user hsuanyi commented on a diff in the pull request: https://github.com/apache/drill/pull/461#discussion_r58758011 --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java --- @@ -72,4 +80,76 @@ public String getTypeName() { return HiveStoragePluginConfig.NAME; } + @Override + public List> getTablesByNames(final List tableNames) { +final String schemaName = getName(); +final List > tableNameToTable = Lists.newArrayList(); +List tables; +// Retries once if the first call to fetch the metadata fails +synchronized(mClient) { + final List tableNamesWithAuth = Lists.newArrayList(); + for(String tableName : tableNames) { +try { + if(mClient.tableExists(schemaName, tableName)) { --- End diff -- There are two parts which makes the query slow. One follows from your point; The other is fetching partitions which turned out not used. [1] [1] https://github.com/apache/drill/blob/245da9790813569c5da9404e0fc5e45cc88e22bb/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/DrillHiveMetaStoreClient.java#L236 > Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in > --- > > Key: DRILL-4577 > URL: https://issues.apache.org/jira/browse/DRILL-4577 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive >Reporter: Sean Hsuan-Yi Chu >Assignee: Sean Hsuan-Yi Chu > Fix For: 1.7.0 > > > A query such as > {code} > select * from INFORMATION_SCHEMA.`TABLES` > {code} > is converted as calls to fetch all tables from storage plugins. > When users have Hive, the calls to hive metadata storage would be: > 1) get_table > 2) get_partitions > However, the information regarding partitions is not used in this type of > queries. 
Besides, a more efficient way to fetch tables is to use the get_multi_table call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4586) Create CLIENT ErrorType
Sudheesh Katkam created DRILL-4586: -- Summary: Create CLIENT ErrorType Key: DRILL-4586 URL: https://issues.apache.org/jira/browse/DRILL-4586 Project: Apache Drill Issue Type: Improvement Reporter: Sudheesh Katkam To display client errors with nice messages, we currently use "system error". However, system error is not meant to be used when we want to display a proper error message; system errors are meant for unexpected errors that do not yet have a "nice" error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins
[ https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228838#comment-15228838 ] ASF GitHub Bot commented on DRILL-4539: --- Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/462#discussion_r58757405 --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestJoinNullable.java --- @@ -407,11 +342,94 @@ public void testMergeLOJNullableBothInputsOrderedDescNullsLastVsAscNullsLast() t + " ORDER BY 1 ASC NULLS LAST ) t2 " + "USING ( key )", TEST_RES_PATH, TEST_RES_PATH); -final int expectedRecordCount = 6; +testHelper(query, 6, false, true); + } + + @Test + public void withDistinctFromJoinConditionHashJoin() throws Exception { +final String query = "SELECT * FROM " + +"cp.`jsonInput/nullableOrdered1.json` t1 JOIN " + +"cp.`jsonInput/nullableOrdered2.json` t2 " + +"ON t1.key IS NOT DISTINCT FROM t2.key AND t1.data is NOT null"; +nullEqualJoinHelper(query); + } + + @Test + public void withDistinctFromJoinConditionMergeJoin() throws Exception { +try { + test("alter session set `planner.enable_hashjoin` = false"); + final String query = "SELECT * FROM " + + "cp.`jsonInput/nullableOrdered1.json` t1 JOIN " + + "cp.`jsonInput/nullableOrdered2.json` t2 " + + "ON t1.key IS NOT DISTINCT FROM t2.key"; + nullEqualJoinHelper(query); +} finally { + test("alter session set `planner.enable_hashjoin` = true"); +} + } + + @Test + public void withNullEqualHashJoin() throws Exception { +final String query = "SELECT * FROM " + +"cp.`jsonInput/nullableOrdered1.json` t1 JOIN " + +"cp.`jsonInput/nullableOrdered2.json` t2 " + +"ON t1.key = t2.key OR (t1.key IS NULL AND t2.key IS NULL)"; +nullEqualJoinHelper(query); + } -enableJoin(false, true); -final int actualRecordCount = testSql(query); -assertEquals("Number of output rows", expectedRecordCount, actualRecordCount); + @Test + public void withNullEqualMergeJoin() throws Exception { +try { + test("alter session set 
`planner.enable_hashjoin` = false"); + final String query = "SELECT * FROM " + + "cp.`jsonInput/nullableOrdered1.json` t1 JOIN " + + "cp.`jsonInput/nullableOrdered2.json` t2 " + + "ON t1.key = t2.key OR (t1.key IS NULL AND t2.key IS NULL)"; + nullEqualJoinHelper(query); +} finally { + test("alter session set `planner.enable_hashjoin` = true"); +} + } + + public void nullEqualJoinHelper(final String query) throws Exception { +testBuilder() +.sqlQuery(query) +.unOrdered() +.baselineColumns("key", "data", "data0", "key0") +.baselineValues(null, "L_null_1", "R_null_1", null) +.baselineValues(null, "L_null_2", "R_null_1", null) +.baselineValues("A", "L_A_1", "R_A_1", "A") +.baselineValues("A", "L_A_2", "R_A_1", "A") +.baselineValues(null, "L_null_1", "R_null_2", null) +.baselineValues(null, "L_null_2", "R_null_2", null) +.baselineValues(null, "L_null_1", "R_null_3", null) +.baselineValues(null, "L_null_2", "R_null_3", null) +.go(); } + @Test + public void withNullEqualAdditionFilter() throws Exception { --- End diff -- Sure. I will update the patch with new tests. 
> Add support for Null Equality Joins > --- > > Key: DRILL-4539 > URL: https://issues.apache.org/jira/browse/DRILL-4539 > Project: Apache Drill > Issue Type: Improvement >Reporter: Jacques Nadeau >Assignee: Venki Korukanti > > Tableau frequently generates queries similar to this: > {code} > SELECT `t0`.`city` AS `city`, > `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`, > `t0`.`state` AS `state`, > `t0`.`sum_stars_ok` AS `sum_stars_ok` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > SUM(`business`.`stars`) AS `sum_stars_ok` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state` > ) `t0` > INNER JOIN ( > SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`, > `t1`.`city` AS `city`, > `t1`.`state` AS `state` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > `business`.`business_id` AS `business_id`, > SUM(`business`.`stars`) AS `X_measure__A` >
[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins
[ https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228833#comment-15228833 ] ASF GitHub Bot commented on DRILL-4539: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/462#discussion_r58757060 --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestJoinNullable.java --- @@ -407,11 +342,94 @@ public void testMergeLOJNullableBothInputsOrderedDescNullsLastVsAscNullsLast() t + " ORDER BY 1 ASC NULLS LAST ) t2 " + "USING ( key )", TEST_RES_PATH, TEST_RES_PATH); -final int expectedRecordCount = 6; +testHelper(query, 6, false, true); + } + + @Test + public void withDistinctFromJoinConditionHashJoin() throws Exception { +final String query = "SELECT * FROM " + +"cp.`jsonInput/nullableOrdered1.json` t1 JOIN " + +"cp.`jsonInput/nullableOrdered2.json` t2 " + +"ON t1.key IS NOT DISTINCT FROM t2.key AND t1.data is NOT null"; +nullEqualJoinHelper(query); + } + + @Test + public void withDistinctFromJoinConditionMergeJoin() throws Exception { +try { + test("alter session set `planner.enable_hashjoin` = false"); + final String query = "SELECT * FROM " + + "cp.`jsonInput/nullableOrdered1.json` t1 JOIN " + + "cp.`jsonInput/nullableOrdered2.json` t2 " + + "ON t1.key IS NOT DISTINCT FROM t2.key"; + nullEqualJoinHelper(query); +} finally { + test("alter session set `planner.enable_hashjoin` = true"); +} + } + + @Test + public void withNullEqualHashJoin() throws Exception { +final String query = "SELECT * FROM " + +"cp.`jsonInput/nullableOrdered1.json` t1 JOIN " + +"cp.`jsonInput/nullableOrdered2.json` t2 " + +"ON t1.key = t2.key OR (t1.key IS NULL AND t2.key IS NULL)"; +nullEqualJoinHelper(query); + } -enableJoin(false, true); -final int actualRecordCount = testSql(query); -assertEquals("Number of output rows", expectedRecordCount, actualRecordCount); + @Test + public void withNullEqualMergeJoin() throws Exception { +try { + test("alter session set 
`planner.enable_hashjoin` = false"); + final String query = "SELECT * FROM " + + "cp.`jsonInput/nullableOrdered1.json` t1 JOIN " + + "cp.`jsonInput/nullableOrdered2.json` t2 " + + "ON t1.key = t2.key OR (t1.key IS NULL AND t2.key IS NULL)"; + nullEqualJoinHelper(query); +} finally { + test("alter session set `planner.enable_hashjoin` = true"); +} + } + + public void nullEqualJoinHelper(final String query) throws Exception { +testBuilder() +.sqlQuery(query) +.unOrdered() +.baselineColumns("key", "data", "data0", "key0") +.baselineValues(null, "L_null_1", "R_null_1", null) +.baselineValues(null, "L_null_2", "R_null_1", null) +.baselineValues("A", "L_A_1", "R_A_1", "A") +.baselineValues("A", "L_A_2", "R_A_1", "A") +.baselineValues(null, "L_null_1", "R_null_2", null) +.baselineValues(null, "L_null_2", "R_null_2", null) +.baselineValues(null, "L_null_1", "R_null_3", null) +.baselineValues(null, "L_null_2", "R_null_3", null) +.go(); } + @Test + public void withNullEqualAdditionFilter() throws Exception { --- End diff -- Could you also do similar test with the join condition in the WHERE clause instead of ON clause ? i.e something like: SELECT * FROM t1, t2 WHERE t1.a = t2.a OR (t1.a is null and t2.a is null) For such cases, Calcite filter pushdown into join needs to be applied first. 
> Add support for Null Equality Joins > --- > > Key: DRILL-4539 > URL: https://issues.apache.org/jira/browse/DRILL-4539 > Project: Apache Drill > Issue Type: Improvement >Reporter: Jacques Nadeau >Assignee: Venki Korukanti > > Tableau frequently generates queries similar to this: > {code} > SELECT `t0`.`city` AS `city`, > `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`, > `t0`.`state` AS `state`, > `t0`.`sum_stars_ok` AS `sum_stars_ok` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > SUM(`business`.`stars`) AS `sum_stars_ok` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state` > ) `t0` > INNER JOIN ( > SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`, > `t1`.`city`
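Aman's review comment above hinges on SQL's null handling: `t1.key IS NOT DISTINCT FROM t2.key` treats two NULLs as equal, while plain `=` yields UNKNOWN when either side is NULL. A minimal Java sketch of that truth table — not Drill's generated code, just an illustration; `java.util.Objects.equals` happens to match the null-safe semantics:

```java
import java.util.Objects;

public class NullSafeEquality {

    // SQL "a IS NOT DISTINCT FROM b": true when both sides are NULL,
    // false when exactly one side is NULL, ordinary equality otherwise.
    static boolean isNotDistinctFrom(String a, String b) {
        return Objects.equals(a, b);
    }

    // Plain SQL "a = b" under three-valued logic: UNKNOWN if either side
    // is NULL. Java's null on a Boolean stands in for UNKNOWN here.
    static Boolean sqlEquals(String a, String b) {
        if (a == null || b == null) {
            return null; // UNKNOWN
        }
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(isNotDistinctFrom(null, null)); // true
        System.out.println(isNotDistinctFrom("A", null));  // false
        System.out.println(isNotDistinctFrom("A", "A"));   // true
        System.out.println(sqlEquals(null, null));         // null (UNKNOWN)
    }
}
```

This is why the tests above expect the `L_null_*`/`R_null_*` rows to join: under `IS NOT DISTINCT FROM`, every NULL key on the left pairs with every NULL key on the right.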
[jira] [Commented] (DRILL-4575) alias not working on field.
[ https://issues.apache.org/jira/browse/DRILL-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228817#comment-15228817 ] Hugo Bellomusto commented on DRILL-4575: It sounds different, in DRILL-4572 error happens when using functions. Here, I use a function to make it work. > alias not working on field. > --- > > Key: DRILL-4575 > URL: https://issues.apache.org/jira/browse/DRILL-4575 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.6.0 > Environment: Apache drill 1.6.0 > java 1.7.0_40 >Reporter: Hugo Bellomusto > > {code:sql} > create table dfs.tmp.a_field as > select 'hello' field from (VALUES(1)); > select field my_field from dfs.tmp.a_field; > {code} > The result is: > ||field|| > |hello| > When should be: > ||my_field|| > |hello| > {noformat:title=physical plan} > 00-00Screen : rowType = RecordType(ANY field): rowcount = 1.0, cumulative > cost = {1.1 rows, 1.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1635 > 00-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=hdfs://10.70.168.69:8020/tmp/a_field]], > selectionRoot=hdfs://10.70.168.69:8020/tmp/a_field, numFiles=1, > usedMetadataFile=false, columns=[`field`]]]) : rowType = RecordType(ANY > field): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 > network, 0.0 memory}, id = 1634 > {noformat} > But, this works well: > {code:sql} > select concat(field, ' world') my_field from dfs.tmp.a_field; > {code} > returns: > ||my_field|| > |hello world| > Additional info: > {code:sql} > select * from sys.options where name like '%parquet%' or string_val like > '%parquet%'; > {code} > ||name||string_val| > |store.format|parquet| > |store.parquet.block-size| | > |store.parquet.compression|snappy| > |store.parquet.dictionary.page-size| | > |store.parquet.enable_dictionary_encoding| | > |store.parquet.page-size| | > |store.parquet.use_new_reader| | > |store.parquet.vector_fill_check_threshold| | > |store.parquet.vector_fill_threshold| | -- This 
message was sent by Atlassian JIRA (v6.3.4#6332)
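The physical plan in this report shows only a Scan and a Screen — the renaming projection never makes it into the plan, so the original column name `field` reaches the client, while `concat(...)` forces a real Project and therefore keeps the alias. As an illustration only (this is not Drill's operator model), here is what the missing projection step does:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ProjectRename {

    // A trivial stand-in for a Project operator that renames one column.
    // When the planner collapses this step into the Scan, the scan's
    // original column name leaks through to the client -- the symptom
    // reported above.
    static Map<String, Object> project(Map<String, Object> row,
                                       String from, String to) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            out.put(e.getKey().equals(from) ? to : e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("field", "hello");
        System.out.println(project(row, "field", "my_field")); // {my_field=hello}
    }
}
```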
[jira] [Commented] (DRILL-1170) YARN support for Drill
[ https://issues.apache.org/jira/browse/DRILL-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228797#comment-15228797 ] Jacques Nadeau commented on DRILL-1170: --- Hey Paul & Billie, if the Slider community co-implemented this with the Drill folk, it would probably allow Slider to support more use cases and bring us to a shared approach rather than two separate codebases. Do you think that anyone from the Slider community would be able to spend substantial time against this to address the Drill needs? > YARN support for Drill > -- > > Key: DRILL-1170 > URL: https://issues.apache.org/jira/browse/DRILL-1170 > Project: Apache Drill > Issue Type: New Feature >Reporter: Neeraja >Assignee: Paul Rogers > Fix For: Future > > > This is a tracking item to make Drill work with YARN. > Below are few requirements/needs to consider. > - Drill should run as an YARN based application, side by side with other YARN > enabled applications (on same nodes or different nodes). Both memory and CPU > resources of Drill should be controlled in this mechanism. > - As an YARN enabled application, Drill resource consumption should be > adaptive to the load on the cluster. For ex: When there is no load on the > Drill , Drill should consume no resources on the cluster. As the load on > Drill increases, resources permitting, usage should grow proportionally. > - Low latency is a key requirement for Apache Drill along with support for > multiple users (concurrency in 100s-1000s). This should be supported when run > as YARN application as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4576) Add StoragePlugin API to register materialization into planner
[ https://issues.apache.org/jira/browse/DRILL-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228790#comment-15228790 ] ASF GitHub Bot commented on DRILL-4576: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/466#discussion_r58754467 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerCallback.java --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner; + +import java.util.Collection; + +import org.apache.calcite.plan.RelOptPlanner; + +/** + * A callback that StoragePlugins can initialize to allow further configuration + * of the Planner at initialization time. Examples could be to allow adding lattices, + * materializations or additional traits to the planner that will be used in + * planning. + */ +public abstract class PlannerCallback { + + /** + * Method that will be called before a planner is used to further configure the planner. + * @param planner The planner to be configured. 
+ */ + public abstract void initializePlanner(RelOptPlanner planner); + + + public static PlannerCallback merge(Collection callbacks){ +return new PlannerCallbackCollection(callbacks); + } + + private static class PlannerCallbackCollection extends PlannerCallback{ +private Collection callbacks; + +private PlannerCallbackCollection(Collection callbacks){ + this.callbacks = callbacks; --- End diff -- should a immutable copy be used instead of the caller's collection? > Add StoragePlugin API to register materialization into planner > -- > > Key: DRILL-4576 > URL: https://issues.apache.org/jira/browse/DRILL-4576 > Project: Apache Drill > Issue Type: Bug >Reporter: Laurent Goujon >Assignee: Jacques Nadeau > > There's no currently a good way to register materializations into Drill > planner. Calcite's MaterializationService.instance() would be the way to go, > but the registration happens in > {{org.apache.calcite.prepare.Prepare.PreparedResult#prepareSql()}}, which is > not called by Drill. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
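Laurent's question — whether `PlannerCallbackCollection` should take an immutable copy rather than hold a reference to the caller's collection — can be sketched as below. The `Callback` interface and `StringBuilder` "planner" are stand-ins for the patch's `PlannerCallback` and Calcite's `RelOptPlanner`, not the actual classes:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class DefensiveCopy {

    interface Callback {
        void apply(StringBuilder planner); // stand-in for RelOptPlanner
    }

    static final class MergedCallback implements Callback {
        // Immutable snapshot taken at construction: later mutation of the
        // caller's collection cannot change which callbacks run.
        private final List<Callback> callbacks;

        MergedCallback(Collection<Callback> callbacks) {
            this.callbacks = Collections.unmodifiableList(new ArrayList<>(callbacks));
        }

        @Override
        public void apply(StringBuilder planner) {
            for (Callback c : callbacks) {
                c.apply(planner);
            }
        }
    }

    public static void main(String[] args) {
        List<Callback> registered = new ArrayList<>();
        registered.add(p -> p.append("a"));

        Callback merged = new MergedCallback(registered);
        registered.add(p -> p.append("b")); // added after the snapshot; ignored

        StringBuilder planner = new StringBuilder();
        merged.apply(planner);
        System.out.println(planner); // "a"
    }
}
```

Without the copy, the merged callback silently tracks the caller's list — which may be what the caller wants, but as the review notes it should at least be a deliberate choice.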
[jira] [Commented] (DRILL-4132) Ability to submit simple type of physical plan directly to EndPoint DrillBit for execution
[ https://issues.apache.org/jira/browse/DRILL-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228789#comment-15228789 ] ASF GitHub Bot commented on DRILL-4132: --- Github user yufeldman commented on a diff in the pull request: https://github.com/apache/drill/pull/368#discussion_r58754414 --- Diff: protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java --- @@ -133,6 +133,10 @@ private RpcChannel(int index, int value) { * PHYSICAL = 3; */ PHYSICAL(2, 3), +/** + * EXECUTIONAL = 4; + */ +EXECUTIONAL(3, 4), --- End diff -- sure again :). > Ability to submit simple type of physical plan directly to EndPoint DrillBit > for execution > -- > > Key: DRILL-4132 > URL: https://issues.apache.org/jira/browse/DRILL-4132 > Project: Apache Drill > Issue Type: New Feature > Components: Execution - Flow, Execution - RPC, Query Planning & > Optimization >Reporter: Yuliya Feldman >Assignee: Yuliya Feldman > > Today Drill Query execution is optimistic and stateful (at least due to data > exchanges) - if any of the stages of query execution fails whole query fails. > If query is just simple scan, filter push down and project where no data > exchange happens between DrillBits there is no need to fail whole query when > one DrillBit fails, as minor fragments running on that DrillBit can be rerun > on the other DrillBit. There are probably multiple ways to achieve this. This > JIRA is to open discussion on: > 1. agreement that we need to support above use case > 2. means of achieving it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4132) Ability to submit simple type of physical plan directly to EndPoint DrillBit for execution
[ https://issues.apache.org/jira/browse/DRILL-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228779#comment-15228779 ] ASF GitHub Bot commented on DRILL-4132: --- Github user yufeldman commented on a diff in the pull request: https://github.com/apache/drill/pull/368#discussion_r58753681 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/SimpleParallelizerMultiPlans.java --- @@ -0,0 +1,222 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.planner.fragment; + +import java.util.Collection; +import java.util.Iterator; +import java.util.List; + +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.common.util.DrillStringUtils; +import org.apache.drill.exec.ops.QueryContext; +import org.apache.drill.exec.physical.base.Exchange; +import org.apache.drill.exec.physical.base.FragmentRoot; +import org.apache.drill.exec.physical.base.PhysicalOperator; +import org.apache.drill.exec.planner.PhysicalPlanReader; +import org.apache.drill.exec.planner.fragment.Materializer.IndexedFragmentNode; +import org.apache.drill.exec.proto.BitControl.PlanFragment; +import org.apache.drill.exec.proto.BitControl.QueryContextInformation; +import org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint; +import org.apache.drill.exec.proto.ExecProtos.FragmentHandle; +import org.apache.drill.exec.proto.UserBitShared.QueryId; +import org.apache.drill.exec.rpc.user.UserSession; +import org.apache.drill.exec.server.options.OptionList; +import org.apache.drill.exec.work.QueryWorkUnit; +import org.apache.drill.exec.work.foreman.ForemanSetupException; + +import com.fasterxml.jackson.core.JsonProcessingException; +import com.google.common.base.Preconditions; +import com.google.common.collect.Lists; + +/** + * SimpleParallelizerMultiPlans class is an extension to SimpleParallelizer + * to help with getting PlanFragments for split plan. + * Split plan is essentially ability to create multiple physical plans from a single logical plan + * to be able to run them separately. 
+ * Moving functionality specific to splitting the plan to this class + * allows not to pollute parent class with non-authentic functionality + * + */ +public class SimpleParallelizerMultiPlans extends SimpleParallelizer { + + public SimpleParallelizerMultiPlans(QueryContext context) { +super(context); + } + + /** + * Create multiple physical plans from original query planning, it will allow execute them eventually independently + * @param options + * @param foremanNode + * @param queryId + * @param activeEndpoints + * @param reader + * @param rootFragment + * @param session + * @param queryContextInfo + * @return + * @throws ExecutionSetupException + */ + public List getSplitFragments(OptionList options, DrillbitEndpoint foremanNode, QueryId queryId, + Collection activeEndpoints, PhysicalPlanReader reader, Fragment rootFragment, + UserSession session, QueryContextInformation queryContextInfo) throws ExecutionSetupException { + +final PlanningSet planningSet = getFragmentsHelper(activeEndpoints, rootFragment); + +return generateWorkUnits( +options, foremanNode, queryId, reader, rootFragment, planningSet, session, queryContextInfo); + } + + /** + * Split plan into multiple plans based on parallelization + * Ideally it is applicable only to plans with two major fragments: Screen and UnionExchange + * But there could be cases where we can remove even multiple exchanges like in case of "order by" + * End goal is to get single major fragment: Screen with chain that ends up with a single minor fragment + * from Leaf Exchange. This way each plan can run independently without any exchange involvement +
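The javadoc above describes turning one parallelized plan into independently submittable plans, one per leaf fragment chain. A heavily simplified sketch of that idea, with stand-in record types rather than Drill's `PlanFragment`/`QueryWorkUnit`:

```java
import java.util.ArrayList;
import java.util.List;

public class PlanSplitter {

    // Stand-ins for Drill's plan types: a "plan" here is just the list of
    // minor-fragment chains produced by parallelization.
    record MinorFragment(int id, String rootOperator) {}
    record SplitPlan(MinorFragment fragment) {}

    // Turn one parallelized plan into one single-fragment plan per chain,
    // so each can be submitted to a drillbit -- and rerun on another
    // drillbit on failure -- without any exchange between them.
    static List<SplitPlan> split(List<MinorFragment> fragments) {
        List<SplitPlan> plans = new ArrayList<>();
        for (MinorFragment f : fragments) {
            plans.add(new SplitPlan(f));
        }
        return plans;
    }

    public static void main(String[] args) {
        List<MinorFragment> frags = List.of(
            new MinorFragment(0, "Screen-Scan"),
            new MinorFragment(1, "Screen-Scan"));
        System.out.println(split(frags).size()); // 2
    }
}
```

The real work in the patch is in removing the exchanges so that each chain truly has no dependency on the others; the split itself is the easy part.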
[jira] [Commented] (DRILL-4576) Add StoragePlugin API to register materialization into planner
[ https://issues.apache.org/jira/browse/DRILL-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228777#comment-15228777 ] ASF GitHub Bot commented on DRILL-4576: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/466#discussion_r58753568 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerCallback.java --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner; + +import java.util.Collection; + +import org.apache.calcite.plan.RelOptPlanner; + +/** + * A callback that StoragePlugins can initialize to allow further configuration + * of the Planner at initialization time. Examples could be to allow adding lattices, + * materializations or additional traits to the planner that will be used in + * planning. + */ +public abstract class PlannerCallback { + + /** + * Method that will be called before a planner is used to further configure the planner. + * @param planner The planner to be configured. 
+ */ + public abstract void initializePlanner(RelOptPlanner planner); + + + public static PlannerCallback merge(Collection callbacks){ +return new PlannerCallbackCollection(callbacks); + } + + private static class PlannerCallbackCollection extends PlannerCallback{ +private Collection callbacks; --- End diff -- final > Add StoragePlugin API to register materialization into planner > -- > > Key: DRILL-4576 > URL: https://issues.apache.org/jira/browse/DRILL-4576 > Project: Apache Drill > Issue Type: Bug >Reporter: Laurent Goujon >Assignee: Jacques Nadeau > > There's no currently a good way to register materializations into Drill > planner. Calcite's MaterializationService.instance() would be the way to go, > but the registration happens in > {{org.apache.calcite.prepare.Prepare.PreparedResult#prepareSql()}}, which is > not called by Drill. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4577) Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in
[ https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228774#comment-15228774 ] ASF GitHub Bot commented on DRILL-4577: --- Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/461#discussion_r58753354 --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java --- @@ -72,4 +80,76 @@ public String getTypeName() { return HiveStoragePluginConfig.NAME; } + @Override + public List> getTablesByNames(final List tableNames) { +final String schemaName = getName(); +final List > tableNameToTable = Lists.newArrayList(); +List tables; +// Retries once if the first call to fetch the metadata fails +synchronized(mClient) { + final List tableNamesWithAuth = Lists.newArrayList(); + for(String tableName : tableNames) { +try { + if(mClient.tableExists(schemaName, tableName)) { --- End diff -- Here you are making a RPC call for every table. I thought for perf reasons we wanted to avoid the RPC call per table and instead use ```getTableObjectsByName``` to get all tables data in one RPC call. How does this patch improve the perf? > Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in > --- > > Key: DRILL-4577 > URL: https://issues.apache.org/jira/browse/DRILL-4577 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive >Reporter: Sean Hsuan-Yi Chu >Assignee: Sean Hsuan-Yi Chu > Fix For: 1.7.0 > > > A query such as > {code} > select * from INFORMATION_SCHEMA.`TABLES` > {code} > is converted as calls to fetch all tables from storage plugins. > When users have Hive, the calls to hive metadata storage would be: > 1) get_table > 2) get_partitions > However, the information regarding partitions is not used in this type of > queries. Beside, a more efficient way is to fetch tables is to use > get_multi_table call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
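The reviewer's objection is the classic N+1 RPC pattern: calling `tableExists` once per table defeats the purpose of the batched `getTableObjectsByName` call. A toy illustration of the round-trip difference — the client here is fake and only the method names mirror the ones discussed, not the real Hive metastore API signatures:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedMetadataFetch {

    // A fake metastore client that counts round trips.
    static final class FakeClient {
        int rpcCalls = 0;

        String getTable(String db, String name) { // one RPC per table
            rpcCalls++;
            return name;
        }

        List<String> getTableObjectsByName(String db, List<String> names) { // one RPC total
            rpcCalls++;
            return new ArrayList<>(names);
        }
    }

    public static void main(String[] args) {
        List<String> names = List.of("t1", "t2", "t3");

        FakeClient perTable = new FakeClient();
        for (String n : names) {
            perTable.getTable("db", n); // the pattern the reviewer flags
        }
        System.out.println(perTable.rpcCalls); // 3

        FakeClient batched = new FakeClient();
        batched.getTableObjectsByName("db", names); // the intended fix
        System.out.println(batched.rpcCalls); // 1
    }
}
```

For a schema with thousands of tables the difference is thousands of metastore round trips versus a handful, which is the entire point of DRILL-4577.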
[jira] [Commented] (DRILL-1170) YARN support for Drill
[ https://issues.apache.org/jira/browse/DRILL-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228760#comment-15228760 ] Matt Pollock commented on DRILL-1170: - Any progress update? My organization won't support use of Drill until this is done. > YARN support for Drill > -- > > Key: DRILL-1170 > URL: https://issues.apache.org/jira/browse/DRILL-1170 > Project: Apache Drill > Issue Type: New Feature >Reporter: Neeraja >Assignee: Paul Rogers > Fix For: Future > > > This is a tracking item to make Drill work with YARN. > Below are few requirements/needs to consider. > - Drill should run as an YARN based application, side by side with other YARN > enabled applications (on same nodes or different nodes). Both memory and CPU > resources of Drill should be controlled in this mechanism. > - As an YARN enabled application, Drill resource consumption should be > adaptive to the load on the cluster. For ex: When there is no load on the > Drill , Drill should consume no resources on the cluster. As the load on > Drill increases, resources permitting, usage should grow proportionally. > - Low latency is a key requirement for Apache Drill along with support for > multiple users (concurrency in 100s-1000s). This should be supported when run > as YARN application as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins
[ https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228757#comment-15228757 ] ASF GitHub Bot commented on DRILL-4539: --- Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/462#discussion_r58751970 --- Diff: exec/java-exec/src/main/codegen/templates/ComparisonFunctions.java --- @@ -215,6 +192,36 @@ public void eval() { } } + <#-- IS_DISTINCT_FROM function --> + @FunctionTemplate(names = {"is_distinct_from", "is distinct from" }, --- End diff -- I added tests for each category of template code path (primitive type, decimal type and interval type) in TestIsDistinctFromFunctions.java > Add support for Null Equality Joins > --- > > Key: DRILL-4539 > URL: https://issues.apache.org/jira/browse/DRILL-4539 > Project: Apache Drill > Issue Type: Improvement >Reporter: Jacques Nadeau >Assignee: Venki Korukanti > > Tableau frequently generates queries similar to this: > {code} > SELECT `t0`.`city` AS `city`, > `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`, > `t0`.`state` AS `state`, > `t0`.`sum_stars_ok` AS `sum_stars_ok` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > SUM(`business`.`stars`) AS `sum_stars_ok` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state` > ) `t0` > INNER JOIN ( > SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`, > `t1`.`city` AS `city`, > `t1`.`state` AS `state` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > `business`.`business_id` AS `business_id`, > SUM(`business`.`stars`) AS `X_measure__A` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state`, > `business`.`business_id` > ) `t1` > GROUP BY `t1`.`city`, > `t1`.`state` > ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND > (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR 
((`t0`.`state` > IS NULL) AND (`t2`.`state` IS NULL > {code} > If you look at the join condition, you'll note that the join condition is an > equality condition which also allows null=null. We should add a planning > rewrite rule and execution join option to allow null equality so that we > don't treat this as a cartesian join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
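The issue description asks for a planning rewrite that recognizes Tableau's `a = b OR (a IS NULL AND b IS NULL)` pattern and treats it as a null-safe equi-join key instead of a cartesian join. A hedged sketch of that normalization over a toy expression tree — Drill would do this on Calcite `RexNode`s, not these stand-in records:

```java
public class NullEqRewrite {

    // Minimal expression tree; in Drill this role is played by Calcite's RexNode.
    interface Expr {}
    record Col(String name) implements Expr {}
    record IsNull(Expr e) implements Expr {}
    record Eq(Expr l, Expr r) implements Expr {}
    record And(Expr l, Expr r) implements Expr {}
    record Or(Expr l, Expr r) implements Expr {}
    record IsNotDistinctFrom(Expr l, Expr r) implements Expr {}

    // Recognize "l = r OR (l IS NULL AND r IS NULL)" and normalize it to
    // "l IS NOT DISTINCT FROM r", an equality the join planner can key on.
    static Expr rewrite(Expr e) {
        if (e instanceof Or or
                && or.l() instanceof Eq eq
                && or.r() instanceof And and
                && and.l() instanceof IsNull ln
                && and.r() instanceof IsNull rn
                && eq.l().equals(ln.e())
                && eq.r().equals(rn.e())) {
            return new IsNotDistinctFrom(eq.l(), eq.r());
        }
        return e;
    }

    public static void main(String[] args) {
        Expr a = new Col("t0.city"), b = new Col("t2.city");
        Expr cond = new Or(new Eq(a, b), new And(new IsNull(a), new IsNull(b)));
        System.out.println(rewrite(cond) instanceof IsNotDistinctFrom); // true
        // A condition that does not match the pattern is left untouched.
        System.out.println(rewrite(new Eq(a, b)) instanceof Eq); // true
    }
}
```

In the Tableau query above, the rewrite would fire once per conjunct (`city` and `state`), leaving an AND of two null-safe equalities that a hash or merge join can consume directly.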
[jira] [Commented] (DRILL-4577) Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in
[ https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228751#comment-15228751 ] ASF GitHub Bot commented on DRILL-4577: --- Github user hsuanyi commented on a diff in the pull request: https://github.com/apache/drill/pull/461#discussion_r58751518 --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java --- @@ -72,4 +80,56 @@ public String getTypeName() { return HiveStoragePluginConfig.NAME; } + @Override + public void visitTables(final RecordGenerator recordGenerator, final String schemaPath) { +final List tableNames = Lists.newArrayList(getTableNames()); +List tables; +// Retries once if the first call to fetch the metadata fails +synchronized(mClient) { + try { +tables = mClient.getTableObjectsByName(getName(), tableNames); --- End diff -- @vkorukanti Thanks for pointing this out. Regardless of the permission, getTableObjectsByName will return the requested tables. Thus, as in [1], I used mClient.tableExists() to check the permission. [1]https://github.com/apache/drill/pull/461/files#diff-bb5d8a385888df1dacc85fc011acd94bR93 > Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in > --- > > Key: DRILL-4577 > URL: https://issues.apache.org/jira/browse/DRILL-4577 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive >Reporter: Sean Hsuan-Yi Chu >Assignee: Sean Hsuan-Yi Chu > Fix For: 1.7.0 > > > A query such as > {code} > select * from INFORMATION_SCHEMA.`TABLES` > {code} > is converted as calls to fetch all tables from storage plugins. > When users have Hive, the calls to hive metadata storage would be: > 1) get_table > 2) get_partitions > However, the information regarding partitions is not used in this type of > queries. Beside, a more efficient way is to fetch tables is to use > get_multi_table call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4576) Add StoragePlugin API to register materialization into planner
[ https://issues.apache.org/jira/browse/DRILL-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228750#comment-15228750 ] ASF GitHub Bot commented on DRILL-4576: --- Github user laurentgo commented on the pull request: https://github.com/apache/drill/pull/466#issuecomment-206487918 Patch overall looks good to me (except maybe the abstract class vs interface stuff) > Add StoragePlugin API to register materialization into planner > -- > > Key: DRILL-4576 > URL: https://issues.apache.org/jira/browse/DRILL-4576 > Project: Apache Drill > Issue Type: Bug >Reporter: Laurent Goujon >Assignee: Jacques Nadeau > > There's no currently a good way to register materializations into Drill > planner. Calcite's MaterializationService.instance() would be the way to go, > but the registration happens in > {{org.apache.calcite.prepare.Prepare.PreparedResult#prepareSql()}}, which is > not called by Drill. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4576) Add StoragePlugin API to register materialization into planner
[ https://issues.apache.org/jira/browse/DRILL-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228747#comment-15228747 ] ASF GitHub Bot commented on DRILL-4576: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/466#discussion_r58751431 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerCallback.java --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner; + +import java.util.Collection; + +import org.apache.calcite.plan.RelOptPlanner; + +/** + * A callback that StoragePlugins can initialize to allow further configuration + * of the Planner at initialization time. Examples could be to allow adding lattices, + * materializations or additional traits to the planner that will be used in + * planning. + */ +public abstract class PlannerCallback { + + /** + * Method that will be called before a planner is used to further configure the planner. + * @param planner The planner to be configured. 
+ */ + public abstract void initializePlanner(RelOptPlanner planner); --- End diff -- really minor thing, but the name sounds strange compared to what the function is supposed to do? what about `onInitialization(RelOptPlanner planner)` or simply `apply(RelOptPlanner planner)` > Add StoragePlugin API to register materialization into planner > -- > > Key: DRILL-4576 > URL: https://issues.apache.org/jira/browse/DRILL-4576 > Project: Apache Drill > Issue Type: Bug >Reporter: Laurent Goujon >Assignee: Jacques Nadeau > > There's no currently a good way to register materializations into Drill > planner. Calcite's MaterializationService.instance() would be the way to go, > but the registration happens in > {{org.apache.calcite.prepare.Prepare.PreparedResult#prepareSql()}}, which is > not called by Drill. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins
[ https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228746#comment-15228746 ] ASF GitHub Bot commented on DRILL-4539: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/462#discussion_r58751332 --- Diff: exec/java-exec/src/main/codegen/templates/ComparisonFunctions.java --- @@ -215,6 +192,36 @@ public void eval() { } } + <#-- IS_DISTINCT_FROM function --> + @FunctionTemplate(names = {"is_distinct_from", "is distinct from" }, --- End diff -- I am not opposed to having a native implementation of IS [NOT] DISTINCT FROM...clearly the generated code is more compact; however adding this new functions means we would need proper functional test coverage for various data types. Any thoughts regarding that ? > Add support for Null Equality Joins > --- > > Key: DRILL-4539 > URL: https://issues.apache.org/jira/browse/DRILL-4539 > Project: Apache Drill > Issue Type: Improvement >Reporter: Jacques Nadeau >Assignee: Venki Korukanti > > Tableau frequently generates queries similar to this: > {code} > SELECT `t0`.`city` AS `city`, > `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`, > `t0`.`state` AS `state`, > `t0`.`sum_stars_ok` AS `sum_stars_ok` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > SUM(`business`.`stars`) AS `sum_stars_ok` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state` > ) `t0` > INNER JOIN ( > SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`, > `t1`.`city` AS `city`, > `t1`.`state` AS `state` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > `business`.`business_id` AS `business_id`, > SUM(`business`.`stars`) AS `X_measure__A` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state`, > `business`.`business_id` > ) `t1` > GROUP BY `t1`.`city`, > `t1`.`state` > ) `t2` ON 
(((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND > (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` > IS NULL) AND (`t2`.`state` IS NULL > {code} > If you look at the join condition, you'll note that the join condition is an > equality condition which also allows null=null. We should add a planning > rewrite rule and execution join option to allow null equality so that we > don't treat this as a cartesian join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4576) Add StoragePlugin API to register materialization into planner
[ https://issues.apache.org/jira/browse/DRILL-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228717#comment-15228717 ] ASF GitHub Bot commented on DRILL-4576: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/466#discussion_r58749059 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerCallback.java --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner; + +import java.util.Collection; + +import org.apache.calcite.plan.RelOptPlanner; + +/** + * A callback that StoragePlugins can initialize to allow further configuration + * of the Planner at initialization time. Examples could be to allow adding lattices, + * materializations or additional traits to the planner that will be used in + * planning. + */ +public abstract class PlannerCallback { + + /** + * Method that will be called before a planner is used to further configure the planner. + * @param planner The planner to be configured. 
+ */ + public abstract void initializePlanner(RelOptPlanner planner); + + + public static PlannerCallback merge(Collection callbacks){ --- End diff -- pure style comment (feel free to ignore). You sometimes put a space before a brace, sometimes not (I prefer with space personally as it feels more readable, and it is pretty standard across the project, but my point is about consistency). > Add StoragePlugin API to register materialization into planner > -- > > Key: DRILL-4576 > URL: https://issues.apache.org/jira/browse/DRILL-4576 > Project: Apache Drill > Issue Type: Bug >Reporter: Laurent Goujon >Assignee: Jacques Nadeau > > There's currently no good way to register materializations into Drill > planner. Calcite's MaterializationService.instance() would be the way to go, > but the registration happens in > {{org.apache.calcite.prepare.Prepare.PreparedResult#prepareSql()}}, which is > not called by Drill. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4576) Add StoragePlugin API to register materialization into planner
[ https://issues.apache.org/jira/browse/DRILL-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228716#comment-15228716 ] ASF GitHub Bot commented on DRILL-4576: --- Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/466#discussion_r58749026 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerCallback.java --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner; + +import java.util.Collection; + +import org.apache.calcite.plan.RelOptPlanner; + +/** + * A callback that StoragePlugins can initialize to allow further configuration + * of the Planner at initialization time. Examples could be to allow adding lattices, + * materializations or additional traits to the planner that will be used in + * planning. + */ +public abstract class PlannerCallback { --- End diff -- why not an interface? 
> Add StoragePlugin API to register materialization into planner > -- > > Key: DRILL-4576 > URL: https://issues.apache.org/jira/browse/DRILL-4576 > Project: Apache Drill > Issue Type: Bug >Reporter: Laurent Goujon >Assignee: Jacques Nadeau > > There's currently no good way to register materializations into Drill > planner. Calcite's MaterializationService.instance() would be the way to go, > but the registration happens in > {{org.apache.calcite.prepare.Prepare.PreparedResult#prepareSql()}}, which is > not called by Drill. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
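For illustration, the `merge(Collection callbacks)` helper visible in the diff, combined with the "why not an interface?" suggestion, can be sketched in plain Java. Note this is a hedged sketch, not the actual Drill or Calcite API: `Planner` below is a stand-in for Calcite's `RelOptPlanner`, and the interface shape is an assumption.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Stand-in for org.apache.calcite.plan.RelOptPlanner (assumption for this
// sketch); the real callback would accept that type instead.
interface Planner {
}

// PlannerCallback written as an interface, per the review suggestion, with a
// static merge() that combines many callbacks into one: the merged callback
// simply invokes each member callback in order.
interface PlannerCallback {
  void initializePlanner(Planner planner);

  static PlannerCallback merge(Collection<PlannerCallback> callbacks) {
    // Defensive copy so later mutation of the input collection has no effect.
    final List<PlannerCallback> copy = new ArrayList<>(callbacks);
    return planner -> {
      for (PlannerCallback callback : copy) {
        callback.initializePlanner(planner);
      }
    };
  }
}
```

Because the type has a single abstract method, plugins could register callbacks as lambdas, and the planner setup code only ever holds one merged callback.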
[jira] [Commented] (DRILL-4530) Improve metadata cache performance for queries with single partition
[ https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228689#comment-15228689 ] John Omernik commented on DRILL-4530: - Let me add a big +1 to using protobuf for the cache. We could even include a simple jar with Drill to decode the protobuf to json for human reading/troubleshooting. If you consider how many times a human would read the metadata cache vs. how many times Drill will do it without human eyes, json does not provide any appreciable advantage over protobuf, especially if we just include a jar we can use to read any protobuf file as json when needed. > Improve metadata cache performance for queries with single partition > - > > Key: DRILL-4530 > URL: https://issues.apache.org/jira/browse/DRILL-4530 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.6.0 >Reporter: Aman Sinha >Assignee: Aman Sinha > Fix For: 1.7.0 > > > Consider two types of queries which are run with Parquet metadata caching: > {noformat} > query 1: > SELECT col FROM `A/B/C`; > query 2: > SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C'; > {noformat} > For a certain dataset, the query1 elapsed time is 1 sec whereas query2 > elapsed time is 9 sec even though both are accessing the same amount of data. > The user expectation is that they should perform roughly the same. The main > difference comes from reading the bigger metadata cache file at the root > level 'A' for query2 and then applying the partitioning filter. query1 reads > a much smaller metadata cache file at the subdirectory level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4530) Improve metadata cache performance for queries with single partition
[ https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228674#comment-15228674 ] Aman Sinha commented on DRILL-4530: --- Yes, indeed the storage format of the metadata cache has been discussed a few times and various options are on the table (I believe [~parthc] has done some analysis of the options). Thanks for the experimentation using protobuf. The loading time improvements are quite impressive. The advantages of JSON (simple, human readable etc.) are outweighed by the performance tradeoffs. In any new option we consider, we must keep in mind the fast incremental refresh scenario - this feature is highly requested by all users who are using metadata cache. > Improve metadata cache performance for queries with single partition > - > > Key: DRILL-4530 > URL: https://issues.apache.org/jira/browse/DRILL-4530 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.6.0 >Reporter: Aman Sinha >Assignee: Aman Sinha > Fix For: 1.7.0 > > > Consider two types of queries which are run with Parquet metadata caching: > {noformat} > query 1: > SELECT col FROM `A/B/C`; > query 2: > SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C'; > {noformat} > For a certain dataset, the query1 elapsed time is 1 sec whereas query2 > elapsed time is 9 sec even though both are accessing the same amount of data. > The user expectation is that they should perform roughly the same. The main > difference comes from reading the bigger metadata cache file at the root > level 'A' for query2 and then applying the partitioning filter. query1 reads > a much smaller metadata cache file at the subdirectory level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins
[ https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228651#comment-15228651 ] ASF GitHub Bot commented on DRILL-4539: --- Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/462#discussion_r58742916 --- Diff: exec/java-exec/src/main/codegen/templates/ComparisonFunctions.java --- @@ -215,6 +192,36 @@ public void eval() { } } + <#-- IS_DISTINCT_FROM function --> + @FunctionTemplate(names = {"is_distinct_from", "is distinct from" }, --- End diff -- If adding new functions is a concern, I can make ```RelOptUtil#splitJoinCondition``` identify rewritten ```IS NOT DISTINCT FROM``` functions as well. > Add support for Null Equality Joins > --- > > Key: DRILL-4539 > URL: https://issues.apache.org/jira/browse/DRILL-4539 > Project: Apache Drill > Issue Type: Improvement >Reporter: Jacques Nadeau >Assignee: Venki Korukanti > > Tableau frequently generates queries similar to this: > {code} > SELECT `t0`.`city` AS `city`, > `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`, > `t0`.`state` AS `state`, > `t0`.`sum_stars_ok` AS `sum_stars_ok` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > SUM(`business`.`stars`) AS `sum_stars_ok` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state` > ) `t0` > INNER JOIN ( > SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`, > `t1`.`city` AS `city`, > `t1`.`state` AS `state` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > `business`.`business_id` AS `business_id`, > SUM(`business`.`stars`) AS `X_measure__A` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state`, > `business`.`business_id` > ) `t1` > GROUP BY `t1`.`city`, > `t1`.`state` > ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND > (`t2`.`city` IS NULL))) AND ((`t0`.`state` = 
`t2`.`state`) OR ((`t0`.`state` > IS NULL) AND (`t2`.`state` IS NULL > {code} > If you look at the join condition, you'll note that the join condition is an > equality condition which also allows null=null. We should add a planning > rewrite rule and execution join option to allow null equality so that we > don't treat this as a cartesian join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins
[ https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228646#comment-15228646 ] ASF GitHub Bot commented on DRILL-4539: --- Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/462#discussion_r58742546 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java --- @@ -169,4 +176,223 @@ private static boolean containIdentity(List exps, } return true; } + + /** + * Copied from {@link RelOptUtil#splitJoinCondition(RelNode, RelNode, RexNode, List, List)}. Modified to rewrite --- End diff -- I will follow up with a JIRA on the Calcite project to see if we can push this change to Calcite. The function ```RelOptUtil#splitJoinCondition``` in its current form seems to have a problem/limitation. Currently it just returns the left and right join key indices, but doesn't return whether the condition is ```EQUAL``` or ```IS NOT DISTINCT FROM``` (it adds the key pair if they use either of these functions in the comparison). 
> Add support for Null Equality Joins > --- > > Key: DRILL-4539 > URL: https://issues.apache.org/jira/browse/DRILL-4539 > Project: Apache Drill > Issue Type: Improvement >Reporter: Jacques Nadeau >Assignee: Venki Korukanti > > Tableau frequently generates queries similar to this: > {code} > SELECT `t0`.`city` AS `city`, > `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`, > `t0`.`state` AS `state`, > `t0`.`sum_stars_ok` AS `sum_stars_ok` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > SUM(`business`.`stars`) AS `sum_stars_ok` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state` > ) `t0` > INNER JOIN ( > SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`, > `t1`.`city` AS `city`, > `t1`.`state` AS `state` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > `business`.`business_id` AS `business_id`, > SUM(`business`.`stars`) AS `X_measure__A` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state`, > `business`.`business_id` > ) `t1` > GROUP BY `t1`.`city`, > `t1`.`state` > ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND > (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` > IS NULL) AND (`t2`.`state` IS NULL > {code} > If you look at the join condition, you'll note that the join condition is an > equality condition which also allows null=null. We should add a planning > rewrite rule and execution join option to allow null equality so that we > don't treat this as a cartesian join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins
[ https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228628#comment-15228628 ] ASF GitHub Bot commented on DRILL-4539: --- Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/462#discussion_r58741416 --- Diff: exec/java-exec/src/main/codegen/templates/ComparisonFunctions.java --- @@ -215,6 +192,36 @@ public void eval() { } } + <#-- IS_DISTINCT_FROM function --> + @FunctionTemplate(names = {"is_distinct_from", "is distinct from" }, --- End diff -- I am not sure if there is a way to differentiate between the function in a join condition vs. the function in a project expr. I don't see any context info in the DrillConvertletTable.get() method call. Also, the generated code in the rewritten case is too much. For the following query: ```SELECT INT_col is not distinct from BIGINT_col as col, int_distinct_result FROM cp.`functions/distinct_from.json``` Without rewrite: https://gist.github.com/vkorukanti/e981058f985ed24e6c4ef6b47d670e0f With rewrite: https://gist.github.com/vkorukanti/d80aa2ba40c65c9215c38ed18b20a685 Sizes may differ after scalar replacement is done, but it is still too much code for a simple ```is not distinct from``` function. 
> Add support for Null Equality Joins > --- > > Key: DRILL-4539 > URL: https://issues.apache.org/jira/browse/DRILL-4539 > Project: Apache Drill > Issue Type: Improvement >Reporter: Jacques Nadeau >Assignee: Venki Korukanti > > Tableau frequently generates queries similar to this: > {code} > SELECT `t0`.`city` AS `city`, > `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`, > `t0`.`state` AS `state`, > `t0`.`sum_stars_ok` AS `sum_stars_ok` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > SUM(`business`.`stars`) AS `sum_stars_ok` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state` > ) `t0` > INNER JOIN ( > SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`, > `t1`.`city` AS `city`, > `t1`.`state` AS `state` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > `business`.`business_id` AS `business_id`, > SUM(`business`.`stars`) AS `X_measure__A` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state`, > `business`.`business_id` > ) `t1` > GROUP BY `t1`.`city`, > `t1`.`state` > ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND > (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` > IS NULL) AND (`t2`.`state` IS NULL > {code} > If you look at the join condition, you'll note that the join condition is an > equality condition which also allows null=null. We should add a planning > rewrite rule and execution join option to allow null equality so that we > don't treat this as a cartesian join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
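The semantics under discussion in this thread can be stated in a few lines of plain Java. This is only an illustrative sketch of SQL's `IS [NOT] DISTINCT FROM` comparison, not Drill's generated code; the `DistinctFrom` class and its method names are hypothetical.

```java
// Sketch of SQL IS [NOT] DISTINCT FROM over nullable values. Unlike `=`,
// which yields NULL (unknown) when either operand is NULL, IS DISTINCT FROM
// always yields TRUE or FALSE and treats two NULLs as "not distinct". That is
// precisely the null-equality behavior the Tableau-style join condition
// `a = b OR (a IS NULL AND b IS NULL)` encodes.
public final class DistinctFrom {

  private DistinctFrom() {}

  // True when a and b are distinct: exactly one is null, or both are
  // non-null and unequal.
  public static boolean isDistinctFrom(Object a, Object b) {
    if (a == null || b == null) {
      return a != b; // both null: false (not distinct); one null: true
    }
    return !a.equals(b);
  }

  public static boolean isNotDistinctFrom(Object a, Object b) {
    return !isDistinctFrom(a, b);
  }
}
```

Expressed this way, a null-equality join key can be compared with a single always-boolean predicate instead of the three-clause OR/AND rewrite, which is why the native function's generated code is more compact.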
[jira] [Commented] (DRILL-4571) Add link to local Drill logs from the web UI
[ https://issues.apache.org/jira/browse/DRILL-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228626#comment-15228626 ] Arina Ielchiieva commented on DRILL-4571: - Splitting this Jira into two. This Jira will deliver the ability to view local logs from the Web UI. https://issues.apache.org/jira/browse/DRILL-4585 will add the ability to view logs from all drillbits. > Add link to local Drill logs from the web UI > > > Key: DRILL-4571 > URL: https://issues.apache.org/jira/browse/DRILL-4571 > Project: Apache Drill > Issue Type: Improvement >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Attachments: display_log.JPG, log_list.JPG > > > Now we have link to the profile from the web UI. > It will be handy for the users to have the link to local logs as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4571) Add link to local Drill logs from the web UI
[ https://issues.apache.org/jira/browse/DRILL-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-4571: Summary: Add link to local Drill logs from the web UI (was: Add link to the Drill log from the web UI) > Add link to local Drill logs from the web UI > > > Key: DRILL-4571 > URL: https://issues.apache.org/jira/browse/DRILL-4571 > Project: Apache Drill > Issue Type: Improvement >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Attachments: display_log.JPG, log_list.JPG > > > Now we have link to the profile from the web UI. > It will be handy for the users to have the link to the log as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4571) Add link to local Drill logs from the web UI
[ https://issues.apache.org/jira/browse/DRILL-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-4571: Description: Now we have link to the profile from the web UI. It will be handy for the users to have the link to local logs as well. was: Now we have link to the profile from the web UI. It will be handy for the users to have the link to the log as well. > Add link to local Drill logs from the web UI > > > Key: DRILL-4571 > URL: https://issues.apache.org/jira/browse/DRILL-4571 > Project: Apache Drill > Issue Type: Improvement >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Attachments: display_log.JPG, log_list.JPG > > > Now we have link to the profile from the web UI. > It will be handy for the users to have the link to local logs as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4585) Add ability to view logs from all drillbits in Web UI
Arina Ielchiieva created DRILL-4585: --- Summary: Add ability to view logs from all drillbits in Web UI Key: DRILL-4585 URL: https://issues.apache.org/jira/browse/DRILL-4585 Project: Apache Drill Issue Type: Improvement Reporter: Arina Ielchiieva Fix For: Future Currently we can only view logs in the Web UI from the local drillbit. It would be nice if we could see logs from all active drillbits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4584) JDBC/ODBC Client IP in Drill audit logs
[ https://issues.apache.org/jira/browse/DRILL-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228592#comment-15228592 ] Vitalii Diravka commented on DRILL-4584: Is this the IP address of the client machine running the Drill web console, drill shell or JDBC/ODBC client, or the IP address of the foreman node? If the answer is the foreman's IP address, is it better to show the hostname, the IP address, or ip:port? !https://drill.apache.org/docs/img/query-flow-client.png! > JDBC/ODBC Client IP in Drill audit logs > --- > > Key: DRILL-4584 > URL: https://issues.apache.org/jira/browse/DRILL-4584 > Project: Apache Drill > Issue Type: Improvement > Components: Client - JDBC, Client - ODBC >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Minor > Fix For: 1.7.0 > > > Currently Drill audit logs - sqlline_queries.json and drillbit_queries.json > provide information about client username who fired the query . It will be > good to also have the client IP from where the query was fired . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4584) JDBC/ODBC Client IP in Drill audit logs
Vitalii Diravka created DRILL-4584: -- Summary: JDBC/ODBC Client IP in Drill audit logs Key: DRILL-4584 URL: https://issues.apache.org/jira/browse/DRILL-4584 Project: Apache Drill Issue Type: Improvement Components: Client - JDBC, Client - ODBC Reporter: Vitalii Diravka Assignee: Vitalii Diravka Priority: Minor Fix For: 1.7.0 Currently Drill audit logs - sqlline_queries.json and drillbit_queries.json provide information about client username who fired the query . It will be good to also have the client IP from where the query was fired . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-3842) JVM dies if drill attempts to read too many files in the directory that blow up heap
[ https://issues.apache.org/jira/browse/DRILL-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deneche A. Hakim reassigned DRILL-3842: --- Assignee: Deneche A. Hakim > JVM dies if drill attempts to read too many files in the directory that blow > up heap > - > > Key: DRILL-3842 > URL: https://issues.apache.org/jira/browse/DRILL-3842 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.1.0, 1.2.0 >Reporter: Victoria Markman >Assignee: Deneche A. Hakim >Priority: Critical > > Run {{select count(*) from t1}} where t1 directory consists of 1.9 million > little parquet files. The outcome: drillbit is dead and out of working set. > 1. Client never got response back from the server > 2. drillbit.log > {code} > 2015-09-25 17:56:55,935 [29fa756f-894d-0340-3661-b925bff0fe11:foreman] INFO > o.a.d.exec.store.parquet.Metadata - Took 47999 ms to get file statuses > 2015-09-25 18:43:19,871 [BitServer-4] INFO > o.a.d.exec.rpc.control.ControlServer - RPC connection /10.10.88.135:31011 > <--> /10.10.88.135:51675 (control server) timed out. Timeout was set to 300 > seconds. Closing connection. > 2015-09-25 18:50:06,026 [BitServer-3] INFO > o.a.d.exec.rpc.control.ControlClient - Channel closed /10.10.88.135:51675 > <--> /10.10.88.135:31011. > 2015-09-25 18:50:06,032 [UserServer-1] ERROR > o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. > Connection: /10.10.88.135:31010 <--> /10.10.88.133:51612 (user client). > Closing connection. 
> java.lang.OutOfMemoryError: Java heap space > {code} > drillbit.out > {code} > Exception: java.lang.OutOfMemoryError thrown from the > UncaughtExceptionHandler in thread "main-SendThread(atsqa4-133.qa.lab:5181)" > Exception in thread "WorkManager.StatusThread" java.lang.OutOfMemoryError: > Java heap space > 2015-09-25 18:53:52 > Full thread dump OpenJDK 64-Bit Server VM (24.65-b04 mixed mode): > {code} > jstack > {code} > [Fri Sep 25 18:53:29 ] # jstack 63205 > 63205: Unable to open socket file: target process not responding or HotSpot > VM not loaded > The -F option can be used when the target process is not responding > {code} > jstack -F > {code} > Attaching to process ID 63205, please wait... > Debugger attached successfully. > Server compiler detected. > JVM version is 24.65-b04 > java.lang.RuntimeException: Unable to deduce type of thread from address > 0x04093800 (expected type JavaThread, CompilerThread, ServiceThread, > JvmtiAgentThread, or SurrogateLockerThread) > at > sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:162) > at sun.jvm.hotspot.runtime.Threads.first(Threads.java:150) > at > sun.jvm.hotspot.runtime.DeadlockDetector.createThreadTable(DeadlockDetector.java:149) > at > sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:56) > at > sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39) > at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:52) > at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45) > at sun.jvm.hotspot.tools.JStack.run(JStack.java:60) > at sun.jvm.hotspot.tools.Tool.start(Tool.java:221) > at sun.jvm.hotspot.tools.JStack.main(JStack.java:86) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at 
sun.tools.jstack.JStack.runJStackTool(JStack.java:136) > at sun.tools.jstack.JStack.main(JStack.java:102) > Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for > type of address 0x04093800 > at > sun.jvm.hotspot.runtime.InstanceConstructor.newWrongTypeException(InstanceConstructor.java:62) > at > sun.jvm.hotspot.runtime.VirtualConstructor.instantiateWrapperFor(VirtualConstructor.java:80) > at > sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:158) > ... 15 more > Exception in thread "main" java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at sun.tools.jstack.JStack.runJStackTool(JStack.java:136) > at sun.tools.jstack.JStack.main(JStack.java:102) > Caused by: java.lang.RuntimeException: Unable to deduce type of
[jira] [Commented] (DRILL-3842) JVM dies if drill attempts to read too many files in the directory that blow up heap
[ https://issues.apache.org/jira/browse/DRILL-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228573#comment-15228573 ] Deneche A. Hakim commented on DRILL-3842: - Although I was able to reproduce the issue on 1.2.0, it's no longer occurring in the latest master. It still takes more than 30 minutes to plan the query and the heap usage grows to 4GB on the Foreman node, but the query succeeds. I suspect reading the parquet metadata cache for all 2M files is the cause; I will investigate whether this is indeed the case > JVM dies if drill attempts to read too many files in the directory that blow > up heap > - > > Key: DRILL-3842 > URL: https://issues.apache.org/jira/browse/DRILL-3842 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.1.0, 1.2.0 >Reporter: Victoria Markman >Priority: Critical > > Run {{select count(*) from t1}} where t1 directory consists of 1.9 million > little parquet files. The outcome: drillbit is dead and out of working set. > 1. Client never got response back from the server > 2. drillbit.log > {code} > 2015-09-25 17:56:55,935 [29fa756f-894d-0340-3661-b925bff0fe11:foreman] INFO > o.a.d.exec.store.parquet.Metadata - Took 47999 ms to get file statuses > 2015-09-25 18:43:19,871 [BitServer-4] INFO > o.a.d.exec.rpc.control.ControlServer - RPC connection /10.10.88.135:31011 > <--> /10.10.88.135:51675 (control server) timed out. Timeout was set to 300 > seconds. Closing connection. > 2015-09-25 18:50:06,026 [BitServer-3] INFO > o.a.d.exec.rpc.control.ControlClient - Channel closed /10.10.88.135:51675 > <--> /10.10.88.135:31011. > 2015-09-25 18:50:06,032 [UserServer-1] ERROR > o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. > Connection: /10.10.88.135:31010 <--> /10.10.88.133:51612 (user client). > Closing connection. 
> java.lang.OutOfMemoryError: Java heap space > {code} > drillbit.out > {code} > Exception: java.lang.OutOfMemoryError thrown from the > UncaughtExceptionHandler in thread "main-SendThread(atsqa4-133.qa.lab:5181)" > Exception in thread "WorkManager.StatusThread" java.lang.OutOfMemoryError: > Java heap space > 2015-09-25 18:53:52 > Full thread dump OpenJDK 64-Bit Server VM (24.65-b04 mixed mode): > {code} > jstack > {code} > [Fri Sep 25 18:53:29 ] # jstack 63205 > 63205: Unable to open socket file: target process not responding or HotSpot > VM not loaded > The -F option can be used when the target process is not responding > {code} > jstack -F > {code} > Attaching to process ID 63205, please wait... > Debugger attached successfully. > Server compiler detected. > JVM version is 24.65-b04 > java.lang.RuntimeException: Unable to deduce type of thread from address > 0x04093800 (expected type JavaThread, CompilerThread, ServiceThread, > JvmtiAgentThread, or SurrogateLockerThread) > at > sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:162) > at sun.jvm.hotspot.runtime.Threads.first(Threads.java:150) > at > sun.jvm.hotspot.runtime.DeadlockDetector.createThreadTable(DeadlockDetector.java:149) > at > sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:56) > at > sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39) > at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:52) > at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45) > at sun.jvm.hotspot.tools.JStack.run(JStack.java:60) > at sun.jvm.hotspot.tools.Tool.start(Tool.java:221) > at sun.jvm.hotspot.tools.JStack.main(JStack.java:86) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at 
sun.tools.jstack.JStack.runJStackTool(JStack.java:136) > at sun.tools.jstack.JStack.main(JStack.java:102) > Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for > type of address 0x04093800 > at > sun.jvm.hotspot.runtime.InstanceConstructor.newWrongTypeException(InstanceConstructor.java:62) > at > sun.jvm.hotspot.runtime.VirtualConstructor.instantiateWrapperFor(VirtualConstructor.java:80) > at > sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:158) > ... 15 more > Exception in thread "main" java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at >
[jira] [Commented] (DRILL-4139) Exception while trying to prune partition. java.lang.UnsupportedOperationException: Unsupported type: BIT
[ https://issues.apache.org/jira/browse/DRILL-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228516#comment-15228516 ] Johannes Zillmann commented on DRILL-4139: -- Having the same issue with drill-1.6.0! > Exception while trying to prune partition. > java.lang.UnsupportedOperationException: Unsupported type: BIT > - > > Key: DRILL-4139 > URL: https://issues.apache.org/jira/browse/DRILL-4139 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.3.0 > Environment: 4 node cluster on CentOS >Reporter: Khurram Faraaz >Assignee: Aman Sinha > > Exception while trying to prune partition. > java.lang.UnsupportedOperationException: Unsupported type: BIT > is seen in drillbit.log after Functional run on 4 node cluster. > Drill 1.3.0 sys.version => d61bb83a8 > {code} > 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO > o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning > class: org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2 > 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO > o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze > filter tree: 0 ms > 2015-11-27 03:12:19,810 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] WARN > o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune > partition. 
> java.lang.UnsupportedOperationException: Unsupported type: BIT > at > org.apache.drill.exec.store.parquet.ParquetGroupScan.populatePruningVector(ParquetGroupScan.java:479) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.planner.ParquetPartitionDescriptor.populatePartitionVectors(ParquetPartitionDescriptor.java:96) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:235) > ~[drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87) > [drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) > [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8] > at > org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808) > [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8] > at > org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) > [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8] > at > org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:303) > [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8] > at > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.logicalPlanningVolcanoAndLopt(DefaultSqlHandler.java:545) > [drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:213) > [drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:248) > [drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164) > [drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:184) > [drill-java-exec-1.3.0.jar:1.3.0] > at > org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:905) 
> [drill-java-exec-1.3.0.jar:1.3.0] > at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:244) > [drill-java-exec-1.3.0.jar:1.3.0] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_45] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_45] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
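The failure above comes from partition pruning copying each partition column's statistics into a value vector via a switch over the column's minor type; a type with no case (here BIT, i.e. boolean) ends in the `UnsupportedOperationException`. The following is a minimal illustrative sketch of that dispatch pattern and of the fix (adding a BIT case), not Drill's actual `populatePruningVector` code; the method and type names are simplified assumptions.

```java
// Hypothetical sketch of the type dispatch behind partition pruning
// (illustrative only, not Drill's ParquetGroupScan.populatePruningVector):
// statistics values are converted per minor type; any type without a case
// produces the "Unsupported type" exception seen in drillbit.log.
public class PruningSketch {
    static Object toPruningValue(String minorType, Object statsValue) {
        switch (minorType) {
            case "INT":     return ((Number) statsValue).intValue();
            case "BIGINT":  return ((Number) statsValue).longValue();
            case "VARCHAR": return statsValue.toString();
            case "BIT":     return (Boolean) statsValue;  // the missing case for boolean columns
            default:
                // This is the path hit in DRILL-4139.
                throw new UnsupportedOperationException("Unsupported type: " + minorType);
        }
    }

    public static void main(String[] args) {
        System.out.println(toPruningValue("BIT", Boolean.TRUE));
    }
}
```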
[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins
[ https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228503#comment-15228503 ] ASF GitHub Bot commented on DRILL-4539: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/462#discussion_r58730151 --- Diff: exec/java-exec/src/main/codegen/templates/ComparisonFunctions.java --- @@ -215,6 +192,36 @@ public void eval() { } } + <#-- IS_DISTINCT_FROM function --> + @FunctionTemplate(names = {"is_distinct_from", "is distinct from" }, --- End diff -- @vkorukanti, I want to clarify... if the query only had a join condition with IS_NOT_DISTINCT_FROM, I would think it should work just with your convertlet changes, since both HashJoin and MergeJoin handle this type of join condition. Is the reason you had to implement the full comparator codegen to handle more general types of comparisons? E.g., in the SELECT list if I say 'SELECT a IS NOT DISTINCT FROM b'? Suppose we had a convertlet that only preserved the IS (NOT) DISTINCT FROM join condition, and defaulted to the Calcite rewrite using the CASE expression; then we would not have to implement the full comparator.
> Add support for Null Equality Joins > --- > > Key: DRILL-4539 > URL: https://issues.apache.org/jira/browse/DRILL-4539 > Project: Apache Drill > Issue Type: Improvement >Reporter: Jacques Nadeau >Assignee: Venki Korukanti > > Tableau frequently generates queries similar to this: > {code} > SELECT `t0`.`city` AS `city`, > `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`, > `t0`.`state` AS `state`, > `t0`.`sum_stars_ok` AS `sum_stars_ok` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > SUM(`business`.`stars`) AS `sum_stars_ok` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state` > ) `t0` > INNER JOIN ( > SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`, > `t1`.`city` AS `city`, > `t1`.`state` AS `state` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > `business`.`business_id` AS `business_id`, > SUM(`business`.`stars`) AS `X_measure__A` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state`, > `business`.`business_id` > ) `t1` > GROUP BY `t1`.`city`, > `t1`.`state` > ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND > (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` > IS NULL) AND (`t2`.`state` IS NULL > {code} > If you look at the join condition, you'll note that the join condition is an > equality condition which also allows null=null. We should add a planning > rewrite rule and execution join option to allow null equality so that we > don't treat this as a cartesian join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
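The review comment above contrasts two implementation routes: full comparator code generation versus Calcite's default rewrite of `IS NOT DISTINCT FROM` into a CASE expression. The semantics being implemented can be sketched as follows; this is an illustrative model of the null-safe comparison, not Drill's generated comparator code.

```java
// Illustrative sketch of null-safe equality (not Drill's generated code):
// `a IS NOT DISTINCT FROM b` is ordinary equality except that NULL compares
// equal to NULL. Calcite's default rewrite expresses the same logic as:
//   CASE WHEN a IS NULL AND b IS NULL THEN TRUE
//        WHEN a IS NULL OR  b IS NULL THEN FALSE
//        ELSE a = b END
public class NullSafeEquality {
    static boolean isNotDistinctFrom(Object a, Object b) {
        if (a == null && b == null) return true;   // NULL matches NULL
        if (a == null || b == null) return false;  // one side NULL: distinct
        return a.equals(b);                        // ordinary equality
    }

    public static void main(String[] args) {
        System.out.println(isNotDistinctFrom(null, null)); // true
    }
}
```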
[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins
[ https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228484#comment-15228484 ] ASF GitHub Bot commented on DRILL-4539: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/462#discussion_r58728331 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java --- @@ -169,4 +176,223 @@ private static boolean containIdentity(List exps, } return true; } + + /** + * Copied from {@link RelOptUtil#splitJoinCondition(RelNode, RelNode, RexNode, List, List)}. Modified to rewrite --- End diff -- Agree that we ideally should leverage the Calcite code..especially since this method is pretty heavily used and modified periodically so keeping Drill's version of this method in sync will be difficult. > Add support for Null Equality Joins > --- > > Key: DRILL-4539 > URL: https://issues.apache.org/jira/browse/DRILL-4539 > Project: Apache Drill > Issue Type: Improvement >Reporter: Jacques Nadeau >Assignee: Venki Korukanti > > Tableau frequently generates queries similar to this: > {code} > SELECT `t0`.`city` AS `city`, > `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`, > `t0`.`state` AS `state`, > `t0`.`sum_stars_ok` AS `sum_stars_ok` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > SUM(`business`.`stars`) AS `sum_stars_ok` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state` > ) `t0` > INNER JOIN ( > SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`, > `t1`.`city` AS `city`, > `t1`.`state` AS `state` > FROM ( > SELECT `business`.`city` AS `city`, > `business`.`state` AS `state`, > `business`.`business_id` AS `business_id`, > SUM(`business`.`stars`) AS `X_measure__A` > FROM `mongo.academic`.`business` `business` > GROUP BY `business`.`city`, > `business`.`state`, > `business`.`business_id` > ) `t1` > GROUP BY `t1`.`city`, > `t1`.`state` > 
) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND > (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` > IS NULL) AND (`t2`.`state` IS NULL > {code} > If you look at the join condition, you'll note that the join condition is an > equality condition which also allows null=null. We should add a planning > rewrite rule and execution join option to allow null equality so that we > don't treat this as a cartesian join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
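The planner change discussed here hinges on the Tableau-generated predicate `(a = b) OR (a IS NULL AND b IS NULL)` being equivalent to a null-safe equi-join key. The sketch below checks that equivalence over every null/non-null combination; it is a hypothetical illustration of the semantics, not the `splitJoinCondition` pattern matcher itself.

```java
// Illustrative check (assumed helper names, not Drill code): the OR pattern
// Tableau generates and a null-safe comparison agree on all inputs, which is
// why the planner can treat the condition as an equi-join key rather than
// falling back to a cartesian join.
public class NullEqualityJoinPattern {
    // (a = b) OR (a IS NULL AND b IS NULL). Under SQL three-valued logic,
    // `a = b` evaluates to NULL (treated as false in a join) when either
    // side is NULL, so the OR branch is what admits the NULL = NULL rows.
    static boolean tableauPattern(Object a, Object b) {
        boolean eq = a != null && b != null && a.equals(b);
        return eq || (a == null && b == null);
    }

    // The equivalent null-safe comparison.
    static boolean isNotDistinctFrom(Object a, Object b) {
        return a == null ? b == null : a.equals(b);
    }

    public static void main(String[] args) {
        Object[] vals = {null, "SF", "NY"};
        for (Object a : vals)
            for (Object b : vals)
                System.out.println(tableauPattern(a, b) == isNotDistinctFrom(a, b));
    }
}
```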
[jira] [Commented] (DRILL-3894) Directory functions (MaxDir, MinDir ..) should have optional filename parameter
[ https://issues.apache.org/jira/browse/DRILL-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227995#comment-15227995 ] ASF GitHub Bot commented on DRILL-3894: --- GitHub user vdiravka opened a pull request: https://github.com/apache/drill/pull/467 DRILL-3894: Upgrade functions MaxDir, MinDir... Optional filename parameter Functions MaxDir, MinDir, iMaxDir, iMinDir with one (schema) parameter were added. You can merge this pull request into a Git repository by running: $ git pull https://github.com/vdiravka/drill DRILL-3894 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/467.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #467 commit 966d76a06f82dcb265849b90bcff8ce8a770f4ec Author: Vitalii Diravka Date: 2016-04-05T15:07:29Z DRILL-3894: Upgrade functions MaxDir, MinDir... Optional filename parameter - added functions MaxDir, MinDir, iMaxDir, iMinDir with one (schema) parameter. > Directory functions (MaxDir, MinDir ..) should have optional filename > parameter > --- > > Key: DRILL-3894 > URL: https://issues.apache.org/jira/browse/DRILL-3894 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.2.0 >Reporter: Neeraja >Assignee: Vitalii Diravka > > https://drill.apache.org/docs/query-directory-functions/ > The directory functions documented above should provide ability to have > second parameter (file name) as optional. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
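With the file-name argument made optional, a single-argument `maxdir(workspace)` would plausibly scan the workspace root itself and return the greatest subdirectory name. The sketch below shows that behavior under those assumptions; the class and method names are hypothetical, and this is not the Drill UDF implementation from the pull request.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of a single-argument MaxDir (assumed names, not the
// actual Drill UDF): the "max" directory is the greatest subdirectory name
// in lexicographic order, which sorts date-named partitions correctly.
public class MaxDirSketch {
    // Core comparison, separated out so it is easy to exercise directly.
    static String maxOf(List<String> subdirNames) {
        return Collections.max(subdirNames);
    }

    // With the file-name argument omitted, scan the workspace root itself.
    static String maxDir(File workspaceRoot) {
        File[] subs = workspaceRoot.listFiles(File::isDirectory);
        if (subs == null || subs.length == 0) {
            throw new IllegalArgumentException("no subdirectories under " + workspaceRoot);
        }
        return maxOf(Arrays.stream(subs).map(File::getName).collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(maxOf(Arrays.asList("2014-01", "2016-03", "2015-12")));
    }
}
```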
[jira] [Commented] (DRILL-4543) Advertise Drill-bit ports, status, capabilities in ZooKeeper
[ https://issues.apache.org/jira/browse/DRILL-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227857#comment-15227857 ] Paul Rogers commented on DRILL-4543: As it turns out, the Drill startup scripts have a number of bugs that prevent passing of -Dname=value system properties on the command line. See DRILL-4581. > Advertise Drill-bit ports, status, capabilities in ZooKeeper > > > Key: DRILL-4543 > URL: https://issues.apache.org/jira/browse/DRILL-4543 > Project: Apache Drill > Issue Type: Sub-task > Components: Server >Reporter: Paul Rogers > Fix For: 2.0.0 > > > Today Drill uses ZooKeeper (ZK) to advertise the existence of a Drill-bit, > providing the host name/IP Address of the Drill-bit and the ports used, > encoded in Protobuf format. All other information (status, CPUs, memory) are > assumed to be the same across all Drill-bits in the cluster as specified in > the Drill config file. (Amended to reflect 1.6 behavior.) > Moving forward, as Drill becomes more sophisticated, Drill should advertise > the specifics of each Drill-bit so that one Drill bit can differ from another. > For example, when running on YARN, we need a way for Drill to gracefully shut > down. Advertising a status of Ready or Unavailable will help. Ready is the > normal state. Unavailable means the Drill-bit will finish in-flight queries, > but won't accept new ones. (The actual status is a separate enhancement.) > In a YARN cluster, Drill should take advantage of machines with more memory, > but live with machines with less. (Perhaps some are newer, some are older or > more heavily loaded.) Drill should use ZK to identify its available memory > and CPUs so that the planner can use them. (Use of the info is a separate > enhancement.) > There may be times when two drill bits run on a single machine. If so, they > must use separate ports. So, each Drill-bit should advertise its ports in ZK. 
> For backward compatibility, the information is optional; if not present, the > receiver should assume the information defaults to that in the config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4581) Various problems in the Drill startup scripts
[ https://issues.apache.org/jira/browse/DRILL-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-4581: --- Summary: Various problems in the Drill startup scripts (was: Various inconsistencies in the Drill startup scripts) > Various problems in the Drill startup scripts > - > > Key: DRILL-4581 > URL: https://issues.apache.org/jira/browse/DRILL-4581 > Project: Apache Drill > Issue Type: Bug > Components: Server >Affects Versions: 1.6.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > > Noticed the following in drillbit.sh: > 1) Comment: DRILL_LOG_DIR: Where log files are stored. PWD by default. > Code: DRILL_LOG_DIR=/var/log/drill or, if it does not exist, $DRILL_HOME/log > 2) Comment: DRILL_PID_DIR: Where the pid files are stored. /tmp by default. > Code: DRILL_PID_DIR=$DRILL_HOME > 3) Redundant checking of JAVA_HOME. drillbit.sh sources drill-config.sh which > checks JAVA_HOME. Later, drillbit.sh checks it again. The second check is > both unnecessary and prints a less informative message than the > drill-config.sh check. Suggestion: Remove the JAVA_HOME check in drillbit.sh. > 4) Though drill-config.sh carefully checks JAVA_HOME, it does not export the > JAVA_HOME variable. Perhaps this is why drillbit.sh repeats the check? > Recommended: export JAVA_HOME from drill-config.sh. > 5) Both drillbit.sh and the sourced drill-config.sh check DRILL_LOG_DIR and > set the default value. Drill-config.sh defaults to /var/log/drill, or if that > fails, to $DRILL_HOME/log. Drillbit.sh just sets /var/log/drill and does not > handle the case where that directory is not writable. Suggested: remove the > check in drillbit.sh. > 6) Drill-config.sh checks the writability of the DRILL_LOG_DIR by touching > sqlline.log, but does not delete that file, leaving a bogus, empty client log > file on the drillbit server. Recommendation: use bash commands instead. > 7) The implementation of the above check is a bit awkward.
It has a fallback > case with somewhat awkward logic. Clean this up. > 8) drillbit.sh, but not drill-config.sh, attempts to create /var/log/drill if > it does not exist. Recommended: decide on a single choice, implement it in > drill-config.sh. > 9) drill-config.sh checks if $DRILL_CONF_DIR is a directory. If not, defaults > it to $DRILL_HOME/conf. This can lead to subtle errors. If I use > drillbit.sh --config /misspelled/path > where I mistype the path, I won't get an error, I get the default config, > which may not at all be what I want to run. Recommendation: if the value of > DRILL_CONF_DIR is passed into the script (as a variable or via --config), > then that directory must exist. Else, use the default. > 10) drill-config.sh exports, but may not set, HADOOP_HOME. This may be left > over from the original Hadoop script that the Drill script was based upon. > Recommendation: export only in the case that HADOOP_HOME is set for cygwin. > 11) Drill-config.sh checks JAVA_HOME and prints a big, bold error message to > stderr if JAVA_HOME is not set. Then, it checks the Java version and prints a > different message (to stdout) if the version is wrong. Recommendation: use > the same format (and stderr) for both. > 12) Similarly, other Java checks later in the script produce messages to > stdout, not stderr. > 13) Drill-config.sh searches $JAVA_HOME to find java/java.exe and verifies > that it is executable. The script then throws away what we just found. Then, > drill-bit.sh tries to recreate this information as: > JAVA=$JAVA_HOME/bin/java > This is wrong in two ways: 1) it ignores the actual java location and assumes > it, and 2) it does not handle the java.exe case that drill-config.sh > carefully worked out. > Recommendation: export JAVA from drill-config.sh and remove the above line > from drillbit.sh.
> 14) drillbit.sh presumably takes extra arguments like this: > drillbit.sh -Dvar0=value0 --config /my/conf/dir start -Dvar1=value1 > -Dvar2=value2 -Dvar3=value3 > The -D bit allows the user to override config variables at the command line. > But, the scripts don't use the values. > A) drill-config.sh consumes --config /my/conf/dir after consuming the leading > arguments: > while [ $# -gt 1 ]; do > if [ "--config" = "$1" ]; then > shift > confdir=$1 > shift > DRILL_CONF_DIR=$confdir > else > # Presume we are at end of options and break > break > fi > done > B) drill-bit.sh will discard the var1: > startStopStatus=$1 <-- grabs "start" > shift > command=drillbit > shift <-- Consumes -Dvar1=value1 > C) Remaining values passed back into drillbit.sh: > args=$@ > nohup $thiscmd internal_start $command $args > D) Second invocation discards -Dvar2=value2 as described above. > E) Remaining values are passed to
[jira] [Commented] (DRILL-4530) Improve metadata cache performance for queries with single partition
[ https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227790#comment-15227790 ] Deneche A. Hakim commented on DRILL-4530: - I did an experiment where I *hacked* Drill to use protobuf instead of json for the metadata cache and for a customer case with a parquet table with 3 levels of directories and 395250 files, the protobuf cache was 87% smaller than json and loaded 83% faster. > Improve metadata cache performance for queries with single partition > - > > Key: DRILL-4530 > URL: https://issues.apache.org/jira/browse/DRILL-4530 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.6.0 >Reporter: Aman Sinha >Assignee: Aman Sinha > Fix For: 1.7.0 > > > Consider two types of queries which are run with Parquet metadata caching: > {noformat} > query 1: > SELECT col FROM `A/B/C`; > query 2: > SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C'; > {noformat} > For a certain dataset, the query1 elapsed time is 1 sec whereas query2 > elapsed time is 9 sec even though both are accessing the same amount of data. > The user expectation is that they should perform roughly the same. The main > difference comes from reading the bigger metadata cache file at the root > level 'A' for query2 and then applying the partitioning filter. query1 reads > a much smaller metadata cache file at the subdirectory level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (DRILL-4530) Improve metadata cache performance for queries with single partition
[ https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227790#comment-15227790 ] Deneche A. Hakim edited comment on DRILL-4530 at 4/6/16 6:07 AM: - I did an experiment where I *hacked* Drill to use protobuf instead of json for the metadata cache. For a customer case with a parquet table with 3 levels of directories and 395250 files, the protobuf cache was 87% smaller than json and loaded 83% faster. was (Author: adeneche): I did an experiment where I *hacked* Drill to use protobuf instead of json for the metadata cache and for a customer case with a parquet table with 3 levels of directories and 395250 files, the protobuf cache was 87% smaller than json and loaded 83% faster. > Improve metadata cache performance for queries with single partition > - > > Key: DRILL-4530 > URL: https://issues.apache.org/jira/browse/DRILL-4530 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.6.0 >Reporter: Aman Sinha >Assignee: Aman Sinha > Fix For: 1.7.0 > > > Consider two types of queries which are run with Parquet metadata caching: > {noformat} > query 1: > SELECT col FROM `A/B/C`; > query 2: > SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C'; > {noformat} > For a certain dataset, the query1 elapsed time is 1 sec whereas query2 > elapsed time is 9 sec even though both are accessing the same amount of data. > The user expectation is that they should perform roughly the same. The main > difference comes from reading the bigger metadata cache file at the root > level 'A' for query2 and then applying the partitioning filter. query1 reads > a much smaller metadata cache file at the subdirectory level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)