[jira] [Commented] (DRILL-6033) Using Drill Hive connection to query an Hbase table
[ https://issues.apache.org/jira/browse/DRILL-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292022#comment-16292022 ]

Kunal Khatua commented on DRILL-6033:
-------------------------------------

Is the 'key' the row_key in HBase? What are the Hive and HBase versions?

> Using Drill Hive connection to query an Hbase table
> ---------------------------------------------------
>
>                 Key: DRILL-6033
>                 URL: https://issues.apache.org/jira/browse/DRILL-6033
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.11.0
>        Environment: 3 instances of Cloudera 5.10, each one has a drillbit installed. Each machine has 24 vCPUs.
>           Reporter: Dor
>             Labels: drill, hbase, hive
>            Fix For: Future
>
>
> Using the Drill Hive connection to query an HBase table.
> +*Following query*+
> select * from hive.mytable
> where key >= '0001:10:2017:0410:3157781'
>   and key < '0001:10:2017:0410:3157782';
> +*What happened*+
> The query failed with an error after a timeout. It seems that the 'key' predicate was not pushed down from Drill to Hive.
> +*What we also tried*+
> The same query in Drill directly over HBase takes less than a second; in Hue/Hive it takes a few seconds.
> +*Debug trail*+
> The SQL profile in the Drill Web UI shows a full table scan over millions of records, while the query was actually supposed to return 9 rows.
> Does Drill on top of Hive use the key to access only the relevant region of the table?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
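[Editorial note] The pushdown behaviour the reporter is asking about can be illustrated with a small sketch. This is not Drill's planner code; the class and method names are made up for illustration. HBase row keys sort lexicographically, so a range predicate on the key should become a bounded scan over [startRow, stopRow) rather than a full table scan that filters every row afterwards:

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only, not Drill code: what row-key pushdown means for
// the query in the report. With pushdown, only keys inside [startRow, stopRow)
// are ever read; without it, every row is fetched and filtered on the Drill side.
public class KeyRangeSketch {

    // HBase row keys sort lexicographically as byte strings; for ASCII keys
    // String.compareTo gives the same ordering.
    static boolean inScanRange(String rowKey, String startRow, String stopRow) {
        return rowKey.compareTo(startRow) >= 0 && rowKey.compareTo(stopRow) < 0;
    }

    // Models the pruned scan: keep only keys inside the pushed-down range.
    static List<String> scanWithPushdown(List<String> allKeys, String start, String stop) {
        return allKeys.stream()
                .filter(k -> inScanRange(k, start, stop))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
                "0001:10:2017:0410:3157780",
                "0001:10:2017:0410:3157781",
                "0001:10:2017:0410:3157782");
        System.out.println(scanWithPushdown(keys,
                "0001:10:2017:0410:3157781", "0001:10:2017:0410:3157782"));
    }
}
```

A full scan over millions of rows in the profile, as described above, indicates the predicate stayed on the Drill side instead of becoming such a bounded scan.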
[jira] [Updated] (DRILL-6004) Direct buffer bounds checking should be disabled by default
[ https://issues.apache.org/jira/browse/DRILL-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pritesh Maker updated DRILL-6004:
---------------------------------
    Reviewer: Paul Rogers  (was: Paul Rogers)

> Direct buffer bounds checking should be disabled by default
> -----------------------------------------------------------
>
>                 Key: DRILL-6004
>                 URL: https://issues.apache.org/jira/browse/DRILL-6004
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Vlad Rozov
>            Assignee: Vlad Rozov
>            Priority: Minor
>
>
> Direct buffer bounds checking is enabled either when assertions are enabled (see DRILL-6001) or when the {{drill.enable_unsafe_memory_access}} property is not set to true. As a result it is enabled in production, because {{drill.enable_unsafe_memory_access}} is not set by default.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
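[Editorial note] The condition the ticket describes can be written out as a small sketch. This is a model of the reported behaviour, not the actual Drill source; the method name is illustrative:

```java
// Sketch of the behaviour described in DRILL-6004 (not Drill's source code):
// bounds checking is on when assertions are enabled OR when the
// drill.enable_unsafe_memory_access property is not "true". Since the property
// is unset by default, a production run (assertions off, property unset) still
// gets bounds checking, which is what this ticket wants changed.
public class BoundsCheckSketch {

    // Boolean.parseBoolean(null) returns false, so an unset property
    // behaves the same as "false".
    static boolean boundsCheckingEnabled(boolean assertionsEnabled, String unsafeAccessProp) {
        return assertionsEnabled || !Boolean.parseBoolean(unsafeAccessProp);
    }

    public static void main(String[] args) {
        // Default production configuration: assertions off, property unset.
        System.out.println(boundsCheckingEnabled(false, null));
    }
}
```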
[jira] [Updated] (DRILL-6002) Avoid memory copy from direct buffer to heap while spilling to local disk
[ https://issues.apache.org/jira/browse/DRILL-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pritesh Maker updated DRILL-6002:
---------------------------------
    Reviewer: Paul Rogers  (was: Paul Rogers)

> Avoid memory copy from direct buffer to heap while spilling to local disk
> -------------------------------------------------------------------------
>
>                 Key: DRILL-6002
>                 URL: https://issues.apache.org/jira/browse/DRILL-6002
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Vlad Rozov
>            Assignee: Vlad Rozov
>
>
> When spilling to a local disk, or to any file system that supports WritableByteChannel, it is preferable to avoid a copy from off-heap memory to the Java heap, since a WritableByteChannel can work directly with off-heap memory.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
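[Editorial note] A minimal sketch of the idea, not Drill's spill code: a FileChannel is a WritableByteChannel, so a direct (off-heap) ByteBuffer can be handed to it as-is. The anti-pattern being avoided is draining the buffer into a heap byte[] and writing that array through an OutputStream, which copies every byte through the Java heap:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: spill a direct buffer to a local file via FileChannel
// (a WritableByteChannel), with no intermediate heap copy, then read it back.
public class DirectSpillSketch {

    static byte[] spillAndReadBack(byte[] data) {
        try {
            Path spillFile = Files.createTempFile("spill", ".bin");
            try {
                ByteBuffer direct = ByteBuffer.allocateDirect(data.length);
                direct.put(data);
                direct.flip();
                try (FileChannel ch = FileChannel.open(spillFile, StandardOpenOption.WRITE)) {
                    while (direct.hasRemaining()) {
                        ch.write(direct); // channel consumes the off-heap memory directly
                    }
                }
                return Files.readAllBytes(spillFile);
            } finally {
                Files.deleteIfExists(spillFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(new String(spillAndReadBack("spilled batch".getBytes())));
    }
}
```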
[jira] [Updated] (DRILL-6028) Allow splitting generated code in ChainedHashTable into blocks to avoid "code too large" error
[ https://issues.apache.org/jira/browse/DRILL-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pritesh Maker updated DRILL-6028:
---------------------------------
    Reviewer: Paul Rogers  (was: Paul Rogers)

> Allow splitting generated code in ChainedHashTable into blocks to avoid "code too large" error
> ----------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6028
>                 URL: https://issues.apache.org/jira/browse/DRILL-6028
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.10.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>             Fix For: 1.13.0
>
>
> Allow splitting generated code in ChainedHashTable into blocks to avoid the "code too large" error.
> *REPRODUCE*
> File {{1200_columns.csv}}
> {noformat}
> 0,1,2,3...1200
> 0,1,2,3...1200
> {noformat}
> Query
> {noformat}
> select columns[0], column[1]...columns[1200] from dfs.`1200_columns.csv`
> union
> select columns[0], column[1]...columns[1200] from dfs.`1200_columns.csv`
> {noformat}
> Error
> {noformat}
> Error: SYSTEM ERROR: CompileException: File 'org.apache.drill.exec.compile.DrillJavaFileObject[HashTableGen10.java]', Line -7886, Column 24: HashTableGen10.java:57650: error: code too large
>     public boolean isKeyMatchInternalBuild(int incomingRowIdx, int htRowIdx)
>            ^ (compiler.err.limit.code)
> {noformat}
> *ROOT CAUSE*
> DRILL-4715 added the ability to ensure that method size won't go beyond the 64k limit imposed by the JVM. {{BlkCreateMode.TRUE_IF_BOUND}} was added to create a new block only if the number of expressions added hits the upper bound defined by {{exec.java.compiler.exp_in_method_size}}. Once the number of expressions in a method hits the upper bound, we create a new inner method and call it from the enclosing one.
> Example:
> {noformat}
> public void doSetup(RecordBatch incomingBuild, RecordBatch incomingProbe)
>     throws SchemaChangeException {
>     // some logic
>     return doSetup0(incomingBuild, incomingProbe);
> }
> {noformat}
> During code generation {{ChainedHashTable}} added all code in its methods in one block (using {{BlkCreateMode.FALSE}}), since the {{getHashBuild}} and {{getHashProbe}} methods contained state and thus could not be split. In these methods a hash was generated for each key expression. For the first key the seed was 0; for each subsequent key the hash was generated based on the seed from the previous key.
> To allow splitting for these methods the following was done:
> 1. Method signatures were changed: a new parameter {{seedValue}} was added. Initially the starting seed value was hard-coded during code generation (set to 0); now it is passed as a method parameter.
> 2. Initially the hash function calls for all keys were transformed into one logical expression, which did not allow splitting. Now we create a logical expression for each key, so splitting is possible. The new {{seedValue}} parameter is used as a seed holder to pass the seed value on to the next key.
> 3. {{ParameterExpression}} was added to generate a reference to a method parameter during code generation.
> Code example:
> {noformat}
> public int getHashBuild(int incomingRowIdx, int seedValue)
>     throws SchemaChangeException
> {
>     {
>         NullableVarCharHolder out3 = new NullableVarCharHolder();
>         {
>             out3.isSet = vv0.getAccessor().isSet((incomingRowIdx));
>             if (out3.isSet == 1) {
>                 out3.buffer = vv0.getBuffer();
>                 long startEnd = vv0.getAccessor().getStartEnd((incomingRowIdx));
>                 out3.start = ((int) startEnd);
>                 out3.end = ((int) (startEnd >> 32));
>             }
>         }
>         IntHolder seedValue4 = new IntHolder();
>         seedValue4.value = seedValue;
>         // start of eval portion of hash32 function. //
>         IntHolder out5 = new IntHolder();
>         {
>             final IntHolder out = new IntHolder();
>             NullableVarCharHolder in = out3;
>             IntHolder seed = seedValue4;
>             Hash32FunctionsWithSeed$NullableVarCharHash_eval: {
>                 if (in.isSet == 0) {
>                     out.value = seed.value;
>                 } else {
>                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32(in.start, in.end, in.buffer, seed.value);
>                 }
>             }
>             out5 = out;
>         }
>         // end of eval portion of hash32 function. //
>         seedValue = out5.value;
>         return getHashBuild0((incomingRowIdx), (seedValue));
>     }
> }
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
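[Editorial note] The chaining scheme described in the ticket can be illustrated with a toy generator. This is not Drill's real ClassGenerator; the names and the string output are made up for illustration. Once a method accumulates the configured maximum number of expressions, the remaining ones spill into a chained inner method (getHashBuild0, getHashBuild1, ...), each ending with a call to the next and threading the running seed through the {{seedValue}} parameter, so no single method exceeds the JVM's 64 KB bytecode limit:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of splitting N generated expressions across chained methods so
// that each method stays under a per-method expression bound.
public class MethodSplitSketch {

    static List<String> splitIntoMethods(String baseName, int exprCount, int maxExprsPerMethod) {
        List<String> methods = new ArrayList<>();
        int emitted = 0;
        int part = 0;
        while (emitted < exprCount || methods.isEmpty()) {
            // First method keeps the base name; overflow methods get a suffix.
            String name = (part == 0) ? baseName : baseName + (part - 1);
            int inThisMethod = Math.min(maxExprsPerMethod, exprCount - emitted);
            emitted += inThisMethod;
            boolean last = emitted >= exprCount;
            // The last method returns the accumulated seed; earlier ones
            // delegate to the next method in the chain.
            String tail = last ? "seedValue" : baseName + part + "(incomingRowIdx, seedValue)";
            methods.add("int " + name + "(int incomingRowIdx, int seedValue) { /* "
                    + inThisMethod + " exprs */ return " + tail + "; }");
            part++;
        }
        return methods;
    }

    public static void main(String[] args) {
        // 1200 key expressions with at most 500 per method yields 3 chained methods.
        splitIntoMethods("getHashBuild", 1200, 500).forEach(System.out::println);
    }
}
```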
[jira] [Updated] (DRILL-5967) Memory leak by HashPartitionSender
[ https://issues.apache.org/jira/browse/DRILL-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pritesh Maker updated DRILL-5967:
---------------------------------
    Remaining Estimate: (was: 168h)
     Original Estimate: (was: 168h)

> Memory leak by HashPartitionSender
> ----------------------------------
>
>                 Key: DRILL-5967
>                 URL: https://issues.apache.org/jira/browse/DRILL-5967
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>
>
> The error found by [~cch...@maprtech.com] and [~dechanggu]
> {code}
> 2017-10-25 15:43:28,658 [260eec84-7de3-03ec-300f-7fdbc111fb7c:frag:2:9] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (9216)
> Allocator(op:2:9:0:HashPartitionSender) 100/9216/12831744/100 (res/actual/peak/limit)
> Fragment 2:9
>
> [Error Id: 7eae6c2a-868c-49f8-aad8-b690243ffe9b on mperf113.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (9216)
> Allocator(op:2:9:0:HashPartitionSender) 100/9216/12831744/100 (res/actual/peak/limit)
> Fragment 2:9
>
> [Error Id: 7eae6c2a-868c-49f8-aad8-b690243ffe9b on mperf113.qa.lab:31010]
>         at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586) ~[drill-common-1.11.0-mapr.jar:1.11.0-mapr]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:301) [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267) [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.11.0-mapr.jar:1.11.0-mapr]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_121]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> Caused by: java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (9216)
> Allocator(op:2:9:0:HashPartitionSender) 100/9216/12831744/100 (res/actual/peak/limit)
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Updated] (DRILL-5967) Memory leak by HashPartitionSender
[ https://issues.apache.org/jira/browse/DRILL-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Farkas updated DRILL-5967:
----------------------------------
    Reviewer: Paul Rogers

> Memory leak by HashPartitionSender
> ----------------------------------
>
>                 Key: DRILL-5967
>                 URL: https://issues.apache.org/jira/browse/DRILL-5967
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
>
> The error found by [~cch...@maprtech.com] and [~dechanggu]
> {code}
> 2017-10-25 15:43:28,658 [260eec84-7de3-03ec-300f-7fdbc111fb7c:frag:2:9] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (9216)
> Allocator(op:2:9:0:HashPartitionSender) 100/9216/12831744/100 (res/actual/peak/limit)
> Fragment 2:9
>
> [Error Id: 7eae6c2a-868c-49f8-aad8-b690243ffe9b on mperf113.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (9216)
> Allocator(op:2:9:0:HashPartitionSender) 100/9216/12831744/100 (res/actual/peak/limit)
> Fragment 2:9
>
> [Error Id: 7eae6c2a-868c-49f8-aad8-b690243ffe9b on mperf113.qa.lab:31010]
>         at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586) ~[drill-common-1.11.0-mapr.jar:1.11.0-mapr]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:301) [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267) [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.11.0-mapr.jar:1.11.0-mapr]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_121]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> Caused by: java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (9216)
> Allocator(op:2:9:0:HashPartitionSender) 100/9216/12831744/100 (res/actual/peak/limit)
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (DRILL-5967) Memory leak by HashPartitionSender
[ https://issues.apache.org/jira/browse/DRILL-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16291354#comment-16291354 ]

ASF GitHub Bot commented on DRILL-5967:
---------------------------------------

GitHub user ilooner opened a pull request:

    https://github.com/apache/drill/pull/1073

    DRILL-5967: Fixed memory leak in OrderedPartitionSender

    The OrderedPartitionSender was leaking memory every time it was created because it created a wrapper RecordBatch which allocated memory but was never closed.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ilooner/drill DRILL-5967

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1073.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1073

----
commit 13cd2fe7e8dccb2ac7546de508fbfbed8e19b48b
Author: Timothy Farkas
Date:   2017-12-14T18:48:27Z

    DRILL-5967: Fixed memory leak in OrderedPartitionSender

> Memory leak by HashPartitionSender
> ----------------------------------
>
>                 Key: DRILL-5967
>                 URL: https://issues.apache.org/jira/browse/DRILL-5967
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
>
> The error found by [~cch...@maprtech.com] and [~dechanggu]
> {code}
> 2017-10-25 15:43:28,658 [260eec84-7de3-03ec-300f-7fdbc111fb7c:frag:2:9] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (9216)
> Allocator(op:2:9:0:HashPartitionSender) 100/9216/12831744/100 (res/actual/peak/limit)
> Fragment 2:9
>
> [Error Id: 7eae6c2a-868c-49f8-aad8-b690243ffe9b on mperf113.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (9216)
> Allocator(op:2:9:0:HashPartitionSender) 100/9216/12831744/100 (res/actual/peak/limit)
> Fragment 2:9
>
> [Error Id: 7eae6c2a-868c-49f8-aad8-b690243ffe9b on mperf113.qa.lab:31010]
>         at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586) ~[drill-common-1.11.0-mapr.jar:1.11.0-mapr]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:301) [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267) [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.11.0-mapr.jar:1.11.0-mapr]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_121]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> Caused by: java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (9216)
> Allocator(op:2:9:0:HashPartitionSender) 100/9216/12831744/100 (res/actual/peak/limit)
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
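[Editorial note] The root cause stated in the pull request, a wrapper RecordBatch that allocates memory but is never closed, can be modelled with a toy sketch. The classes below are illustrative, not Drill's actual allocator or RecordBatch; they show why the allocator still reports outstanding bytes at fragment cleanup and how closing the wrapper (for example via try-with-resources) fixes it:

```java
// Toy model of the DRILL-5967 leak and its fix. Identifiers are illustrative.
public class WrapperLeakSketch {

    // Tracks outstanding off-heap bytes, like Drill's allocator does; a
    // non-zero balance at cleanup is reported as "Memory was leaked by query".
    static class ToyAllocator {
        private long allocated;
        long buffer(long bytes) { allocated += bytes; return bytes; }
        void release(long bytes) { allocated -= bytes; }
        long outstanding() { return allocated; }
    }

    // The wrapper batch: it allocates on construction, so it MUST be closed.
    static class WrapperBatch implements AutoCloseable {
        private final ToyAllocator allocator;
        private final long held;
        WrapperBatch(ToyAllocator allocator, long bytes) {
            this.allocator = allocator;
            this.held = allocator.buffer(bytes);
        }
        @Override public void close() { allocator.release(held); }
    }

    // The fixed pattern: the wrapper is closed when the operator finishes,
    // so nothing is outstanding at cleanup time.
    static long runAndCleanup(ToyAllocator allocator) {
        try (WrapperBatch wrapper = new WrapperBatch(allocator, 9216)) {
            // ... the sender would partition and send batches here ...
        }
        return allocator.outstanding();
    }

    public static void main(String[] args) {
        System.out.println("leaked bytes: " + runAndCleanup(new ToyAllocator()));
    }
}
```

The bug corresponds to constructing the wrapper without the try-with-resources (or an explicit close), which leaves 9216 bytes outstanding, matching the figure in the stack trace above.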
[jira] [Commented] (DRILL-5337) OpenTSDB storage plugin
[ https://issues.apache.org/jira/browse/DRILL-5337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290547#comment-16290547 ]

Arina Ielchiieva commented on DRILL-5337:
-----------------------------------------

[~bbevens] I have updated the Jira description with current links describing how the plugin works:
a. how to write queries
b. a required configuration example

This plugin is configured in the Drill Web UI in the same way as the other plugins.

> OpenTSDB storage plugin
> -----------------------
>
>                 Key: DRILL-5337
>                 URL: https://issues.apache.org/jira/browse/DRILL-5337
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>            Reporter: Dmitriy Gavrilovych
>            Assignee: Dmitriy Gavrilovych
>              Labels: doc-impacting, ready-to-commit
>             Fix For: 1.12.0
>
>
> Storage plugin for OpenTSDB
> The plugin uses the REST API to work with TSDB.
> Query information:
> https://github.com/apache/drill/blob/master/contrib/storage-opentsdb/README.md
> Configuration example:
> https://github.com/apache/drill/blob/master/contrib/storage-opentsdb/src/main/resources/bootstrap-storage-plugins.json

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Updated] (DRILL-5337) OpenTSDB storage plugin
[ https://issues.apache.org/jira/browse/DRILL-5337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-5337:
------------------------------------
    Description: 
Storage plugin for OpenTSDB

The plugin uses the REST API to work with TSDB.

Query information:
https://github.com/apache/drill/blob/master/contrib/storage-opentsdb/README.md

Configuration example:
https://github.com/apache/drill/blob/master/contrib/storage-opentsdb/src/main/resources/bootstrap-storage-plugins.json

  was:
Storage plugin for OpenTSDB

The plugin uses the REST API to work with TSDB. Expected queries are listed below:

SELECT * FROM openTSDB.`warp.speed.test`;
Returns all elements from the warp.speed.test table with the default aggregator SUM.

SELECT * FROM openTSDB.`(metric=warp.speed.test)`;
Returns all elements from the (metric=warp.speed.test) table, as in the previous query, but with an alternative FROM syntax.

SELECT * FROM openTSDB.`(metric=warp.speed.test, aggregator=avg)`;
Returns all elements from the warp.speed.test table, but with a custom aggregator.

SELECT `timestamp`, sum(`aggregated value`) FROM openTSDB.`(metric=warp.speed.test, aggregator=avg)` GROUP BY `timestamp`;
Returns values aggregated and grouped by standard Drill functions from the warp.speed.test table, with a custom aggregator.

SELECT * FROM openTSDB.`(metric=warp.speed.test, downsample=5m-avg)`
Returns data limited by downsampling.

> OpenTSDB storage plugin
> -----------------------
>
>                 Key: DRILL-5337
>                 URL: https://issues.apache.org/jira/browse/DRILL-5337
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>            Reporter: Dmitriy Gavrilovych
>            Assignee: Dmitriy Gavrilovych
>              Labels: doc-impacting, ready-to-commit
>             Fix For: 1.12.0
>
>
> Storage plugin for OpenTSDB
> The plugin uses the REST API to work with TSDB.
> Query information:
> https://github.com/apache/drill/blob/master/contrib/storage-opentsdb/README.md
> Configuration example:
> https://github.com/apache/drill/blob/master/contrib/storage-opentsdb/src/main/resources/bootstrap-storage-plugins.json

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
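[Editorial note] For readers without access to the linked bootstrap file, a minimal sketch of what a storage-plugin entry of this shape might look like. The field values are illustrative assumptions (4242 is OpenTSDB's default HTTP port); the linked {{bootstrap-storage-plugins.json}} is the authoritative example:

```json
{
  "storage": {
    "openTSDB": {
      "type": "openTSDB",
      "connection": "http://localhost:4242",
      "enabled": false
    }
  }
}
```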
[jira] [Created] (DRILL-6033) Using Drill Hive connection to query an Hbase table
Dor created DRILL-6033:
---------------------------

             Summary: Using Drill Hive connection to query an Hbase table
                 Key: DRILL-6033
                 URL: https://issues.apache.org/jira/browse/DRILL-6033
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.11.0
         Environment: 3 instances of Cloudera 5.10, each one has a drillbit installed. Each machine has 24 vCPUs.
            Reporter: Dor


Using the Drill Hive connection to query an HBase table.

+*Following query*+
select * from hive.mytable
where key >= '0001:10:2017:0410:3157781'
  and key < '0001:10:2017:0410:3157782';

+*What happened*+
The query failed with an error after a timeout. It seems that the 'key' predicate was not pushed down from Drill to Hive.

+*What we also tried*+
The same query in Drill directly over HBase takes less than a second; in Hue/Hive it takes a few seconds.

+*Debug trail*+
The SQL profile in the Drill Web UI shows a full table scan over millions of records, while the query was actually supposed to return 9 rows.

Does Drill on top of Hive use the key to access only the relevant region of the table?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)