[jira] [Created] (DRILL-5984) Support for Symlinked Table Paths to be used in Drill Queries.
Saravanabavagugan Vengadasundaram created DRILL-5984: Summary: Support for Symlinked Table Paths to be used in Drill Queries. Key: DRILL-5984 URL: https://issues.apache.org/jira/browse/DRILL-5984 Project: Apache Drill Issue Type: New Feature Affects Versions: 1.11.0 Environment: OS : CentOS 7.1 MapR-DB Version: 5.2.2 Reporter: Saravanabavagugan Vengadasundaram MapR-FS supports symlinks, and hence MapR-DB table paths support symlinks as well. As part of the project I work on, we use symlinks as a level of indirection to reach the physical file. An employee table in MapR-DB will be represented as "/tables/Employee/Entity_1233232", and there will be a symlink called "/tables/Employee/Entity" pointing to the actual physical table. Currently, Drill does not understand queries that reference the symlink path; it only executes queries against the actual physical table path. So every time, I need to find out the actual physical path of the table and frame my query accordingly. It would be nice to have this feature in the next version of Drill. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
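Until Drill resolves symlinks internally, the lookup the reporter does by hand can be automated client-side: resolve the symlink to its physical path before framing the query. A minimal sketch with the JDK's `java.nio.file` API (the directory names follow the example above; `SymlinkResolver` is a hypothetical helper, not part of Drill):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SymlinkResolver {
    // Resolve a possibly-symlinked table path to its physical location.
    public static Path resolve(Path tablePath) throws IOException {
        // toRealPath() follows symlinks and normalizes the result
        return tablePath.toRealPath();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("tables");
        Path physical = Files.createDirectory(dir.resolve("Entity_1233232"));
        Path link = Files.createSymbolicLink(dir.resolve("Entity"), physical);
        // The query can then be framed against the resolved physical path.
        System.out.println(resolve(link));
    }
}
```

On MapR-FS the same resolution would happen through the Hadoop `FileSystem` API rather than `java.nio`, but the idea is identical: dereference once, query the physical path.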
[jira] [Commented] (DRILL-5967) Memory leak by HashPartitionSender
[ https://issues.apache.org/jira/browse/DRILL-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261775#comment-16261775 ] Timothy Farkas commented on DRILL-5967: --- [~cch...@maprtech.com] I have a possible fix here https://github.com/ilooner/drill/tree/DRILL-5967 . Could you please rerun the workload that produced the original error with this branch? [~weijie] Feel free to take a look as well; note I'm not sure this fix will resolve the issue until Chun tests it. > Memory leak by HashPartitionSender > -- > > Key: DRILL-5967 > URL: https://issues.apache.org/jira/browse/DRILL-5967 > Project: Apache Drill > Issue Type: Bug >Reporter: Timothy Farkas >Assignee: Timothy Farkas > Original Estimate: 168h > Remaining Estimate: 168h > > The error found by [~cch...@maprtech.com] and [~dechanggu] > {code} > 2017-10-25 15:43:28,658 [260eec84-7de3-03ec-300f-7fdbc111fb7c:frag:2:9] ERROR > o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: > Memory was leaked by query. Memory leaked: (9216) > Allocator(op:2:9:0:HashPartitionSender) 100/9216/12831744/100 > (res/actual/peak/limit) > Fragment 2:9 > [Error Id: 7eae6c2a-868c-49f8-aad8-b690243ffe9b on mperf113.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > IllegalStateException: Memory was leaked by query. 
Memory leaked: (9216) > Allocator(op:2:9:0:HashPartitionSender) 100/9216/12831744/100 > (res/actual/peak/limit) > Fragment 2:9 > [Error Id: 7eae6c2a-868c-49f8-aad8-b690243ffe9b on mperf113.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586) > ~[drill-common-1.11.0-mapr.jar:1.11.0-mapr] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:301) > [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) > [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267) > [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.11.0-mapr.jar:1.11.0-mapr] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [na:1.8.0_121] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_121] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121] > Caused by: java.lang.IllegalStateException: Memory was leaked by query. > Memory leaked: (9216) > Allocator(op:2:9:0:HashPartitionSender) 100/9216/12831744/100 > (res/actual/peak/limit) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
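For context, the "Memory was leaked" message comes from the allocator verifying at close time that every buffer it handed out was released. A toy sketch of that contract (illustrative only, not Drill's actual `BaseAllocator`):

```java
// Minimal model of an allocator that detects leaks on close, mirroring
// the check behind "Memory was leaked by query" in the stack trace above.
public class TrackingAllocator implements AutoCloseable {
    private long allocated; // bytes currently outstanding

    public long buffer(long bytes) {   // hand out a buffer
        allocated += bytes;
        return bytes;
    }

    public void release(long bytes) {  // caller must release every buffer
        allocated -= bytes;
    }

    @Override
    public void close() {
        // An operator (here, a HashPartitionSender that forgot to release a
        // partition's batch) closing with outstanding buffers is a bug.
        if (allocated != 0) {
            throw new IllegalStateException(
                "Memory was leaked by query. Memory leaked: (" + allocated + ")");
        }
    }
}
```

A fix along the lines of the linked branch typically guarantees the release path runs on every exit, including failure and cancellation paths.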
[jira] [Comment Edited] (DRILL-5975) Resource utilization
[ https://issues.apache.org/jira/browse/DRILL-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261772#comment-16261772 ] weijie.tong edited comment on DRILL-5975 at 11/22/17 12:44 AM: --- Well, I could not understand how to schedule MinorFragments as Tasks of YARN. Anyway, we can't let our scheduler depend on YARN; some companies like Alibaba have their own scheduler systems. Say the application has a scheduler for its application-level tasks; the tasks themselves also need to be scheduled as Linux threads. (Some digression: we now depend on ZK too much to complete some work; once ZK dies, the system dies too.) Maybe I have not described the design clearly. Your first difference does not exist. The task dependency problem is already solved by Drill, and I will not change that logic. Every Foreman will have its own FragmentScheduler. The FragmentScheduler holds the PlanFragments (different from the prior implementation, a PlanFragment will not have explicit hosts initially; the hosts are assigned by the scheduler at runtime). The `FragmentScheduler` does two things: * schedule the leaf fragments (i.e. the Scan nodes; in your example, A and B) to run actively. * accept the first-RecordBatch-ready event passively, and schedule the next MajorFragment's MinorFragments (in your example, C) to run. The running process is the same as before, with one difference: the `receiver` MinorFragments need to let the `sender` know their destination hosts (through step 4 in the graph). Everything else behaves the same as before; the receiver of the `upstream` MinorFragment (i.e. C) decides whether to begin the probe side after all the build side's data has arrived. Your second question, about the system property, I cannot answer well. The other change besides the scheduler is the RecordBatchManager. It just acts as a buffer stage between the sender MinorFragments and the receiver MinorFragments. 
This design exists in most such systems. Flink you already know; in Spark it is the BlockManager, which is master-slave and not easy to apply to Drill. Both systems support the stream and batch models. Drill, by contrast, will not let data stay on disk at the exchange stage; it lets batches stay in memory and flow through the network under a throttling strategy. This design does well at low response times and low query concurrency (i.e. interactive query). The bad part is that it wastes resources: a whole runtime tree (some call it a memory pipeline) that is too fat to schedule (this may be one reason we have an unbounded thread pool). The new design may lose some response-time performance, but I think it's a tradeoff. If the blocking nodes (sort, join, aggregate) are slow and the scan nodes are quick, the `memory-pipeline` model's advantage does not show. Without a scheduler, Drill is not suited to long-running jobs, as it lacks fault tolerance. We can make up for that by leveraging this design step by step. I think constrained resources and the scheduler complement each other. was (Author: weijie): Well, I could not understand how to schedule MinorFragments as a Task of YARN. Anyway, we can't let our scheduler to depend on Yarn, some companies like Alibaba have their own scheduler systems. Says, the application has a scheduler to schedule their application-level tasks, the tasks themselves also need to be scheduled as Tasks of Linux threads. (some digression, we now depend on ZK too much to complete some works ,once ZK died, the system also died). Maybe I have not describe the design clearly. Your first difference does not exist. The task dependency problem already solved by Drill now. I will not changed that logic. Every Foreman will have its own FragmentScheduler. The FragmentScheduler holds the PlanFragments (different from prior implementation, the PlanFragment will not have a explicit hosts initially, the hosts will be assigned by the scheduler at runtime.). The `FragmentScheduler` do two things: * schedule the leaf fragments (i.e. the Scan nodes, to your examples, A,B) to run actively. * accept the first RecordBatch ready event passively, schedule the next MajorFragment's MinorFragments (to your example, the C )to run. The running process is the same as before, only one difference: the `receiver` MinorFragments need to let the `sender` know its destination hosts (through the graph's number 4 step). Other things left will behave the same as before, the receiver of `upstream` MinorFragment (i.e. C) will decide whether to begin the probe side after all the build side's data have arrived. Your second question about the system property. I can not answer well. The other changing beside the scheduler is the RecordBatchManager. It just acts as a buffer stage between the sender MinorFragments and receiver
[jira] [Commented] (DRILL-5975) Resource utilization
[ https://issues.apache.org/jira/browse/DRILL-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261772#comment-16261772 ] weijie.tong commented on DRILL-5975: Well, I could not understand how to schedule MinorFragments as Tasks of YARN. Anyway, we can't let our scheduler depend on YARN; some companies like Alibaba have their own scheduler systems. Say the application has a scheduler for its application-level tasks; the tasks themselves also need to be scheduled as Linux threads. (Some digression: we now depend on ZK too much to complete some work; once ZK dies, the system dies too.) Maybe I have not described the design clearly. Your first difference does not exist. The task dependency problem is already solved by Drill, and I will not change that logic. Every Foreman will have its own FragmentScheduler. The FragmentScheduler holds the PlanFragments (different from the prior implementation, a PlanFragment will not have explicit hosts initially; the hosts are assigned by the scheduler at runtime). The `FragmentScheduler` does two things: * schedule the leaf fragments (i.e. the Scan nodes; in your example, A and B) to run actively. * accept the first-RecordBatch-ready event passively, and schedule the next MajorFragment's MinorFragments (in your example, C) to run. The running process is the same as before, with one difference: the `receiver` MinorFragments need to let the `sender` know their destination hosts (through step 4 in the graph). Everything else behaves the same as before; the receiver of the `upstream` MinorFragment (i.e. C) decides whether to begin the probe side after all the build side's data has arrived. Your second question, about the system property, I cannot answer well. The other change besides the scheduler is the RecordBatchManager. It just acts as a buffer stage between the sender MinorFragments and the receiver MinorFragments. This design exists in most such systems. 
Flink you already know; in Spark it is the BlockManager, which is master-slave and not easy to apply to Drill. Both systems support the stream and batch models. Drill, by contrast, will not let data stay on disk at the exchange stage; it lets batches stay in memory and flow through the network under a throttling strategy. This design does well at low response times and low query concurrency (i.e. interactive query). The bad part is that it wastes resources: a whole runtime tree (some call it a memory pipeline) that is too fat to schedule (this may be one reason we have an unbounded thread pool). The new design may lose some response-time performance, but I think it's a tradeoff. If the blocking nodes (sort, join, aggregate) are slow and the scan nodes are quick, the `memory-pipeline` model's advantage does not show. Without a scheduler, Drill is not suited to long-running jobs, as it lacks fault tolerance. We can make up for that by leveraging this design step by step. I think constrained resources and the scheduler complement each other. > Resource utilization > > > Key: DRILL-5975 > URL: https://issues.apache.org/jira/browse/DRILL-5975 > Project: Apache Drill > Issue Type: New Feature >Affects Versions: 2.0.0 >Reporter: weijie.tong >Assignee: weijie.tong > > h1. Motivation > Now the resource utilization ratio of Drill's cluster is not good. Most > of the cluster's resources are wasted. We cannot afford too many concurrent > queries. Once the system accepts more queries, even at a moderate cpu load, a > query that was originally very quick becomes slower and slower. > The reason is that Drill does not have a scheduler. It just assumes all the > nodes have enough computing resources. Once a query comes, it schedules > the related fragments to random nodes without caring about each node's load. Some > nodes then suffer more cpu context switches to satisfy the incoming query. 
The > profound cause of this is that the runtime minor fragments construct a > runtime tree whose nodes spread across different drillbits. The runtime tree is a > memory pipeline: all the nodes stay alive for the whole lifecycle of > a query, sending data to upper nodes successively, even though some > nodes could run quickly and quit immediately. What's more, the runtime tree is > constructed before actual running. The scheduling target for Drill thus becomes > the whole set of runtime tree nodes. > h1. Design > It will be hard to schedule the runtime tree nodes as a whole, so I try to > solve this by breaking up the cascade of runtime nodes. The graph below describes > the initial design. > !https://raw.githubusercontent.com/wiki/weijietong/drill/images/design.png! > [graph > link|https://raw.githubusercontent.com/wiki/weijietong/drill/images/design.png] > Every Drillbit
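The two responsibilities the comment assigns to the `FragmentScheduler` can be sketched as follows. All class and method names here are illustrative, not Drill's real APIs: leaf fragments are scheduled eagerly, while a downstream MajorFragment's MinorFragments are scheduled only once the first RecordBatch from a sender is ready.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of the lazy, event-driven fragment scheduling described above.
public class FragmentScheduler {
    public static class PlanFragment {
        public final String name;
        public final PlanFragment downstream; // receiving fragment, or null for the root
        public String host;                   // assigned at runtime, not at plan time
        public PlanFragment(String name, PlanFragment downstream) {
            this.name = name;
            this.downstream = downstream;
        }
    }

    private final Queue<PlanFragment> running = new ArrayDeque<>();

    // 1. Leaf (scan) fragments are scheduled actively when the query starts.
    public void scheduleLeaf(PlanFragment leaf, String host) {
        leaf.host = host;
        running.add(leaf);
    }

    // 2. The first-RecordBatch-ready event from a sender lazily schedules
    //    the downstream MajorFragment's MinorFragments.
    public void onFirstRecordBatchReady(PlanFragment sender, String host) {
        if (sender.downstream != null && sender.downstream.host == null) {
            sender.downstream.host = host;
            running.add(sender.downstream);
        }
    }

    public int runningCount() { return running.size(); }
}
```

In the comment's example, scans A and B would be scheduled eagerly, and C would receive a host only when A's or B's first batch is ready.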
[jira] [Updated] (DRILL-5961) For long running queries (> 10 min) Drill may raise FragmentSetupException for completed/cancelled fragments
[ https://issues.apache.org/jira/browse/DRILL-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker updated DRILL-5961: - Reviewer: Parth Chandra > For long running queries (> 10 min) Drill may raise FragmentSetupException > for completed/cancelled fragments > > > Key: DRILL-5961 > URL: https://issues.apache.org/jira/browse/DRILL-5961 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov > > {{WorkEventBus}} uses {{recentlyFinishedFragments}} cache to check for > completed or cancelled fragments. Such check is not reliable as entries in > {{recentlyFinishedFragments}} expire after 10 minutes, so > {{FragmentSetupException}} is raised even for completed or cancelled queries. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
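The failure mode is easy to reproduce with any time-expiring cache: once an entry ages out, a completed fragment becomes indistinguishable from one that was never set up. A minimal sketch (names are illustrative; Drill's {{WorkEventBus}} uses a cache with a 10-minute expiry):

```java
import java.util.HashMap;
import java.util.Map;

// Model of why an expiring cache is unreliable for completion checks.
public class RecentlyFinishedCache {
    private final long ttlMillis;
    private final Map<String, Long> finishedAt = new HashMap<>();

    public RecentlyFinishedCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    public void markFinished(String fragmentId, long nowMillis) {
        finishedAt.put(fragmentId, nowMillis);
    }

    // Returns false once the entry has expired, even though the fragment
    // really did finish -- the caller then wrongly raises
    // FragmentSetupException for a query running longer than the TTL.
    public boolean wasRecentlyFinished(String fragmentId, long nowMillis) {
        Long t = finishedAt.get(fragmentId);
        return t != null && nowMillis - t <= ttlMillis;
    }
}
```

Any query whose lifetime exceeds the TTL can hit the false negative, which is exactly the "> 10 min" condition in the issue title.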
[jira] [Commented] (DRILL-5981) Add Syntax Highlighting and Error Checking to Storage Plugin Config Page
[ https://issues.apache.org/jira/browse/DRILL-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261667#comment-16261667 ] ASF GitHub Bot commented on DRILL-5981: --- Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/1043 Oh.. just for a heads up (I guess you already knew this), I zeroed in on the 2 themes based on copy-pasting a sample storage plugin into this: https://ace.c9.io/build/kitchen-sink.html > Add Syntax Highlighting and Error Checking to Storage Plugin Config Page > > > Key: DRILL-5981 > URL: https://issues.apache.org/jira/browse/DRILL-5981 > Project: Apache Drill > Issue Type: Improvement > Components: Web Server >Affects Versions: 1.12.0 >Reporter: Charles Givre > Labels: easyfix > > When configuring storage plugins, it is easy to make a trivial mistake such > as missing a comma or paren, and then spend a great deal of time trying to > find that. This PR adds syntax highlighting and error checking to the > storage plugin page to prevent that. > Note, I work on a closed network and I have included the bare minimum of > javascript libraries needed for this task. I did include them directly in > the PR because I will not be able to build Drill if I have to download them > directly during the build process. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5981) Add Syntax Highlighting and Error Checking to Storage Plugin Config Page
[ https://issues.apache.org/jira/browse/DRILL-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261661#comment-16261661 ] ASF GitHub Bot commented on DRILL-5981: --- Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/1043 Very nice, @cgivre ! I've been looking to do something similar for the SQL side as well, but with autocomplete support. Let's talk about that separately. We just need to make sure that there are no license/usage issues. It would be nice if we could leverage this in the SQL Editor as well; let's talk it over offline. I know I recently had a discussion with someone about validating the JSON for the storage plugin, but that would be a stretch. Also, it seems the library can recognize single-line comments ( // ), which (I believe) are supported by Drill's JSON config parser. Can you pick a theme that makes the colors stand out more than the current one does? The Crimson or Eclipse themes look better and help with visualization. Also, if 'src-min-noconflict' is primarily there to hold the Ace libraries and you don't want to risk renaming the library files, it would be good to give the directory a more meaningful name (indicating that it contains AceJS libraries). Otherwise, LGTM +1 > Add Syntax Highlighting and Error Checking to Storage Plugin Config Page > > > Key: DRILL-5981 > URL: https://issues.apache.org/jira/browse/DRILL-5981 > Project: Apache Drill > Issue Type: Improvement > Components: Web Server >Affects Versions: 1.12.0 >Reporter: Charles Givre > Labels: easyfix > > When configuring storage plugins, it is easy to make a trivial mistake such > as missing a comma or paren, and then spend a great deal of time trying to > find that. This PR adds syntax highlighting and error checking to the > storage plugin page to prevent that. > Note, I work on a closed network and I have included the bare minimum of > javascript libraries needed for this task. 
I did include them directly in > the PR because I will not be able to build Drill if I have to download them > directly during the build process. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5730) Fix Unit Test failures on JDK 8 And Some JDK 7 versions
[ https://issues.apache.org/jira/browse/DRILL-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261658#comment-16261658 ] ASF GitHub Bot commented on DRILL-5730: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1045 @paul-rogers > Fix Unit Test failures on JDK 8 And Some JDK 7 versions > --- > > Key: DRILL-5730 > URL: https://issues.apache.org/jira/browse/DRILL-5730 > Project: Apache Drill > Issue Type: Bug >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Tests fail on JDK 8 and oracle JDK 7 on my mac > Failed tests: > TestMetadataProvider.tables:153 expected: but was: > TestMetadataProvider.tablesWithTableNameFilter:212 expected: but > was: > TestMetadataProvider.tablesWithSystemTableFilter:187 expected: but > was: > TestMetadataProvider.tablesWithTableFilter:176 expected: but > was: > Tests in error: > TestInfoSchema.selectFromAllTables » UserRemote SYSTEM ERROR: > URISyntaxExcepti... > TestCustomUserAuthenticator.positiveUserAuth » UserRemote SYSTEM ERROR: > URISyn... > TestCustomUserAuthenticator.positiveUserAuthAfterNegativeUserAuth » > UserRemote > TestViewSupport.infoSchemaWithView:350->BaseTestQuery.testRunAndReturn:344 > » Rpc > TestParquetScan.testSuccessFile:58->BaseTestQuery.testRunAndReturn:344 » > Rpc o... -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5730) Fix Unit Test failures on JDK 8 And Some JDK 7 versions
[ https://issues.apache.org/jira/browse/DRILL-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261657#comment-16261657 ] ASF GitHub Bot commented on DRILL-5730: --- GitHub user ilooner opened a pull request: https://github.com/apache/drill/pull/1045 DRILL-5730 Test Mocking Improvements ## DRILL-5730 - Switched to using the interface for FragmentContext everywhere instead of passing around the concrete class. - Minor refactoring of FragmentContext public methods - Switched to using the OptionSet interface throughout the codebase instead of OptionManager - Renamed **FragmentContext** to **FragmentContextImpl** and renamed **FragmentContextInterface** to **FragmentContext**. - Removed JMockit from most unit tests in favor of Mockito. Unfortunately it cannot be removed from some of the unit tests, which depend on it for mocking private and static methods (functionality only JMockit provides). In the future we need to refactor the code so that these remaining tests can have JMockit removed completely. 
- Refactored some tests to use a mock class of FragmentContext - Some tests were using Mockito and JMockit when there was no need for a mocking framework ## Misc - Removed commented out code and unused imports - Removed unnecessary modifiers from methods in interfaces - Fixed a bug in BootstrapContext which leaked threads - Fixed javadoc links that were broken You can merge this pull request into a Git repository by running: $ git pull https://github.com/ilooner/drill DRILL-5730 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1045.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1045 commit b4b4de83db5df20f2fa56387f5756df0ead3ec17 Author: Paul Rogers Date: 2017-10-05T05:43:44Z DRILL-5842: Refactor fragment, operator contexts commit 0a2d938cee7d5d47d3ac0d666ace8163efb3af83 Author: Paul Rogers Date: 2017-10-06T06:24:56Z Fixes for tests which mock contexts commit 34cd7494c68f0934fdf5f455748863be873b3995 Author: Timothy Farkas Date: 2017-10-16T18:28:54Z - Removed commented out code - Removed redundant modifiers on interface methods commit a4944b20abe226a990adc775a3641b44c0b173bb Author: Timothy Farkas Date: 2017-10-16T19:36:23Z - Some more minor code cleanup commit 13f35109a30f03414223c84f4f4fb664ab344e6e Author: Timothy Farkas Date: 2017-10-17T19:30:59Z - Deleted commented out code - Removed unused variables - Replaced usage of FragmentContext with FragmentContextInterface - Refactored OptionSet and FragmentContextInterface interfaces commit 629da8ff3bd40b3269747cf54a88754da3266346 Author: Timothy Farkas Date: 2017-10-18T19:37:37Z - More changes to the FragmentContextInterface - Replaced more usages of FragmentContext with FragmentContextInterface - Replaced usages of OptionManager with OptionSet commit 71f9a1c7d2c8b2f60398348d57344c56a68f556c Author: Timothy Farkas Date: 2017-10-18T19:52:01Z - Removed unused import 
commit b189350a20e3527d8b6c7df82fdb8641a359dad4 Author: Timothy Farkas Date: 2017-10-18T22:21:52Z - Fixed broken unit tests commit 27f88376c7ad5da384570de0a3eafeb16393829d Author: Timothy Farkas Date: 2017-11-07T19:02:43Z - Deleted unused fields commit 5f3e3ce93aba98e2e20abd0a187392d38a78c374 Author: Timothy Farkas Date: 2017-11-09T02:48:44Z - Removed unused variables - Removed use of Jmockit from unit tests - Minor code cleanup commit df4b0c1fed0f2d34292e6e635635cee4c6f2f2af Author: Timothy Farkas Date: 2017-11-09T21:42:00Z - Fixed java-exec build and test errors commit 8113edb320f2ff12e26bd87f82b23fd47a9513cd Author: Timothy Farkas Date: 2017-11-16T00:46:34Z - Fixed broken test - Removed broken TestOptiqPlans - Completed replacing references to FragmentContext with FragmentContextInterface commit b1fee4ff6e5c6dde14239732baae193ea752f21e Author: Timothy Farkas Date: 2017-11-16T00:58:45Z - Moved TestHashJoin off of JMockit commit f94a115eddfbbae8db861e6e97481197c9100f6c Author: Timothy Farkas Date: 2017-11-16T20:52:03Z - Removed more dependencies on JMockit commit d2178e262885f073cbb0d3daab1640a36f505805 Author: Timothy Farkas Date: 2017-11-20T20:28:29Z - Finished migrating most of the
[jira] [Commented] (DRILL-5941) Skip header / footer logic works incorrectly for Hive tables when file has several input splits
[ https://issues.apache.org/jira/browse/DRILL-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261609#comment-16261609 ] ASF GitHub Bot commented on DRILL-5941: --- Github user ppadma commented on the issue: https://github.com/apache/drill/pull/1030 +1. LGTM. > Skip header / footer logic works incorrectly for Hive tables when file has > several input splits > --- > > Key: DRILL-5941 > URL: https://issues.apache.org/jira/browse/DRILL-5941 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Affects Versions: 1.11.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Fix For: Future > > > *To reproduce* > 1. Create a csv file with two columns (key, value) and 329 rows, where > the first row is a header. > The data file size should be greater than the chunk size of 256 MB. Copy the > file to the distributed file system. > 2. Create table in Hive: > {noformat} > CREATE EXTERNAL TABLE `h_table`( > `key` bigint, > `value` string) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION > 'maprfs:/tmp/h_table' > TBLPROPERTIES ( > 'skip.header.line.count'='1'); > {noformat} > 3. Execute query {{select * from hive.h_table}} in Drill (query data using > the Hive plugin). The result will return fewer rows than expected. The expected result > is 328 (total count minus one row for the header). > *The root cause* > Since the file is greater than the default chunk size, it's split into several > fragments, known as input splits. For example: > {noformat} > maprfs:/tmp/h_table/h_table.csv:0+268435456 > maprfs:/tmp/h_table/h_table.csv:268435457+492782112 > {noformat} > TextHiveReader is responsible for handling skip header and / or footer logic. 
> Currently Drill creates a reader [for each input > split|https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScanBatchCreator.java#L84] > and skip header and / or footer logic is applied for each input split, > though ideally the above-mentioned input splits should be read by one > reader, so that skip header / footer logic is applied correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
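The root cause above can be shown with a toy model, where the 256 MB split boundary is replaced by an arbitrary line split (class and method names are hypothetical, not Drill's):

```java
import java.util.List;

// Demonstrates how per-split header skipping loses data rows.
public class HeaderSkipDemo {
    // Buggy behaviour: every split's reader skips the header line count,
    // so each split after the first silently drops real data rows.
    public static int countPerSplitSkip(List<List<String>> splits, int headerLines) {
        int rows = 0;
        for (List<String> split : splits) {
            rows += Math.max(0, split.size() - headerLines);
        }
        return rows;
    }

    // Correct behaviour: one logical reader skips the header exactly once.
    public static int countSingleReaderSkip(List<List<String>> splits, int headerLines) {
        int total = 0;
        for (List<String> split : splits) {
            total += split.size();
        }
        return Math.max(0, total - headerLines);
    }
}
```

With the issue's 329-line file (one header row) split in two, the single-reader count is 328 while the per-split count loses one extra row per additional split.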
[jira] [Commented] (DRILL-5730) Fix Unit Test failures on JDK 8 And Some JDK 7 versions
[ https://issues.apache.org/jira/browse/DRILL-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261582#comment-16261582 ] Timothy Farkas commented on DRILL-5730: --- Got these tests running from Eclipse. There are still some tests that depend on JMockit for now, since JMockit is used to mock static and private methods. Currently JMockit is the only stable mocking library that can do this, so we will likely have to refactor some code to make it testable without having to do crazy things with JMockit later. > Fix Unit Test failures on JDK 8 And Some JDK 7 versions > --- > > Key: DRILL-5730 > URL: https://issues.apache.org/jira/browse/DRILL-5730 > Project: Apache Drill > Issue Type: Bug >Reporter: Timothy Farkas >Assignee: Timothy Farkas > > Tests fail on JDK 8 and oracle JDK 7 on my mac > Failed tests: > TestMetadataProvider.tables:153 expected: but was: > TestMetadataProvider.tablesWithTableNameFilter:212 expected: but > was: > TestMetadataProvider.tablesWithSystemTableFilter:187 expected: but > was: > TestMetadataProvider.tablesWithTableFilter:176 expected: but > was: > Tests in error: > TestInfoSchema.selectFromAllTables » UserRemote SYSTEM ERROR: > URISyntaxExcepti... > TestCustomUserAuthenticator.positiveUserAuth » UserRemote SYSTEM ERROR: > URISyn... > TestCustomUserAuthenticator.positiveUserAuthAfterNegativeUserAuth » > UserRemote > TestViewSupport.infoSchemaWithView:350->BaseTestQuery.testRunAndReturn:344 > » Rpc > TestParquetScan.testSuccessFile:58->BaseTestQuery.testRunAndReturn:344 » > Rpc o... -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261373#comment-16261373 ] ASF GitHub Bot commented on DRILL-5089: --- Github user chunhui-shi closed the pull request at: https://github.com/apache/drill/pull/795 > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > Labels: ready-to-commit > Fix For: 1.12.0 > > > In a query's lifecycle, an attempt is made to initialize each enabled storage > plugin while building the schema tree. This is done regardless of the actual > plugins involved in a query. > Sometimes, when one or more of the enabled storage plugins have issues - > either due to misconfiguration or the underlying datasource being slow or > down, the overall query time increases drastically, most likely > due to the attempt being made to register schemas from a faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at some > point the underlying SQL Server db goes down, any Drill query starting to > execute at that point and beyond begins to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
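The lazy-initialization idea behind this issue can be sketched as a registry that constructs a plugin only when a query first references it (names are illustrative; the real fix lives in Drill's schema-tree construction, not in a class like this):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Sketch of skipping unrelated storage-plugin initialization.
public class LazyPluginRegistry {
    private final Map<String, Supplier<Object>> factories = new HashMap<>();
    private final Map<String, Object> initialized = new HashMap<>();

    public void register(String name, Supplier<Object> factory) {
        factories.put(name, factory);
    }

    // Initialize only the plugins a query actually references, so a
    // misconfigured or slow plugin (e.g. a jdbc source that is down)
    // cannot delay queries that never touch it.
    public Object pluginFor(String name) {
        return initialized.computeIfAbsent(name, n -> factories.get(n).get());
    }

    public Set<String> initializedPlugins() {
        return new HashSet<>(initialized.keySet());
    }
}
```

A query over `dfs` then never invokes the `jdbc` factory, which is exactly the behaviour the issue asks for.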
[jira] [Commented] (DRILL-5657) Implement size-aware result set loader
[ https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261237#comment-16261237 ] ASF GitHub Bot commented on DRILL-5657: --- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/914 Thanks, Parth! Will make another commit to address small issues that Karthik pointed out. Let's hold off the actual commit until Drill 1.12 ships. I'll then commit the changes when we open things up again for Drill 1.13 changes. > Implement size-aware result set loader > -- > > Key: DRILL-5657 > URL: https://issues.apache.org/jira/browse/DRILL-5657 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: Future >Reporter: Paul Rogers >Assignee: Paul Rogers > Fix For: Future > > > A recent extension to Drill's set of test tools created a "row set" > abstraction to allow us to create, and verify, record batches with very few > lines of code. Part of this work involved creating a set of "column > accessors" in the vector subsystem. Column readers provide a uniform API to > obtain data from columns (vectors), while column writers provide a uniform > writing interface. > DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size > (to avoid memory fragmentation due to Drill's two memory allocators.) The > column accessors have proven to be so useful that they will be the basis for > the new, size-aware writers used by Drill's record readers. > A step in that direction is to retrofit the column writers to use the > size-aware {{setScalar()}} and {{setArray()}} methods introduced in > DRILL-5517. > Since the test framework row set classes are (at present) the only consumer > of the accessors, those classes must also be updated with the changes. > This then allows us to add a new "row mutator" class that handles size-aware > vector writing, including the case in which a vector fills in the middle of a > row. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
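A sketch of the size-aware contract described in the issue: the writer reports, per value, whether the value still fits under the 16 MB vector cap from DRILL-5211, so the loader can roll over to a new batch even in the middle of a row. The `setScalar` name comes from DRILL-5517, but this signature and class are assumptions for illustration, not Drill's actual accessor API:

```java
// Toy model of a size-aware column writer.
public class SizeAwareWriter {
    public static final int VECTOR_LIMIT = 16 * 1024 * 1024; // 16 MB cap (DRILL-5211)
    private int bytesUsed;

    // Returns false when the value would overflow the vector, signalling
    // the result set loader to finish this batch and start a new one --
    // possibly mid-row, the case the "row mutator" must handle.
    public boolean setScalar(int valueBytes) {
        if (bytesUsed + valueBytes > VECTOR_LIMIT) {
            return false; // vector full
        }
        bytesUsed += valueBytes;
        return true;
    }
}
```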
[jira] [Commented] (DRILL-4286) Have an ability to put server in quiescent mode of operation
[ https://issues.apache.org/jira/browse/DRILL-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261224#comment-16261224 ] ASF GitHub Bot commented on DRILL-4286: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/921#discussion_r152361270 --- Diff: protocol/src/main/java/org/apache/drill/exec/proto/beans/RpcType.java --- @@ -25,28 +25,8 @@ HANDSHAKE(0), ACK(1), GOODBYE(2), -RUN_QUERY(3), -CANCEL_QUERY(4), -REQUEST_RESULTS(5), -RESUME_PAUSED_QUERY(11), -GET_QUERY_PLAN_FRAGMENTS(12), -GET_CATALOGS(14), -GET_SCHEMAS(15), -GET_TABLES(16), -GET_COLUMNS(17), -CREATE_PREPARED_STATEMENT(22), -GET_SERVER_META(8), -QUERY_DATA(6), -QUERY_HANDLE(7), -QUERY_PLAN_FRAGMENTS(13), -CATALOGS(18), -SCHEMAS(19), -TABLES(20), -COLUMNS(21), -PREPARED_STATEMENT(23), -SERVER_META(9), -QUERY_RESULT(10), -SASL_MESSAGE(24); +REQ_RECORD_BATCH(3), +SASL_MESSAGE(4); --- End diff -- The change seems to be that messages are dropped. That can't be good. The only diff that should show up here is the addition of your new state codes. The other explanation is that master is wrong, which would be a bad state of affairs. > Have an ability to put server in quiescent mode of operation > > > Key: DRILL-4286 > URL: https://issues.apache.org/jira/browse/DRILL-4286 > Project: Apache Drill > Issue Type: New Feature > Components: Execution - Flow >Reporter: Victoria Markman >Assignee: Venkata Jyothsna Donapati > > I think drill will benefit from mode of operation that is called "quiescent" > in some databases. > From IBM Informix server documentation: > {code} > Change gracefully from online to quiescent mode > Take the database server gracefully from online mode to quiescent mode to > restrict access to the database server without interrupting current > processing. After you perform this task, the database server sets a flag that > prevents new sessions from gaining access to the database server. 
The current > sessions are allowed to finish processing. After you initiate the mode > change, it cannot be canceled. During the mode change from online to > quiescent, the database server is considered to be in Shutdown mode. > {code} > This is different from shutdown, when processes are terminated. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
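The quiescent-mode behavior described above (refuse new sessions, let in-flight sessions drain, no cancellation once initiated) can be sketched with a simple gate. `QuiescentGate` and its methods are hypothetical names for illustration, not Drill or Informix code:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a quiescent-mode gate: a flag blocks new sessions while
// existing sessions are allowed to finish processing.
public class QuiescentGate {
    private final AtomicBoolean quiescent = new AtomicBoolean(false);
    private final AtomicInteger activeSessions = new AtomicInteger(0);

    /** Returns true if a new session may start; false once quiescent mode is on. */
    public boolean tryBeginSession() {
        if (quiescent.get()) {
            return false; // new sessions are refused
        }
        activeSessions.incrementAndGet();
        return true;
    }

    public void endSession() {
        activeSessions.decrementAndGet();
    }

    /** One-way switch, mirroring "it cannot be canceled" in the description. */
    public void enterQuiescentMode() {
        quiescent.set(true);
    }

    /** Fully quiesced once no sessions remain in flight. */
    public boolean isDrained() {
        return quiescent.get() && activeSessions.get() == 0;
    }
}
```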
[jira] [Commented] (DRILL-5952) Implement "CREATE TABLE IF NOT EXISTS"
[ https://issues.apache.org/jira/browse/DRILL-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261159#comment-16261159 ] ASF GitHub Bot commented on DRILL-5952: --- Github user prasadns14 commented on the issue: https://github.com/apache/drill/pull/1033 @arina-ielchiieva, please review > Implement "CREATE TABLE IF NOT EXISTS" > -- > > Key: DRILL-5952 > URL: https://issues.apache.org/jira/browse/DRILL-5952 > Project: Apache Drill > Issue Type: Improvement > Components: SQL Parser >Affects Versions: 1.11.0 >Reporter: Prasad Nagaraj Subramanya >Assignee: Prasad Nagaraj Subramanya > Fix For: Future > > > Currently, if a table/view with the same name exists CREATE TABLE fails with > VALIDATION ERROR > Having "IF NOT EXISTS" support for CREATE TABLE will ensure that query > succeeds -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260913#comment-16260913 ] ASF GitHub Bot commented on DRILL-4779: --- Github user kameshb commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r152310859 --- Diff: contrib/storage-kafka/src/test/java/org/apache/drill/exec/store/kafka/TestKafkaSuit.java --- @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store.kafka; + +import java.util.Properties; +import java.util.concurrent.atomic.AtomicInteger; + +import org.I0Itec.zkclient.ZkClient; +import org.I0Itec.zkclient.ZkConnection; +import org.apache.drill.exec.store.kafka.cluster.EmbeddedKafkaCluster; +import org.apache.kafka.common.serialization.StringSerializer; +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.runner.RunWith; +import org.junit.runners.Suite; +import org.junit.runners.Suite.SuiteClasses; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.google.common.base.Joiner; + +import kafka.admin.AdminUtils; +import kafka.admin.RackAwareMode; +import kafka.utils.ZKStringSerializer$; +import kafka.utils.ZkUtils; + +@RunWith(Suite.class) +@SuiteClasses({ KafkaQueriesTest.class, MessageIteratorTest.class }) +public class TestKafkaSuit { --- End diff -- Thanks for suggesting these frameworks. I think this may take a considerable amount of time, and since the release date is very close, we will address this refactoring in the next release. > Kafka storage plugin support > > > Key: DRILL-4779 > URL: https://issues.apache.org/jira/browse/DRILL-4779 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.11.0 >Reporter: B Anil Kumar >Assignee: B Anil Kumar > Labels: doc-impacting > Fix For: 1.12.0 > > > Implement Kafka storage plugin will enable the strong SQL support for Kafka. > Initially implementation can target for supporting json and avro message types -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260877#comment-16260877 ] ASF GitHub Bot commented on DRILL-4779: --- Github user kameshb commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r152303056 --- Diff: contrib/storage-kafka/src/test/java/org/apache/drill/exec/store/kafka/decoders/MessageReaderFactoryTest.java --- @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store.kafka.decoders; + +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.proto.UserBitShared.DrillPBError.ErrorType; +import org.junit.Assert; +import org.junit.Test; + +public class MessageReaderFactoryTest { + + @Test + public void testShouldThrowExceptionAsMessageReaderIsNull() { --- End diff -- taken care > Kafka storage plugin support > > > Key: DRILL-4779 > URL: https://issues.apache.org/jira/browse/DRILL-4779 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.11.0 >Reporter: B Anil Kumar >Assignee: B Anil Kumar > Labels: doc-impacting > Fix For: 1.12.0 > > > Implement Kafka storage plugin will enable the strong SQL support for Kafka. > Initially implementation can target for supporting json and avro message types -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5962) Add function STAsJSON to extend GIS support
[ https://issues.apache.org/jira/browse/DRILL-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-5962: Labels: doc-impacting ready-to-commit (was: doc-impacting) > Add function STAsJSON to extend GIS support > --- > > Key: DRILL-5962 > URL: https://issues.apache.org/jira/browse/DRILL-5962 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.11.0 >Reporter: Chris Sandison >Assignee: Chris Sandison >Priority: Minor > Labels: doc-impacting, ready-to-commit > Fix For: 1.12.0 > > Original Estimate: 3h > Remaining Estimate: 3h > > Add function as wrapper to ESRI's `asJson` -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (DRILL-5962) Add function STAsJSON to extend GIS support
[ https://issues.apache.org/jira/browse/DRILL-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva reassigned DRILL-5962: --- Assignee: Chris Sandison (was: Arina Ielchiieva) > Add function STAsJSON to extend GIS support > --- > > Key: DRILL-5962 > URL: https://issues.apache.org/jira/browse/DRILL-5962 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.11.0 >Reporter: Chris Sandison >Assignee: Chris Sandison >Priority: Minor > Labels: doc-impacting, ready-to-commit > Fix For: 1.12.0 > > Original Estimate: 3h > Remaining Estimate: 3h > > Add function as wrapper to ESRI's `asJson` -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5962) Add function STAsJSON to extend GIS support
[ https://issues.apache.org/jira/browse/DRILL-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-5962: Fix Version/s: 1.12.0 > Add function STAsJSON to extend GIS support > --- > > Key: DRILL-5962 > URL: https://issues.apache.org/jira/browse/DRILL-5962 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.11.0 >Reporter: Chris Sandison >Assignee: Chris Sandison >Priority: Minor > Labels: doc-impacting > Fix For: 1.12.0 > > Original Estimate: 3h > Remaining Estimate: 3h > > Add function as wrapper to ESRI's `asJson` -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5962) Add function STAsJSON to extend GIS support
[ https://issues.apache.org/jira/browse/DRILL-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260843#comment-16260843 ] ASF GitHub Bot commented on DRILL-5962: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1036 Please update commit message. +1, LGTM. > Add function STAsJSON to extend GIS support > --- > > Key: DRILL-5962 > URL: https://issues.apache.org/jira/browse/DRILL-5962 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.11.0 >Reporter: Chris Sandison >Assignee: Chris Sandison >Priority: Minor > Labels: doc-impacting > Fix For: 1.12.0 > > Original Estimate: 3h > Remaining Estimate: 3h > > Add function as wrapper to ESRI's `asJson` -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5980) Make queryType param for REST API case insensitive
[ https://issues.apache.org/jira/browse/DRILL-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260810#comment-16260810 ] Chris Sandison commented on DRILL-5980: --- [~arina] I will have a PR for this shortly > Make queryType param for REST API case insensitive > -- > > Key: DRILL-5980 > URL: https://issues.apache.org/jira/browse/DRILL-5980 > Project: Apache Drill > Issue Type: Wish >Reporter: Chris Sandison >Assignee: Chris Sandison >Priority: Minor > Original Estimate: 3h > Remaining Estimate: 3h > > QueryType uses `valueOf` to load the appropriate enum, but this fails on > incorrect case for the argument. I have found this to be a breaking change > between 1.10 and 1.11. This could be improved to use `upcase()` to get the > enum. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
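The reporter's suggested fix can be sketched as follows. `QueryTypeParser` and its `QueryType` enum are stand-ins for Drill's actual REST types; the point is only that normalizing case before `Enum.valueOf` makes the parameter case insensitive:

```java
import java.util.Locale;

// Sketch of a case-insensitive enum lookup: valueOf() is case sensitive,
// so upper-casing the REST parameter first accepts "sql", "SQL", or "Sql".
public class QueryTypeParser {
    public enum QueryType { SQL, LOGICAL, PHYSICAL }

    public static QueryType parse(String value) {
        // Locale.ROOT avoids locale-dependent surprises (e.g. Turkish dotless i)
        return QueryType.valueOf(value.toUpperCase(Locale.ROOT));
    }
}
```

`valueOf` still throws `IllegalArgumentException` for genuinely unknown names, so invalid query types keep failing loudly.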
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260780#comment-16260780 ] ASF GitHub Bot commented on DRILL-4779: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r152286833 --- Diff: contrib/storage-kafka/src/test/java/org/apache/drill/exec/store/kafka/decoders/MessageReaderFactoryTest.java --- @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store.kafka.decoders; + +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.proto.UserBitShared.DrillPBError.ErrorType; +import org.junit.Assert; +import org.junit.Test; + +public class MessageReaderFactoryTest { + + @Test + public void testShouldThrowExceptionAsMessageReaderIsNull() { --- End diff -- Please check the below tests, they can give false positive result as well. 
> Kafka storage plugin support > > > Key: DRILL-4779 > URL: https://issues.apache.org/jira/browse/DRILL-4779 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.11.0 >Reporter: B Anil Kumar >Assignee: B Anil Kumar > Labels: doc-impacting > Fix For: 1.12.0 > > > Implement Kafka storage plugin will enable the strong SQL support for Kafka. > Initially implementation can target for supporting json and avro message types -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-3993) Rebase Drill on Calcite master branch
[ https://issues.apache.org/jira/browse/DRILL-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260719#comment-16260719 ] Roman Kulyk commented on DRILL-3993: Fixed all unit tests. Need to check Functional/Advanced tests. > Rebase Drill on Calcite master branch > - > > Key: DRILL-3993 > URL: https://issues.apache.org/jira/browse/DRILL-3993 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.2.0 >Reporter: Sudheesh Katkam >Assignee: Roman Kulyk > > Calcite keeps moving, and now we need to catch up to Calcite 1.5, and ensure > there are no regressions. > Also, how do we resolve this 'catching up' issue in the long term? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260604#comment-16260604 ] ASF GitHub Bot commented on DRILL-4779: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r152248647 --- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java --- @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store.kafka; + +import java.util.Collection; +import java.util.List; +import java.util.Set; +import java.util.concurrent.TimeUnit; + +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.ops.FragmentContext; +import org.apache.drill.exec.ops.OperatorContext; +import org.apache.drill.exec.physical.impl.OutputMutator; +import org.apache.drill.exec.server.options.OptionManager; +import org.apache.drill.exec.store.AbstractRecordReader; +import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec; +import org.apache.drill.exec.store.kafka.decoders.MessageReader; +import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory; +import org.apache.drill.exec.util.Utilities; +import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter; +import org.apache.kafka.clients.consumer.ConsumerRecord; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.google.common.base.Stopwatch; +import com.google.common.collect.Lists; +import com.google.common.collect.Sets; + +public class KafkaRecordReader extends AbstractRecordReader { + private static final Logger logger = LoggerFactory.getLogger(KafkaRecordReader.class); + public static final long DEFAULT_MESSAGES_PER_BATCH = 4000; + + private VectorContainerWriter writer; + private MessageReader messageReader; + + private final boolean unionEnabled; + private final KafkaStoragePlugin plugin; + private final KafkaSubScanSpec subScanSpec; + private final long kafkaPollTimeOut; + + private long currentOffset; + private MessageIterator msgItr; + + private final boolean enableAllTextMode; + private final boolean readNumbersAsDouble; + private final String kafkaMsgReader; + + public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, List 
projectedColumns, + FragmentContext context, KafkaStoragePlugin plugin) { +setColumns(projectedColumns); +this.enableAllTextMode = context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val; +this.readNumbersAsDouble = context.getOptions() + .getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val; +OptionManager options = context.getOptions(); +this.unionEnabled = options.getOption(ExecConstants.ENABLE_UNION_TYPE); +this.kafkaMsgReader = options.getOption(ExecConstants.KAFKA_RECORD_READER).string_val; +this.kafkaPollTimeOut = options.getOption(ExecConstants.KAFKA_POLL_TIMEOUT).num_val; +this.plugin = plugin; +this.subScanSpec = subScanSpec; + } + + @Override + protected Collection transformColumns(Collection projectedColumns) { +Set transformed = Sets.newLinkedHashSet(); +if (!isStarQuery()) { + for (SchemaPath column : projectedColumns) { +transformed.add(column); + } +} else { + transformed.add(Utilities.STAR_COLUMN); +} +return transformed; + } + + @Override + public void setup(OperatorContext context, OutputMutator output) throws ExecutionSetupException { +this.writer = new VectorContainerWriter(output, unionEnabled); +
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260603#comment-16260603 ] ASF GitHub Bot commented on DRILL-4779: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r152248126 --- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/schema/KafkaMessageSchema.java --- @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store.kafka.schema; + +import java.util.Map; +import java.util.Set; + +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.schema.Table; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.planner.logical.DrillTable; +import org.apache.drill.exec.planner.logical.DynamicDrillTable; +import org.apache.drill.exec.store.AbstractSchema; +import org.apache.drill.exec.store.kafka.KafkaScanSpec; +import org.apache.drill.exec.store.kafka.KafkaStoragePlugin; +import org.apache.drill.exec.store.kafka.KafkaStoragePluginConfig; +import org.apache.kafka.clients.consumer.KafkaConsumer; +import org.apache.kafka.common.KafkaException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.google.common.collect.ImmutableList; +import com.google.common.collect.Maps; + +public class KafkaMessageSchema extends AbstractSchema { + + private static final Logger logger = LoggerFactory.getLogger(KafkaMessageSchema.class); + private final KafkaStoragePlugin plugin; + private final MapdrillTables = Maps.newHashMap(); + private Set tableNames; + + public KafkaMessageSchema(final KafkaStoragePlugin plugin, final String name) { +super(ImmutableList. 
of(), name); +this.plugin = plugin; + } + + @Override + public String getTypeName() { +return KafkaStoragePluginConfig.NAME; + } + + void setHolder(SchemaPlus plusOfThis) { +for (String s : getSubSchemaNames()) { + plusOfThis.add(s, getSubSchema(s)); +} + } + + @Override + public Table getTable(String tableName) { +if (!drillTables.containsKey(tableName)) { + KafkaScanSpec scanSpec = new KafkaScanSpec(tableName); + DrillTable table = new DynamicDrillTable(plugin, getName(), scanSpec); + drillTables.put(tableName, table); +} + +return drillTables.get(tableName); + } + + @Override + public Set getTableNames() { +if (tableNames == null) { + try (KafkaConsumer kafkaConsumer = new KafkaConsumer<>(plugin.getConfig().getKafkaConsumerProps())) { +tableNames = kafkaConsumer.listTopics().keySet(); + } catch(KafkaException e) { +logger.error(e.getMessage(), e); +throw UserException.dataReadError(e).message("Failed to get tables information").addContext(e.getMessage()) --- End diff -- If I am not mistaken, `UserException` logs exception internally. Please checks the logs to verify if using `logger` and `UserException` is not duplicating exception. > Kafka storage plugin support > > > Key: DRILL-4779 > URL: https://issues.apache.org/jira/browse/DRILL-4779 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.11.0 >Reporter: B Anil Kumar >Assignee: B Anil Kumar > Labels: doc-impacting > Fix For: 1.12.0 > > > Implement Kafka storage plugin will enable the strong SQL support for Kafka. > Initially implementation can target for supporting json and avro message types -- This message was sent by Atlassian JIRA (v6.4.14#64029)
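The reviewer's concern about double logging can be demonstrated with a self-contained stand-in. `DuplicateLogDemo` is hypothetical code; it only mimics the behavior under discussion, where `UserException`'s `build(logger)` logs the error internally, so an explicit `logger.error(...)` just before the throw records the same error twice:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in demonstrating duplicated logging: the "builder" logs internally
// (like UserException.Builder#build(Logger)), so an extra explicit log call
// before the throw produces two log entries for one failure.
public class DuplicateLogDemo {
    static final AtomicInteger logCalls = new AtomicInteger();

    static void log(String msg) {
        logCalls.incrementAndGet(); // stands in for logger.error(...)
    }

    // Mimics a builder that logs as part of constructing the exception.
    static RuntimeException buildUserException(String msg) {
        log(msg); // internal logging, like build(logger)
        return new RuntimeException(msg);
    }

    static void throwWithDuplicateLog() {
        log("boom"); // redundant explicit log -- the reviewer's complaint
        throw buildUserException("boom");
    }

    static void throwOnce() {
        throw buildUserException("boom"); // single log entry
    }
}
```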
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260605#comment-16260605 ] ASF GitHub Bot commented on DRILL-4779: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r152246552 --- Diff: contrib/storage-kafka/src/test/java/org/apache/drill/exec/store/kafka/MessageIteratorTest.java --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store.kafka; + +import java.util.NoSuchElementException; +import java.util.Properties; +import java.util.concurrent.TimeUnit; + +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.proto.UserBitShared.DrillPBError.ErrorType; +import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec; +import org.apache.kafka.clients.consumer.ConsumerConfig; +import org.apache.kafka.clients.consumer.ConsumerRecord; +import org.apache.kafka.clients.consumer.KafkaConsumer; +import org.apache.kafka.common.serialization.ByteArrayDeserializer; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +public class MessageIteratorTest extends KafkaTestBase { + + private KafkaConsumer<byte[], byte[]> kafkaConsumer; + private KafkaSubScanSpec subScanSpec; + + @Before + public void setUp() { +Properties consumerProps = storagePluginConfig.getKafkaConsumerProps(); +consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class); +consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class); +consumerProps.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "4"); +kafkaConsumer = new KafkaConsumer<>(consumerProps); +subScanSpec = new KafkaSubScanSpec(QueryConstants.JSON_TOPIC, 0, 0, TestKafkaSuit.NUM_JSON_MSG); + } + + @After + public void cleanUp() { +if (kafkaConsumer != null) { + kafkaConsumer.close(); +} + } + + @Test + public void testWhenPollTimeOutIsTooLess() { +MessageIterator iterator = new MessageIterator(kafkaConsumer, subScanSpec, 1); +try { + iterator.hasNext(); +} catch (UserException ue) { --- End diff -- The test can give a false positive result if the exception is not thrown at all.
Please re-throw the exception in the catch block and update the test annotation to `@Test(expected = UserException.class)`. > Kafka storage plugin support > > > Key: DRILL-4779 > URL: https://issues.apache.org/jira/browse/DRILL-4779 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.11.0 >Reporter: B Anil Kumar >Assignee: B Anil Kumar > Labels: doc-impacting > Fix For: 1.12.0 > > > Implement Kafka storage plugin will enable the strong SQL support for Kafka. > Initially implementation can target for supporting json and avro message types -- This message was sent by Atlassian JIRA (v6.4.14#64029)
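The false-positive pattern the reviewer describes can be shown in plain Java (no JUnit dependency; `ExpectedExceptionPattern` is a hypothetical stand-in): a try/catch that merely swallows the expected exception still "passes" when nothing is thrown, which is why re-throwing plus `@Test(expected = ...)` is the safer shape.

```java
// Demonstrates why a swallow-only catch block is a false positive: the
// helper returns true only when the expected exception actually occurred.
public class ExpectedExceptionPattern {
    static void mightThrow(boolean shouldThrow) {
        if (shouldThrow) {
            throw new IllegalStateException("poll timed out");
        }
    }

    /** True only when the expected exception was really raised. */
    static boolean verifiesThrow(boolean shouldThrow) {
        try {
            mightThrow(shouldThrow);
        } catch (IllegalStateException expected) {
            return true; // the behavior under test really happened
        }
        return false; // no exception: a real test must fail here, not pass
    }
}
```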
[jira] [Commented] (DRILL-4779) Kafka storage plugin support
[ https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260602#comment-16260602 ] ASF GitHub Bot commented on DRILL-4779: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r152245838 --- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/MessageIterator.java --- @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store.kafka; + +import java.util.Iterator; +import java.util.List; +import java.util.concurrent.TimeUnit; + +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec; +import org.apache.kafka.clients.consumer.ConsumerRecord; +import org.apache.kafka.clients.consumer.ConsumerRecords; +import org.apache.kafka.clients.consumer.KafkaConsumer; +import org.apache.kafka.common.TopicPartition; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.google.common.base.Stopwatch; +import com.google.common.collect.Lists; + +import kafka.common.KafkaException; + +public class MessageIterator implements Iterator> { + + private static final Logger logger = LoggerFactory.getLogger(MessageIterator.class); + private final KafkaConsumer kafkaConsumer; + private Iterator > recordIter; + private final TopicPartition topicPartition; + private long totalFetchTime = 0; + private final long kafkaPollTimeOut; + private final long endOffset; + + public MessageIterator(final KafkaConsumer kafkaConsumer, final KafkaSubScanSpec subScanSpec, + final long kafkaPollTimeOut) { +this.kafkaConsumer = kafkaConsumer; +this.kafkaPollTimeOut = kafkaPollTimeOut; + +List partitions = Lists.newArrayListWithCapacity(1); +topicPartition = new TopicPartition(subScanSpec.getTopicName(), subScanSpec.getPartitionId()); +partitions.add(topicPartition); +this.kafkaConsumer.assign(partitions); +logger.info("Start offset of {}:{} is - {}", subScanSpec.getTopicName(), subScanSpec.getPartitionId(), +subScanSpec.getStartOffset()); +this.kafkaConsumer.seek(topicPartition, subScanSpec.getStartOffset()); +this.endOffset = subScanSpec.getEndOffset(); + } + + @Override + public void remove() { +throw new UnsupportedOperationException("Does not support remove operation"); + } + + @Override + public boolean hasNext() { +if (recordIter != null && 
recordIter.hasNext()) { + return true; +} + +long nextPosition = kafkaConsumer.position(topicPartition); +if (nextPosition >= endOffset) { + return false; +} + +ConsumerRecords consumerRecords = null; +Stopwatch stopwatch = Stopwatch.createStarted(); +try { + consumerRecords = kafkaConsumer.poll(kafkaPollTimeOut); +} catch (KafkaException ke) { + logger.error(ke.getMessage(), ke); + throw UserException.dataReadError(ke).message(ke.getMessage()).build(logger); +} +stopwatch.stop(); + +String errorMsg = new StringBuilder().append("Failed to fetch messages within ").append(kafkaPollTimeOut) --- End diff -- Error message can be constructed inside of if clause: `if (consumerRecords.isEmpty()) {` > Kafka storage plugin support > > > Key: DRILL-4779 > URL: https://issues.apache.org/jira/browse/DRILL-4779 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 1.11.0 >Reporter: B Anil Kumar >
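The reviewer's suggestion to construct the error message inside the `if` clause amounts to lazy message construction: the `StringBuilder` work should only run on the failure path, not on every successful poll. A minimal sketch with hypothetical names (`LazyErrorMessage`, `fetchOrFail`), not the actual `MessageIterator` code:

```java
import java.util.List;

// Sketch of lazy error-message construction: the string is assembled only
// when the failure branch is actually taken.
public class LazyErrorMessage {
    static String fetchOrFail(List<String> records, long pollTimeoutMs) {
        if (records.isEmpty()) {
            // message built only when needed, per the review comment
            String errorMsg = "Failed to fetch messages within "
                + pollTimeoutMs + " ms; consider increasing the Kafka poll timeout";
            throw new IllegalStateException(errorMsg);
        }
        return records.get(0);
    }
}
```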
[jira] [Commented] (DRILL-5981) Add Syntax Highlighting and Error Checking to Storage Plugin Config Page
[ https://issues.apache.org/jira/browse/DRILL-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260565#comment-16260565 ] ASF GitHub Bot commented on DRILL-5981: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1043 @cgivre looks really nice, thanks for making the change! @kkhatua could you please take a look at this PR? > Add Syntax Highlighting and Error Checking to Storage Plugin Config Page > > > Key: DRILL-5981 > URL: https://issues.apache.org/jira/browse/DRILL-5981 > Project: Apache Drill > Issue Type: Improvement > Components: Web Server >Affects Versions: 1.12.0 >Reporter: Charles Givre > Labels: easyfix > > When configuring storage plugins, it is easy to make a trivial mistake such > as missing a comma or paren, and then spend a great deal of time trying to > find that. This PR adds syntax highlighting and error checking to the > storage plugin page to prevent that. > Note, I work on a closed network and I have included the bare minimum of > javascript libraries needed for this task. I did include them directly in > the PR because I will not be able to build Drill if I have to download them > directly during the build process. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
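The kind of trivial mistake the issue describes can be as small as one missing comma. A made-up minimal file-plugin config for illustration (not a real Drill default): the new editor would flag the missing comma after `"enabled": true` inline, rather than leaving the user to hunt for it.

```json
{
  "type": "file",
  "enabled": true
  "connection": "file:///",
  "workspaces": {
    "root": { "location": "/", "writable": false }
  }
}
```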
[jira] [Commented] (DRILL-5960) Add function STAsGeoJSON to extend GIS support
[ https://issues.apache.org/jira/browse/DRILL-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260556#comment-16260556 ] ASF GitHub Bot commented on DRILL-5960: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1034 +1 > Add function STAsGeoJSON to extend GIS support > -- > > Key: DRILL-5960 > URL: https://issues.apache.org/jira/browse/DRILL-5960 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.11.0 >Reporter: Chris Sandison >Assignee: Chris Sandison >Priority: Minor > Labels: doc-impacting, ready-to-commit > Fix For: 1.12.0 > > Original Estimate: 3h > Remaining Estimate: 3h > > Add function as wrapper to ESRI's `asGeoJson` functionality. > Implementation is very similar to STAsText
[jira] [Updated] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-5089: Fix Version/s: 1.12.0 > Skip initializing all enabled storage plugins for every query > - > > Key: DRILL-5089 > URL: https://issues.apache.org/jira/browse/DRILL-5089 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.9.0 >Reporter: Abhishek Girish >Assignee: Chunhui Shi >Priority: Critical > Labels: ready-to-commit > Fix For: 1.12.0 > > > In a query's lifecycle, an attempt is made to initialize each enabled storage > plugin while building the schema tree. This is done regardless of the actual > plugins involved in the query. > Sometimes, when one or more of the enabled storage plugins have issues, either due to misconfiguration or the underlying datasource being slow or down, the overall query time increases drastically, most likely due to the attempt being made to register schemas from the faulty plugin. > For example, when a jdbc plugin is configured with SQL Server, and at some point the underlying SQL Server db goes down, any Drill query starting to execute at that point and beyond begins to slow down drastically. > We must skip registering unrelated schemas (& workspaces) for a query.
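The intended behavior can be sketched as a filter over the enabled plugins. All names here (`SchemaRegistration`, `pluginsToRegister`, the string sets standing in for plugin registries) are hypothetical stand-ins for illustration, not Drill's actual schema-tree code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

class SchemaRegistration {

  // Return only the plugins the query actually references, so a misconfigured
  // or slow plugin that the query never touches cannot stall schema building.
  static List<String> pluginsToRegister(Collection<String> enabledPlugins,
                                        Set<String> referencedByQuery) {
    List<String> toRegister = new ArrayList<>();
    for (String plugin : enabledPlugins) {
      if (referencedByQuery.contains(plugin)) {
        toRegister.add(plugin); // unrelated plugins are skipped entirely
      }
    }
    return toRegister;
  }
}
```

With this shape, a query over `dfs` would never wait on a downed jdbc plugin's schema registration, which is the failure mode the issue describes.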
[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260553#comment-16260553 ] ASF GitHub Bot commented on DRILL-5089: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1032 +1 > Skip initializing all enabled storage plugins for every query >
[jira] [Updated] (DRILL-5089) Skip initializing all enabled storage plugins for every query
[ https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-5089: Labels: ready-to-commit (was: ) > Skip initializing all enabled storage plugins for every query >
[jira] [Created] (DRILL-5983) Unsupported nullable converted type INT_8 for primitive type INT32 error
Hakan Sarıbıyık created DRILL-5983: -- Summary: Unsupported nullable converted type INT_8 for primitive type INT32 error Key: DRILL-5983 URL: https://issues.apache.org/jira/browse/DRILL-5983 Project: Apache Drill Issue Type: Bug Components: Execution - Data Types Affects Versions: 1.11.0, 1.10.0 Environment: NAME="Ubuntu" VERSION="16.04.2 LTS (Xenial Xerus)" Reporter: Hakan Sarıbıyık When I query a table with a byte column in it, it gives an error: _Query Failed: An Error Occurred org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: ExecutionSetupException: Unsupported nullable converted type INT_8 for primitive type INT32 Fragment 1:6 [Error Id: 46636b05-cff5-455b-ba25-527217346b3e on bigdata7:31010]_ Actually, this was supposed to be solved by [DRILL-4764] - Parquet file with INT_16, etc. logical types not supported by simple SELECT, according to https://drill.apache.org/docs/apache-drill-1-10-0-release-notes/ But I tried it even with 1.11.0 and it didn't work. I am querying a parquet formatted file written with pySpark: tablo1, sourceid: byte (nullable = true). "select sourceid from tablo1" works as expected with pySpark, but not with Drill v1.11.0. Thanks.