[jira] [Created] (DRILL-4895) StreamingAggBatch code generation issues
Gautam Kumar Parai created DRILL-4895:
Summary: StreamingAggBatch code generation issues
Key: DRILL-4895
URL: https://issues.apache.org/jira/browse/DRILL-4895
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.7.0
Reporter: Gautam Kumar Parai
Assignee: Gautam Kumar Parai

We unnecessarily re-generate the code for the StreamingAggBatch even without schema changes. Also, we seem to generate more holder variables than may be required. This also affects sub-classes. HashAggBatch does not have the same issues.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
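The fix the ticket implies is to reuse generated code across batches whose schema has not changed. A minimal, self-contained sketch of that idea (the class and method names here are illustrative, not Drill's actual BatchSchema/codegen classes):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: cache generated aggregator code keyed by the
// incoming batch schema, so code generation only happens on a schema change.
public class CodeGenCache {
    // Stand-in for a batch schema (Drill's real type is BatchSchema).
    static final class Schema {
        final String[] columns;
        Schema(String... columns) { this.columns = columns; }
        @Override public boolean equals(Object o) {
            return o instanceof Schema
                && java.util.Arrays.equals(columns, ((Schema) o).columns);
        }
        @Override public int hashCode() { return java.util.Arrays.hashCode(columns); }
    }

    private final Map<Schema, String> cache = new HashMap<>();
    int generations = 0;  // counts actual code generations

    String codeFor(Schema schema) {
        // Only generate when this schema has not been seen before.
        return cache.computeIfAbsent(schema, s -> {
            generations++;
            return "generated-aggregator-for-" + String.join(",", s.columns);
        });
    }

    public static void main(String[] args) {
        CodeGenCache c = new CodeGenCache();
        c.codeFor(new Schema("a", "b"));
        c.codeFor(new Schema("a", "b"));      // same schema: cache hit
        c.codeFor(new Schema("a", "b", "c")); // schema change: regenerate
        System.out.println(c.generations);    // prints 2
    }
}
```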
[GitHub] drill pull request #588: Added test cases
GitHub user gparai opened a pull request: https://github.com/apache/drill/pull/588

Added test cases

Added test cases to verify plans and run the same for the group-by and non-group-by cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gparai/drill Drill-4771-ADM

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/588.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #588

commit 6dbf9dd8def93b9200f941e79d6a79f8a3551cd3
Author: Gautam Parai
Date: 2016-09-13T03:21:46Z

    Added test cases

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #587: DRILL-4894: Fix unit test failure in 'storage-hive/core' m...
Github user gparai commented on the issue: https://github.com/apache/drill/pull/587

+1, unit tests passed.
[GitHub] drill pull request #585: DRILL-3898 : Sort spill was modified to catch all e...
Github user Ben-Zvi commented on a diff in the pull request: https://github.com/apache/drill/pull/585#discussion_r79267883

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java ---
@@ -592,11 +592,14 @@ public BatchGroup mergeAndSpill(LinkedList batchGroups) throws Schem
       }
       injector.injectChecked(context.getExecutionControls(), INTERRUPTION_WHILE_SPILLING, IOException.class);
       newGroup.closeOutputStream();
-    } catch (Exception e) {
+    } catch (Throwable e) {
       // we only need to cleanup newGroup if spill failed
-      AutoCloseables.close(e, newGroup);
+      try {
+        AutoCloseables.close(e, newGroup);
+      } catch (Throwable t) { /* close() may hit the same IO issue; just ignore */ }

--- End diff --

The root cause for the whole bug is in Hadoop's RawLocalFileSystem.java (package org.apache.hadoop.fs):

    public void write(byte[] b, int off, int len) throws IOException {
      try {
        fos.write(b, off, len);
      } catch (IOException e) {   // unexpected exception
        throw new FSError(e);     // assume native fs error
      }
    }

And FSError is not a subclass of IOException!

    java.lang.Object
      java.lang.Throwable
        java.lang.Error
          org.apache.hadoop.fs.FSError

So the only common ancestor is Throwable, and any part of the Drill code that catches only IOException will not catch it.
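The class-hierarchy point above can be demonstrated in a self-contained way. The FsError class below mimics Hadoop's org.apache.hadoop.fs.FSError (which extends java.lang.Error) so the example compiles without Hadoop on the classpath; the behavior is the same: a catch (IOException) clause never sees it, only catch (Throwable) does.

```java
import java.io.IOException;

// Demonstrates why catching IOException misses an Error subclass
// such as Hadoop's FSError; only catch (Throwable) is broad enough.
public class CatchDemo {
    // Stand-in for org.apache.hadoop.fs.FSError (an Error, not an IOException).
    static class FsError extends Error {
        FsError(Throwable cause) { super(cause); }
    }

    // Mimics RawLocalFileSystem wrapping an IOException in an Error.
    static void write() throws IOException {
        throw new FsError(new IOException("No space left on device"));
    }

    static String attempt() {
        try {
            write();
            return "ok";
        } catch (IOException e) {   // never reached: FsError is an Error
            return "caught-ioexception";
        } catch (Throwable t) {     // the only common ancestor that works
            return "caught-throwable";
        }
    }

    public static void main(String[] args) {
        System.out.println(attempt()); // prints caught-throwable
    }
}
```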
[GitHub] drill pull request #587: DRILL-4894: Fix unit test failure in 'storage-hive/...
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/587
[GitHub] drill issue #587: DRILL-4894: Fix unit test failure in 'storage-hive/core' m...
Github user adityakishore commented on the issue: https://github.com/apache/drill/pull/587

I have verified that this does not alter the content of the binary package.
[GitHub] drill issue #587: DRILL-4894: Fix unit test failure in 'storage-hive/core' m...
Github user chunhui-shi commented on the issue: https://github.com/apache/drill/pull/587

+1, unit test passed.
[GitHub] drill pull request #585: DRILL-3898 : Sort spill was modified to catch all e...
Github user Ben-Zvi commented on a diff in the pull request: https://github.com/apache/drill/pull/585#discussion_r79255636

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java ---
@@ -592,11 +592,14 @@ public BatchGroup mergeAndSpill(LinkedList batchGroups) throws Schem
       }
       injector.injectChecked(context.getExecutionControls(), INTERRUPTION_WHILE_SPILLING, IOException.class);
       newGroup.closeOutputStream();
-    } catch (Exception e) {
+    } catch (Throwable e) {
       // we only need to cleanup newGroup if spill failed
-      AutoCloseables.close(e, newGroup);
+      try {
+        AutoCloseables.close(e, newGroup);
+      } catch (Throwable t) { /* close() may hit the same IO issue; just ignore */ }

--- End diff --

In the case of no disk space to spill, close() tries to clean up by calling flushBuffer(), which eventually throws the same exception, as there is still no space:

    at java.io.FileOutputStream.write(FileOutputStream.java:326)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:246)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    - locked <0x24e5> (a java.io.BufferedOutputStream)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    - locked <0x24e7> (a org.apache.hadoop.fs.FSDataOutputStream)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:419)
    at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:206)
    at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:163)
    - locked <0x24e8> (a org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer)
    at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:144)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:407)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
    at org.apache.drill.exec.physical.impl.xsort.BatchGroup.close(BatchGroup.java:169)
    at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76)
    at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:53)
    at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:43)
    at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:598)
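The cleanup pattern in the diff can be sketched in isolation. This is an illustrative stand-alone version, not Drill's ExternalSortBatch: when the spill fails, close() on the same stream often fails again for the same reason, so the secondary failure is recorded as a suppressed exception and the original error is propagated (Drill's AutoCloseables.close(Throwable, ...) behaves similarly).

```java
// Sketch: swallow a secondary failure from close() during cleanup,
// keeping the original spill failure as the one that propagates.
public class SpillCleanup {
    static class FailingStream implements AutoCloseable {
        boolean closeAttempted = false;
        void write() { throw new RuntimeException("disk full"); }
        @Override public void close() {
            closeAttempted = true;
            // close() hits the same underlying problem (e.g. no disk space).
            throw new RuntimeException("disk full (again, from close)");
        }
    }

    static Throwable spill(FailingStream out) {
        try {
            out.write();
            return null;                   // spill succeeded, nothing to report
        } catch (Throwable e) {
            try {
                out.close();               // may hit the same IO issue...
            } catch (Throwable t) {
                e.addSuppressed(t);        // ...so record it and move on
            }
            return e;                      // propagate the original failure
        }
    }

    public static void main(String[] args) {
        FailingStream s = new FailingStream();
        Throwable e = spill(s);
        System.out.println(s.closeAttempted + " " + e.getSuppressed().length);
        // prints: true 1
    }
}
```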
[GitHub] drill pull request #587: DRILL-4894: Fix unit test failure in 'storage-hive/...
GitHub user adityakishore opened a pull request: https://github.com/apache/drill/pull/587

DRILL-4894: Fix unit test failure in 'storage-hive/core' module

Exclude 'hadoop-mapreduce-client-core' and 'hadoop-auth' as transitive dependencies from 'hbase-server'

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adityakishore/drill DRILL-4894

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/587.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #587

commit f3c26e34e3a72ef338c4dbca1a0204f342176972
Author: Aditya Kishore
Date: 2016-09-16T19:14:35Z

    DRILL-4894: Fix unit test failure in 'storage-hive/core' module
    Exclude 'hadoop-mapreduce-client-core' and 'hadoop-auth' as transitive dependencies from 'hbase-server'
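An exclusion of the kind this PR describes would be shaped roughly as follows in the module's pom.xml. This is a sketch of the general Maven mechanism; the actual pom in the PR may differ in surrounding context and version management:

```xml
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-server</artifactId>
  <exclusions>
    <!-- Keep hbase-server from dragging in Hadoop 2.5.1 jars that
         conflict with the Hadoop 2.7.1 jars the rest of Drill uses. -->
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-auth</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```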
Re: System/session options
Looks like there is no way to get a SessionOptionManager in the Metadata class from anywhere, so the question is no longer relevant. I will take a look at storing the option in ParquetPluginConfig. Thanks, Sudheesh.

Kind regards
Vitalii

2016-09-16 19:11 GMT+03:00 Sudheesh Katkam:
> Can you provide more details about your case?
>
> DRILL-3363 requests a nice error message for options that cannot be
> set at the session level (there is no handle to a UserSession in some cases,
> e.g. the function registry). AFAIK, such statements are currently no-ops.
>
> Thank you,
> Sudheesh
>
> > On Sep 16, 2016, at 8:55 AM, Vitalii Diravka wrote:
> >
> > Hi all!
> >
> > I am going to add one new option, and it looks like I can use it only at the
> > system level (Metadata class).
> >
> > I saw this task: https://issues.apache.org/jira/browse/DRILL-3363.
> > Does it mean that only system-wide variables can be used in Drill
> > (without corresponding session options)?
> >
> > Kind regards
> > Vitalii
Drill with Proto Buffers or Apache Thrift
Hi,

We are evaluating Drill for data with multi-dimensional arrays. We'd like to keep the overhead low, so we decided against using flatten() to query the multi-dimensional array. Similarly, using indices to refer to the array elements is simply infeasible, as our array is dynamic and we will not know the number of elements present in it (the array represents the coordinates in a GeoJSON).

We are evaluating the potential of using Protocol Buffers to serialize the multi-dimensional array first, before querying the data with Drill, thus avoiding the error "Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST".

Please note that while our query results include these arrays (as in "select *"), we are not querying the array itself with Drill. Rather, we are querying the other attributes associated with the same object. Hence it is theoretically possible to query while the array remains serialized. Our data is originally in JSON format, hence the complex structure.

However, we have some questions on the architectural feasibility without draining the performance of Drill and Protocol Buffers. There is no doubt that both are highly performant individually; however, we are skeptical about using them combined. Is there any development effort on serialization with Protocol Buffers and/or Apache Thrift? Any storage plugins developed, or similar deployment architectures, as in:

Data with multi-dimensional array -> Data with the multi-dimensional array serialized with Protocol Buffers -> Query with Drill -> Deserialize the multi-dimensional arrays in the query results back with Protocol Buffers?

Please share your thoughts on this (whether you have attempted this, or whether there is something I am failing to see). We have also tried other alternatives, such as using CTAS, and potentially just modifying the data source schema from multi-dimensional arrays to a map [1]. We do not mind the initial performance hit of the conversions. This is just a one-time cost.
What matters are the subsequent read queries - they should be as efficient and fast as using Drill when multi-dimensional arrays are not involved.

[1] http://kkpradeeban.blogspot.com/search/label/Drill

Thank you.
Regards,
Pradeeban.
--
Pradeeban Kathiravelu.
PhD Researcher, Erasmus Mundus Joint Doctorate in Distributed Computing,
INESC-ID Lisboa / Instituto Superior Técnico, Universidade de Lisboa, Portugal.
Biomedical Informatics Software Engineer, Emory University School of Medicine.
Blog: [Llovizna] http://kkpradeeban.blogspot.com/
LinkedIn: www.linkedin.com/pub/kathiravelu-pradeeban/12/b6a/b03
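The "keep the array opaque to the query engine" architecture described above can be sketched without any Drill or protobuf dependency. In this illustration the nested coordinate array is framed into a single byte blob (with plain DataOutputStream length-prefixing standing in for Protocol Buffers, which would require generated message classes), so the query layer only ever sees scalar attributes plus one binary column, and the blob is decoded after the query:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Illustrative sketch (not a Drill plugin): serialize a dynamic
// multi-dimensional array to an opaque blob before it reaches the
// query engine, and deserialize it from the query results afterwards.
public class OpaqueArrayDemo {
    static byte[] serialize(double[][] coords) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(coords.length);        // number of points (dynamic)
        for (double[] point : coords) {
            out.writeInt(point.length);     // dimensionality may vary per point
            for (double v : point) out.writeDouble(v);
        }
        return bos.toByteArray();
    }

    static double[][] deserialize(byte[] blob) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob));
        double[][] coords = new double[in.readInt()][];
        for (int i = 0; i < coords.length; i++) {
            coords[i] = new double[in.readInt()];
            for (int j = 0; j < coords[i].length; j++) coords[i][j] = in.readDouble();
        }
        return coords;
    }

    public static void main(String[] args) throws IOException {
        double[][] geojsonCoords = {{-9.14, 38.72}, {-9.15, 38.73}};
        byte[] blob = serialize(geojsonCoords);   // stored/queried as opaque bytes
        double[][] roundTrip = deserialize(blob); // decoded only after the query
        System.out.println(Arrays.deepEquals(geojsonCoords, roundTrip)); // prints true
    }
}
```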
[jira] [Created] (DRILL-4894) Fix unit test failure in 'storage-hive/core' module
Aditya Kishore created DRILL-4894:
Summary: Fix unit test failure in 'storage-hive/core' module
Key: DRILL-4894
URL: https://issues.apache.org/jira/browse/DRILL-4894
Project: Apache Drill
Issue Type: Bug
Reporter: Aditya Kishore
Assignee: Aditya Kishore

As part of DRILL-4886, I added `hbase-server` as a dependency of 'storage-hive/core', which pulled in an older version (2.5.1) of some Hadoop jars, incompatible with the other Hadoop jars used by Drill (2.7.1). This breaks the unit tests in this module.
Re: System/session options
Can you provide more details about your case?

DRILL-3363 requests a nice error message for options that cannot be set at the session level (there is no handle to a UserSession in some cases, e.g. the function registry). AFAIK, such statements are currently no-ops.

Thank you,
Sudheesh

> On Sep 16, 2016, at 8:55 AM, Vitalii Diravka wrote:
>
> Hi all!
>
> I am going to add one new option, and it looks like I can use it only at the
> system level (Metadata class).
>
> I saw this task: https://issues.apache.org/jira/browse/DRILL-3363.
> Does it mean that only system-wide variables can be used in Drill
> (without corresponding session options)?
>
> Kind regards
> Vitalii
System/session options
Hi all!

I am going to add one new option, and it looks like I can use it only at the system level (Metadata class).

I saw this task: https://issues.apache.org/jira/browse/DRILL-3363.
Does it mean that only system-wide variables can be used in Drill (without corresponding session options)?

Kind regards
Vitalii
[GitHub] drill pull request #574: DRILL-4726: Dynamic UDFs support
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/574#discussion_r79155363

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionImplementationRegistry.java ---
@@ -186,4 +226,105 @@ public boolean isFunctionComplexOutput(String name) {
     return false;
   }

+  public RemoteFunctionRegistry getRemoteFunctionRegistry() {
+    return remoteFunctionRegistry;
+  }
+
+  public List validate(Path path) throws IOException {
+    URL url = path.toUri().toURL();
+    URL[] urls = {url};
+    ClassLoader classLoader = new URLClassLoader(urls);
+    return drillFuncRegistry.validate(path.getName(), scan(classLoader, path, urls));
+  }
+
+  public void register(String jarName, ScanResult classpathScan, ClassLoader classLoader) {
+    drillFuncRegistry.register(jarName, classpathScan, classLoader);
+  }
+
+  public void unregister(String jarName) {
+    drillFuncRegistry.unregister(jarName);
+  }
+
+  /**
+   * Loads all missing functions from the remote registry.
+   * Compares the list of already registered jars with the remote jars and loads the missing ones.
+   * Missing jars are stored in the local DRILL_UDF_DIR.
+   *
+   * @return true if functions from at least one jar were loaded
+   */
+  public boolean loadRemoteFunctions() {
+    List missingJars = Lists.newArrayList();
+    Registry registry = remoteFunctionRegistry.getRegistry();
+
+    List localJars = drillFuncRegistry.getAllJarNames();
+    for (Jar jar : registry.getJarList()) {
+      if (!localJars.contains(jar.getName())) {
+        missingJars.add(jar.getName());
+      }
+    }
+
+    for (String jarName : missingJars) {
+      try {
+        Path localUdfArea = new Path(new File(getUdfDir()).toURI());

--- End diff --

Agree - I have already moved the creation from the sh script into Drill.
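The jar-reconciliation step inside loadRemoteFunctions() above boils down to a set difference between remote and locally registered jar names. A self-contained sketch of just that step (names are illustrative, not Drill's actual registry types):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the reconciliation step: which remote jars are not yet
// registered locally and therefore need to be downloaded?
public class JarSync {
    static List<String> missingJars(Collection<String> remoteJars,
                                    Collection<String> localJars) {
        List<String> missing = new ArrayList<>();
        Set<String> local = new HashSet<>(localJars);  // O(1) membership checks
        for (String jar : remoteJars) {
            if (!local.contains(jar)) {
                missing.add(jar);
            }
        }
        return missing;  // preserves remote-registry order
    }

    public static void main(String[] args) {
        List<String> remote = Arrays.asList("udf-a.jar", "udf-b.jar", "udf-c.jar");
        List<String> local = Arrays.asList("udf-a.jar");
        System.out.println(missingJars(remote, local)); // prints [udf-b.jar, udf-c.jar]
    }
}
```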