[jira] [Created] (DRILL-4895) StreamingAggBatch code generation issues

2016-09-16 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-4895:
-

 Summary: StreamingAggBatch code generation issues
 Key: DRILL-4895
 URL: https://issues.apache.org/jira/browse/DRILL-4895
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.7.0
Reporter: Gautam Kumar Parai
Assignee: Gautam Kumar Parai


We unnecessarily re-generate the code for the StreamingAggBatch even when there
are no schema changes. Also, we seem to generate more holder variables than may
be required. This also affects sub-classes. HashAggBatch does not have the same
issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] drill pull request #588: Added test cases

2016-09-16 Thread gparai
GitHub user gparai opened a pull request:

https://github.com/apache/drill/pull/588

Added test cases

Added test cases to verify the plans and run them for the group-by and
non-group-by cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gparai/drill Drill-4771-ADM

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/588.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #588


commit 6dbf9dd8def93b9200f941e79d6a79f8a3551cd3
Author: Gautam Parai 
Date:   2016-09-13T03:21:46Z

Added test cases




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #587: DRILL-4894: Fix unit test failure in 'storage-hive/core' m...

2016-09-16 Thread gparai
Github user gparai commented on the issue:

https://github.com/apache/drill/pull/587
  
+1, unit tests passed.




[GitHub] drill pull request #585: DRILL-3898 : Sort spill was modified to catch all e...

2016-09-16 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/585#discussion_r79267883
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java ---
@@ -592,11 +592,14 @@ public BatchGroup mergeAndSpill(LinkedList<BatchGroup> batchGroups) throws Schem
       }
       injector.injectChecked(context.getExecutionControls(), INTERRUPTION_WHILE_SPILLING, IOException.class);
       newGroup.closeOutputStream();
-    } catch (Exception e) {
+    } catch (Throwable e) {
       // we only need to cleanup newGroup if spill failed
-      AutoCloseables.close(e, newGroup);
+      try {
+        AutoCloseables.close(e, newGroup);
+      } catch (Throwable t) { /* close() may hit the same IO issue; just ignore */ }
--- End diff --

The root cause for the whole bug is in Hadoop's RawLocalFileSystem.java:

package org.apache.hadoop.fs;
...
public void write(byte[] b, int off, int len) throws IOException {
  try {
    fos.write(b, off, len);
  } catch (IOException e) {  // unexpected exception
    throw new FSError(e);    // assume native fs error
  }
}

And FSError is not a subclass of IOException !!!  

java.lang.Object
  java.lang.Throwable
    java.lang.Error
      org.apache.hadoop.fs.FSError

So the only common ancestor is Throwable. And any part of the Drill code
that catches only IOException will not catch it!
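The point can be demonstrated with a minimal, self-contained sketch (FakeFSError below is a stand-in for org.apache.hadoop.fs.FSError, which extends java.lang.Error; the class and method names are illustrative, not Drill's actual code): a handler that catches only IOException never sees the error, while catch (Throwable) does.

```java
import java.io.IOException;

public class CatchThrowableSketch {

    // Stand-in for org.apache.hadoop.fs.FSError: an Error, not an IOException.
    static class FakeFSError extends Error {
        FakeFSError(Throwable cause) { super(cause); }
    }

    // Simulates a spill write failing at the native-fs layer.
    static void spill() throws IOException {
        throw new FakeFSError(new IOException("No space left on device"));
    }

    // Returns {caughtByIOException, caughtByThrowable}.
    static boolean[] run() {
        boolean[] caught = new boolean[2];
        try {
            try {
                spill();
            } catch (IOException e) {   // never matches: FakeFSError is an Error
                caught[0] = true;
            }
        } catch (Throwable t) {         // only Throwable is broad enough
            caught[1] = true;
        }
        return caught;
    }

    public static void main(String[] args) {
        boolean[] caught = run();
        System.out.println(caught[0] + " " + caught[1]); // prints "false true"
    }
}
```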







[GitHub] drill pull request #587: DRILL-4894: Fix unit test failure in 'storage-hive/...

2016-09-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/587




[GitHub] drill issue #587: DRILL-4894: Fix unit test failure in 'storage-hive/core' m...

2016-09-16 Thread adityakishore
Github user adityakishore commented on the issue:

https://github.com/apache/drill/pull/587
  
I have verified that this does not alter the content of the binary package.




[GitHub] drill issue #587: DRILL-4894: Fix unit test failure in 'storage-hive/core' m...

2016-09-16 Thread chunhui-shi
Github user chunhui-shi commented on the issue:

https://github.com/apache/drill/pull/587
  
+1, unit test passed.




[GitHub] drill pull request #585: DRILL-3898 : Sort spill was modified to catch all e...

2016-09-16 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/585#discussion_r79255636
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java ---
@@ -592,11 +592,14 @@ public BatchGroup mergeAndSpill(LinkedList<BatchGroup> batchGroups) throws Schem
       }
       injector.injectChecked(context.getExecutionControls(), INTERRUPTION_WHILE_SPILLING, IOException.class);
       newGroup.closeOutputStream();
-    } catch (Exception e) {
+    } catch (Throwable e) {
       // we only need to cleanup newGroup if spill failed
-      AutoCloseables.close(e, newGroup);
+      try {
+        AutoCloseables.close(e, newGroup);
+      } catch (Throwable t) { /* close() may hit the same IO issue; just ignore */ }
--- End diff --

In the case of no disk space to spill, close() tries to clean up by calling
flushBuffer(), which eventually throws the same exception, as there is still no
space:

  at java.io.FileOutputStream.write(FileOutputStream.java:326)
  at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:246)
  at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
  at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
  - locked <0x24e5> (a java.io.BufferedOutputStream)
  at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
  at java.io.DataOutputStream.write(DataOutputStream.java:107)
  - locked <0x24e7> (a org.apache.hadoop.fs.FSDataOutputStream)
  at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:419)
  at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:206)
  at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:163)
  - locked <0x24e8> (a org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer)
  at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:144)
  at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:407)
  at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
  at org.apache.drill.exec.physical.impl.xsort.BatchGroup.close(BatchGroup.java:169)
  at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76)
  at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:53)
  at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:43)
  at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:598)

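The nested try/catch from the patch can be illustrated with a small stand-alone sketch (class and message names are illustrative, not Drill's actual code): the secondary failure from close() is swallowed, so the original spill failure is the one that propagates.

```java
public class SpillCleanupSketch {

    // Stand-in for BatchGroup: close() hits the same "no space" condition again.
    static class FailingGroup implements AutoCloseable {
        @Override
        public void close() {
            throw new RuntimeException("No space left on device (on close)");
        }
    }

    // Mirrors the patched pattern: a cleanup failure must not mask the root cause.
    static String spillAndCleanup() {
        FailingGroup newGroup = new FailingGroup();
        try {
            // Simulates the spill itself failing.
            throw new RuntimeException("No space left on device (on write)");
        } catch (Throwable e) {
            try {
                newGroup.close();           // may hit the same IO issue...
            } catch (Throwable t) { /* ...so just ignore it */ }
            return e.getMessage();          // the original failure survives
        }
    }

    public static void main(String[] args) {
        System.out.println(spillAndCleanup()); // prints "No space left on device (on write)"
    }
}
```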




[GitHub] drill pull request #587: DRILL-4894: Fix unit test failure in 'storage-hive/...

2016-09-16 Thread adityakishore
GitHub user adityakishore opened a pull request:

https://github.com/apache/drill/pull/587

DRILL-4894: Fix unit test failure in 'storage-hive/core' module

Exclude 'hadoop-mapreduce-client-core' and 'hadoop-auth' as transitive 
dependencies from 'hbase-server'
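For reference, an exclusion of this kind typically looks like the following pom.xml fragment (a sketch only — the Hadoop/HBase groupIds and the surrounding dependency block are assumed, not copied from the patch):

```xml
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-server</artifactId>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-auth</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```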

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adityakishore/drill DRILL-4894

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/587.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #587


commit f3c26e34e3a72ef338c4dbca1a0204f342176972
Author: Aditya Kishore 
Date:   2016-09-16T19:14:35Z

DRILL-4894: Fix unit test failure in 'storage-hive/core' module

Exclude 'hadoop-mapreduce-client-core' and 'hadoop-auth' as transitive 
dependencies from 'hbase-server'






Re: System/session options

2016-09-16 Thread Vitalii Diravka
It looks like there is no way to get the SessionOptionManager in the Metadata
class from anywhere, so the question is no longer relevant. I will look into
storing the option in the ParquetPluginConfig.

Thanks, Sudheesh.

Kind regards
Vitalii

2016-09-16 19:11 GMT+03:00 Sudheesh Katkam :

> Can you provide more details about your case?
>
> DRILL-3363 requests for a nice error message for options that cannot be
> set at session level (there is no handle to a UserSession in some cases
> e.g. function registry). AFAIK currently, such statements are no ops.
>
> Thank you,
> Sudheesh
>
> > On Sep 16, 2016, at 8:55 AM, Vitalii Diravka 
> wrote:
> >
> > Hi all!
> >
> > I am going to add one new option and it looks like I can use it only at
> the
> > system level (Metadata class).
> >
> > I saw this task https://issues.apache.org/jira/browse/DRILL-3363.
> > Does it mean that only system-wide-variables could be used in drill
> > (without appropriate session options)?
> >
> >
> > Kind regards
> > Vitalii
>
>


Drill with Proto Buffers or Apache Thrift

2016-09-16 Thread Pradeeban Kathiravelu
Hi,
We are evaluating Drill for data with multi-dimensional arrays. We would like to
keep the overhead low, so we decided against using flatten() to query the
multi-dimensional array. Similarly, using indices to refer to the array
elements is simply infeasible, as our array is dynamic and we will not know
the number of elements present in it (the array represents the coordinates in
a GeoJSON document).

We are evaluating the potential of using Protocol Buffers to serialize the
multi-dimensional array first, before querying the data with Drill, thus
avoiding the error:
"Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST"

Please note that while our query results include these arrays (as in "select
*"), we are not querying the array itself with Drill. Rather, we are querying
the other attributes associated with the same object. Hence it is theoretically
possible to query while the array remains serialized. Our data is originally in
JSON format, hence the complex structure.

However, we have some questions on the architectural feasibility of combining
Drill and Protocol Buffers without draining performance. There is no doubt that
each performs well on its own; we are just skeptical about using them together.

Is there any development effort on serialization with Protocol Buffers
and/or Apache Thrift? Are there any storage plugins developed, or similar
deployment architectures, as in:
data with multi-dimensional array -> data with the multi-dimensional array
serialized with Protocol Buffers -> query with Drill -> deserialize the
multi-dimensional arrays in the query results back with Protocol Buffers?
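As a sanity check of the opaque-bytes idea, a round-trip sketch (this uses plain java.io as a stand-in rather than actual Protocol Buffers, and all names are hypothetical): the dynamic multi-dimensional array becomes a single binary blob that Drill could carry through a query untouched, with the client deserializing afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class OpaqueArraySketch {

    // Length-prefixed encoding: ring count is implicit, each row carries its own length,
    // so the array can be fully dynamic (no fixed number of points).
    static byte[] serialize(double[][] coords) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(coords.length);
            for (double[] point : coords) {
                out.writeInt(point.length);
                for (double v : point) out.writeDouble(v);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen with in-memory streams
        }
    }

    static double[][] deserialize(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            double[][] coords = new double[in.readInt()][];
            for (int i = 0; i < coords.length; i++) {
                coords[i] = new double[in.readInt()];
                for (int j = 0; j < coords[i].length; j++) coords[i][j] = in.readDouble();
            }
            return coords;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        double[][] ring = {{-84.39, 33.77}, {-84.38, 33.78}, {-84.39, 33.77}};
        double[][] roundTrip = deserialize(serialize(ring));
        System.out.println(Arrays.deepEquals(ring, roundTrip)); // prints "true"
    }
}
```

A real Protocol Buffers schema would replace the hand-rolled framing with a repeated message field, but the query-side behavior is the same: Drill sees one VARBINARY column instead of an unsupported LIST.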

Please share your thoughts on this (whether you have attempted this, or
whether there is something that I am failing to see).

We have also tried other alternatives, such as using CTAS, and we may also
just modify the data source schema from multi-dimensional arrays to a map [1].
We do not mind the initial performance hit of the conversions; this is a
one-time cost. What matters is the consequent read queries: they should be as
efficient and fast as Drill is when multi-dimensional arrays are not involved.

[1] http://kkpradeeban.blogspot.com/search/label/Drill

Thank you.
Regards,
Pradeeban.
-- 
Pradeeban Kathiravelu.
PhD Researcher, Erasmus Mundus Joint Doctorate in Distributed Computing,
INESC-ID Lisboa / Instituto Superior Técnico, Universidade de Lisboa,
Portugal.
Biomedical Informatics Software Engineer, Emory University School of
Medicine.

Blog: [Llovizna] http://kkpradeeban.blogspot.com/
LinkedIn: www.linkedin.com/pub/kathiravelu-pradeeban/12/b6a/b03


[jira] [Created] (DRILL-4894) Fix unit test failure in 'storage-hive/core' module

2016-09-16 Thread Aditya Kishore (JIRA)
Aditya Kishore created DRILL-4894:
-

 Summary: Fix unit test failure in 'storage-hive/core' module
 Key: DRILL-4894
 URL: https://issues.apache.org/jira/browse/DRILL-4894
 Project: Apache Drill
  Issue Type: Bug
Reporter: Aditya Kishore
Assignee: Aditya Kishore


As part of DRILL-4886, I added `hbase-server` as a dependency for
'storage-hive/core', which pulled in an older version (2.5.1) of some Hadoop
jars, incompatible with the other Hadoop jars used by Drill (2.7.1).

This breaks the unit tests in this module.





Re: System/session options

2016-09-16 Thread Sudheesh Katkam
Can you provide more details about your case?

DRILL-3363 requests a nice error message for options that cannot be set at the
session level (there is no handle to a UserSession in some cases, e.g. the
function registry). AFAIK, such statements are currently no-ops.

Thank you,
Sudheesh

> On Sep 16, 2016, at 8:55 AM, Vitalii Diravka  
> wrote:
> 
> Hi all!
> 
> I am going to add one new option and it looks like I can use it only at the
> system level (Metadata class).
> 
> I saw this task https://issues.apache.org/jira/browse/DRILL-3363.
> Does it mean that only system-wide-variables could be used in drill
> (without appropriate session options)?
> 
> 
> Kind regards
> Vitalii



System/session options

2016-09-16 Thread Vitalii Diravka
Hi all!

I am going to add one new option and it looks like I can use it only at the
system level (Metadata class).

I saw this task https://issues.apache.org/jira/browse/DRILL-3363.
Does it mean that only system-wide variables can be used in Drill
(without appropriate session options)?


Kind regards
Vitalii


[GitHub] drill pull request #574: DRILL-4726: Dynamic UDFs support

2016-09-16 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/574#discussion_r79155363
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionImplementationRegistry.java ---
@@ -186,4 +226,105 @@ public boolean isFunctionComplexOutput(String name) {
     return false;
   }
 
+  public RemoteFunctionRegistry getRemoteFunctionRegistry() {
+    return remoteFunctionRegistry;
+  }
+
+  public List<String> validate(Path path) throws IOException {
+    URL url = path.toUri().toURL();
+    URL[] urls = {url};
+    ClassLoader classLoader = new URLClassLoader(urls);
+    return drillFuncRegistry.validate(path.getName(), scan(classLoader, path, urls));
+  }
+
+  public void register(String jarName, ScanResult classpathScan, ClassLoader classLoader) {
+    drillFuncRegistry.register(jarName, classpathScan, classLoader);
+  }
+
+  public void unregister(String jarName) {
+    drillFuncRegistry.unregister(jarName);
+  }
+
+  /**
+   * Loads all missing functions from the remote registry.
+   * Compares the list of already registered jars with the remote jars and loads the missing ones.
+   * Missing jars are stored in the local DRILL_UDF_DIR.
+   *
+   * @return true if functions from at least one jar were loaded
+   */
+  public boolean loadRemoteFunctions() {
+    List<String> missingJars = Lists.newArrayList();
+    Registry registry = remoteFunctionRegistry.getRegistry();
+
+    List<String> localJars = drillFuncRegistry.getAllJarNames();
+    for (Jar jar : registry.getJarList()) {
+      if (!localJars.contains(jar.getName())) {
+        missingJars.add(jar.getName());
+      }
+    }
+
+    for (String jarName : missingJars) {
+      try {
+        Path localUdfArea = new Path(new File(getUdfDir()).toURI());
--- End diff --
--- End diff --

Agree, I have already removed creation from sh script to Drill.

