[jira] [Updated] (DRILL-5587) Validate Parquet blockSize and pageSize configured with SYSTEM/SESSION option

2017-06-14 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5587:
---
  Labels: ready-to-commit  (was: )
Reviewer: Paul Rogers

> Validate Parquet blockSize and pageSize configured with SYSTEM/SESSION option
> -
>
> Key: DRILL-5587
> URL: https://issues.apache.org/jira/browse/DRILL-5587
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> We can set Parquet blockSize, pageSize and dictionary pageSize to any value. 
> These options use LongValidator, which does not actually validate the value. 
> Since all these sizes are used as int in the code, a user can set them to an 
> out-of-range value (greater than Integer.MAX_VALUE and/or negative), and 
> parsing the value later in the code as an int can then throw an error. 
> Instead, restrict the values that can be set to at most Integer.MAX_VALUE. 
> There is a bug open for validating system/session options in general. 
> https://issues.apache.org/jira/browse/DRILL-2478
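
A minimal sketch of the kind of range check described above (the class and 
method names here are hypothetical, not Drill's actual validator API):

{code}
// Hypothetical range-checked validator: reject values that are negative,
// zero, or too large to fit in an int, instead of accepting any long.
public class IntRangeLongValidator {
  private final String optionName;

  public IntRangeLongValidator(String optionName) {
    this.optionName = optionName;
  }

  public void validate(long value) {
    if (value < 1 || value > Integer.MAX_VALUE) {
      throw new IllegalArgumentException(String.format(
          "Option %s must be between 1 and %d, but was %d",
          optionName, Integer.MAX_VALUE, value));
    }
  }
}
{code}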



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5519) Sort fails to spill and results in an OOM

2017-06-14 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050026#comment-16050026
 ] 

Paul Rogers commented on DRILL-5519:


The problem is a bit more subtle. Consider the original error and log message:

{code}
ExternalSortBatch - Available memory: 20000000
...
Unable to allocate buffer of size 4194304 (rounded from 3200000) due to memory 
limit. Current allocation: 16015936
{code}

The problem here is the rounding. Let's do the math:

{code}
Memory: 20,000,000
- In use: 16,015,936
- Buffer size: 3,200,000
Net: 784,064
{code}

What the code did not consider is the rounding of 3,200,000 up to 4,194,304. 
That rounding pushes us over the memory limit. So, the fix is easy: use the 
rounded size in the calculations rather than the unrounded amount.
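
To make the arithmetic concrete, here is a hedged sketch of the fix; the class 
and method names are invented for illustration, not taken from the actual 
ExternalSortBatch code:

{code}
// Sketch: budget against the power-of-two rounded allocation size,
// not the requested size.
public class SortMemoryBudget {
  private final long limit;      // e.g. 20,000,000
  private long allocated;        // e.g. 16,015,936

  public SortMemoryBudget(long limit) { this.limit = limit; }

  // Allocators round a request up to the next power of two:
  // 3,200,000 becomes 4,194,304.
  static long roundedSize(long requested) {
    if (requested <= 1) { return 1; }
    return Long.highestOneBit(requested - 1) << 1;
  }

  public boolean canAllocate(long requested) {
    // Wrong check: 16,015,936 + 3,200,000 <= 20,000,000 (passes).
    // Right check: 16,015,936 + 4,194,304 >  20,000,000 (fails, so spill).
    return allocated + roundedSize(requested) <= limit;
  }
}
{code}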

> Sort fails to spill and results in an OOM
> -
>
> Key: DRILL-5519
> URL: https://issues.apache.org/jira/browse/DRILL-5519
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Attachments: 26e49afc-cf45-637b-acc1-a70fee7fe7e2.sys.drill, 
> drillbit.log, drillbit.out, drill-env.sh
>
>
> Setup :
> {code}
> git.commit.id.abbrev=1e0a14c
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> No of nodes in the drill cluster : 1
> {code}
> The below query fails with an OOM in the "in-memory sort" code, which means 
> the logic which decides when to spill is flawed.
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET 
> `exec.sort.disable_managed` = false;
> +---+-+
> |  ok   |   summary   |
> +---+-+
> | true  | exec.sort.disable_managed updated.  |
> +---+-+
> 1 row selected (1.022 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.memory.max_query_memory_per_node` = 334288000;
> +---++
> |  ok   |  summary   |
> +---++
> | true  | planner.memory.max_query_memory_per_node updated.  |
> +---++
> 1 row selected (0.369 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from 
> (select flatten(flatten(lst_lst)) num from 
> dfs.`/drill/testdata/resource-manager/nested-large.json`) d order by d.num) 
> d1 where d1.num < -1;
> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing 
> the query.
> Unable to allocate buffer of size 4194304 (rounded from 3200000) due to 
> memory limit. Current allocation: 16015936
> Fragment 2:2
> [Error Id: 4d9cc59a-b5d1-4ca9-9b26-69d9438f0bee on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> Below is the exception from the logs
> {code}
> 2017-05-16 13:46:33,233 [26e49afc-cf45-637b-acc1-a70fee7fe7e2:frag:2:2] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred: One or more nodes 
> ran out of memory while executing the query. (Unable to allocate buffer of 
> size 4194304 (rounded from 3200000) due to memory limit. Current allocation: 
> 16015936)
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.
> Unable to allocate buffer of size 4194304 (rounded from 3200000) due to 
> memory limit. Current allocation: 16015936
> [Error Id: 4d9cc59a-b5d1-4ca9-9b26-69d9438f0bee ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
>  ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:244)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_111]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_111]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
> allocate buffer of size 4194304 (rounded from 3200000) due to memory limit. 
> Current allocation: 16015936
> at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:220) 
> ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:195) 
> ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 

[jira] [Updated] (DRILL-4807) ORDER BY aggregate function in window definition results in AssertionError: Internal error: invariant violated: conversion result not null

2017-06-14 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-4807:
--
Affects Version/s: 1.10.0

> ORDER BY aggregate function in window definition results in AssertionError: 
> Internal error: invariant violated: conversion result not null
> --
>
> Key: DRILL-4807
> URL: https://issues.apache.org/jira/browse/DRILL-4807
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0, 1.10.0
>Reporter: Khurram Faraaz
>  Labels: window_function
>
> This seems to be a problem with regular window function queries when an 
> aggregate function is used in the ORDER BY clause inside the window definition.
> MapR Drill 1.8.0 commit ID : 34ca63ba
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT col0, SUM(col0) OVER ( PARTITION BY col7 
> ORDER BY MIN(col8)) avg_col0, col7 FROM `allTypsUniq.parquet` GROUP BY 
> col0,col8,col7;
> Error: SYSTEM ERROR: AssertionError: Internal error: invariant violated: 
> conversion result not null
> [Error Id: 19a3eced--4e83-ae0f-6b8ea21b2afd on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT col0, AVG(col0) OVER ( PARTITION BY col7 
> ORDER BY MIN(col8)) avg_col0, col7 FROM `allTypsUniq.parquet` GROUP BY 
> col0,col8,col7;
> Error: SYSTEM ERROR: AssertionError: Internal error: invariant violated: 
> conversion result not null
> [Error Id: c9b7ebf2-6097-41d8-bb73-d57da4ace8ad on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-07-26 09:26:16,717 [2868d347-3124-0c58-89ff-19e4ee891031:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 2868d347-3124-0c58-89ff-19e4ee891031: SELECT col0, AVG(col0) OVER ( PARTITION 
> BY col7 ORDER BY MIN(col8)) avg_col0, col7 FROM `allTypsUniq.parquet` GROUP 
> BY col0,col8,col7
> 2016-07-26 09:26:16,751 [2868d347-3124-0c58-89ff-19e4ee891031:foreman] ERROR 
> o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: AssertionError: Internal 
> error: invariant violated: conversion result not null
> [Error Id: c9b7ebf2-6097-41d8-bb73-d57da4ace8ad on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> AssertionError: Internal error: invariant violated: conversion result not null
> [Error Id: c9b7ebf2-6097-41d8-bb73-d57da4ace8ad on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:791)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:901) 
> [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:271) 
> [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
> Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
> exception during fragment initialization: Internal error: invariant violated: 
> conversion result not null
> ... 4 common frames omitted
> Caused by: java.lang.AssertionError: Internal error: invariant violated: 
> conversion result not null
> at org.apache.calcite.util.Util.newInternal(Util.java:777) 
> ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at org.apache.calcite.util.Util.permAssert(Util.java:885) 
> ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4063)
>  ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertSortExpression(SqlToRelConverter.java:4080)
>  ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertOver(SqlToRelConverter.java:1783)
>  ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter.access$1100(SqlToRelConverter.java:185)
>  ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4055)
>  ~[calcite-core-1.4.0-drill-r14.jar:

[jira] [Commented] (DRILL-5519) Sort fails to spill and results in an OOM

2017-06-14 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049806#comment-16049806
 ] 

Paul Rogers commented on DRILL-5519:


This query limits sort memory to only 20 MB (decimal). The in-memory merge 
component tries to allocate an SV4 in the {{setup()}} method; that allocation 
is what fails.

The problem is that the sort memory planning did not set aside memory for the 
SV4 when figuring out whether it can do an in-memory sort.
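
A hedged sketch of the missing reservation (the method and its parameters are 
illustrative, not the actual sort code; an SV4 entry is four bytes per row):

{code}
// Sketch: count the SV4 the merger allocates in setup() as part of the
// in-memory sort budget, so the decision not to spill is actually safe.
static boolean canSortInMemory(long memoryLimit,
                               long bufferedBatchBytes,
                               long totalRowCount) {
  long sv4Bytes = 4 * totalRowCount;  // one 4-byte SV4 entry per row
  return bufferedBatchBytes + sv4Bytes <= memoryLimit;
}
{code}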

> Sort fails to spill and results in an OOM
> -
>
> Key: DRILL-5519
> URL: https://issues.apache.org/jira/browse/DRILL-5519
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Attachments: 26e49afc-cf45-637b-acc1-a70fee7fe7e2.sys.drill, 
> drillbit.log, drillbit.out, drill-env.sh
>
>
> Setup :
> {code}
> git.commit.id.abbrev=1e0a14c
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> No of nodes in the drill cluster : 1
> {code}
> The below query fails with an OOM in the "in-memory sort" code, which means 
> the logic which decides when to spill is flawed.
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET 
> `exec.sort.disable_managed` = false;
> +---+-+
> |  ok   |   summary   |
> +---+-+
> | true  | exec.sort.disable_managed updated.  |
> +---+-+
> 1 row selected (1.022 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.memory.max_query_memory_per_node` = 334288000;
> +---++
> |  ok   |  summary   |
> +---++
> | true  | planner.memory.max_query_memory_per_node updated.  |
> +---++
> 1 row selected (0.369 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from 
> (select flatten(flatten(lst_lst)) num from 
> dfs.`/drill/testdata/resource-manager/nested-large.json`) d order by d.num) 
> d1 where d1.num < -1;
> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing 
> the query.
> Unable to allocate buffer of size 4194304 (rounded from 3200000) due to 
> memory limit. Current allocation: 16015936
> Fragment 2:2
> [Error Id: 4d9cc59a-b5d1-4ca9-9b26-69d9438f0bee on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> Below is the exception from the logs
> {code}
> 2017-05-16 13:46:33,233 [26e49afc-cf45-637b-acc1-a70fee7fe7e2:frag:2:2] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred: One or more nodes 
> ran out of memory while executing the query. (Unable to allocate buffer of 
> size 4194304 (rounded from 3200000) due to memory limit. Current allocation: 
> 16015936)
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.
> Unable to allocate buffer of size 4194304 (rounded from 3200000) due to 
> memory limit. Current allocation: 16015936
> [Error Id: 4d9cc59a-b5d1-4ca9-9b26-69d9438f0bee ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
>  ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:244)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_111]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_111]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
> allocate buffer of size 4194304 (rounded from 3200000) due to memory limit. 
> Current allocation: 16015936
> at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:220) 
> ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:195) 
> ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.MSorterGen44.setup(MSortTemplate.java:91)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.xsort.managed.MergeSort.merge(MergeSort.java:110)
>  ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.xsort.managed.Ex

[jira] [Created] (DRILL-5589) JDBC client crashes after successful authentication if trace logging is enabled.

2017-06-14 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-5589:


 Summary: JDBC client crashes after successful authentication if 
trace logging is enabled.
 Key: DRILL-5589
 URL: https://issues.apache.org/jira/browse/DRILL-5589
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.11.0
Reporter: Sorabh Hamirwasia
Assignee: Sorabh Hamirwasia


With the latest changes, when authentication completes we [dispose the 
saslClient instance | 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/rpc/security/AuthenticationOutcomeListener.java#L295]
 if encryption is not enabled. Later, the caller tries to [log the 
mechanism name | 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/rpc/security/AuthenticationOutcomeListener.java#L136]
 using the saslClient instance when trace-level logging is enabled. This 
crashes the client, since the saslClient instance was already disposed before 
logging. 
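
A minimal sketch of one possible fix, assuming the standard javax.security.sasl 
API (the helper class, its parameters, and the logger are hypothetical, not the 
actual patch): capture the mechanism name before disposing, so trace logging 
never touches a disposed instance.

{code}
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;
import org.slf4j.Logger;

class AuthCompletion {
  // Read everything needed for logging before dispose() is called.
  static void complete(SaslClient saslClient, boolean encryptionEnabled,
                       Logger logger) {
    final String mechanism = saslClient.getMechanismName();
    if (!encryptionEnabled) {
      try {
        saslClient.dispose();  // instance is not needed without encryption
      } catch (SaslException e) {
        logger.warn("Failed to dispose SaslClient", e);
      }
    }
    if (logger.isTraceEnabled()) {
      logger.trace("Authenticated using SASL mechanism: {}", mechanism);
    }
  }
}
{code}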



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5588) Hash Aggregate: Avoid copy on output of aggregate columns

2017-06-14 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-5588:
---

 Summary: Hash Aggregate: Avoid copy on output of aggregate columns
 Key: DRILL-5588
 URL: https://issues.apache.org/jira/browse/DRILL-5588
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Boaz Ben-Zvi


 When the Hash Aggregate operator outputs its result batches downstream, the 
key columns (value vectors) are returned as is, but for the aggregate columns 
new value vectors are allocated and the values are copied. This has an impact 
on performance (see the method allocateOutgoing()). A second effect is on 
memory management, as this allocation is not planned for by the code that 
controls spilling, etc.
   For some simple aggregate functions (e.g. SUM), the stored value vectors for 
the aggregate values can be returned as is. For functions like AVG, the SUM 
values need to be divided by the COUNT values; still, this can be done in 
place (over the SUM values) to avoid a new allocation and copy (see the sketch 
below). 
   For VarChar-type aggregate values (only used by MAX or MIN), there is 
another issue -- currently any such value vector is allocated as an 
ObjectVector (see BatchHolder()), on the JVM heap rather than in direct memory. 
This is to manage the sizes of the values, which can change as the 
aggregation progresses (e.g., for MAX(name) -- the first record has 'abe', but 
the next record has 'benjamin', which is both bigger ('b' > 'a') and longer). 
For the final output, this requires a new allocation and a copy in order to 
produce a compact value vector in direct memory. Maybe the ObjectVector could 
be replaced with a direct-memory implementation that is optimized for "good" 
values (e.g., all of similar size) but penalizes "bad" values (e.g., reallocates 
or moves values when needed)?
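
For the AVG case, a hedged sketch of the in-place idea (it assumes 
Float8Vector/BigIntVector accessors for the SUM and COUNT columns; the method 
itself is an illustrative stand-in for the generated aggregate code):

{code}
import org.apache.drill.exec.vector.BigIntVector;
import org.apache.drill.exec.vector.Float8Vector;

// Sketch: on output, overwrite the SUM vector in place with SUM/COUNT and
// hand that vector downstream, instead of copying into a new vector.
static void finalizeAvgInPlace(Float8Vector sumVector,
                               BigIntVector countVector,
                               int recordCount) {
  for (int i = 0; i < recordCount; i++) {
    double sum = sumVector.getAccessor().get(i);
    long count = countVector.getAccessor().get(i);
    sumVector.getMutator().set(i, count == 0 ? 0.0 : sum / count);
  }
}
{code}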






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5568) Include hadoop-common jars inside drill-jdbc-all.jar

2017-06-14 Thread Sorabh Hamirwasia (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-5568:
-
Labels: ready-to-commit  (was: )

> Include hadoop-common jars inside drill-jdbc-all.jar
> 
>
> Key: DRILL-5568
> URL: https://issues.apache.org/jira/browse/DRILL-5568
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>  Labels: ready-to-commit
>
> With SASL support in 1.10, authentication using username/password was 
> moved to the Plain mechanism of the SASL framework. There are a couple of 
> Hadoop classes, like Configuration.java and UserGroupInformation.java, 
> defined in the hadoop-common package which are used in DrillClient for 
> security mechanisms like Plain/Kerberos. Because of this we need to add the 
> hadoop-common dependency inside _drill-jdbc-all.jar_. Without it, an 
> application using this driver will fail to connect to Drill with 
> authentication enabled.
> Today this jar (the JDBC driver for Drill) already has lots of other 
> dependencies which DrillClient relies on, like Netty. But we add these 
> dependencies under the *oadd* namespace so that an application using this 
> driver won't end up in conflict with its own version of the same 
> dependencies. As part of this JIRA, the hadoop-common dependencies will be 
> included under the same namespace. This will allow an application to connect 
> to Drill using this driver with security enabled. 
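
To illustrate the effect of the *oadd* relocation (a hedged example; it assumes 
the relocated package prefix is literally oadd., and that both the shaded 
driver and the application's own hadoop-common are on the classpath):

{code}
// Sketch: the driver's relocated copy of a Hadoop class loads under a
// different name than the application's own copy, so the two never collide.
public class ShadingDemo {
  public static void main(String[] args) throws ClassNotFoundException {
    Class<?> driverCopy =
        Class.forName("oadd.org.apache.hadoop.conf.Configuration");
    Class<?> appCopy =
        Class.forName("org.apache.hadoop.conf.Configuration");
    System.out.println(driverCopy != appCopy);  // true: no version conflict
  }
}
{code}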



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)