[jira] [Assigned] (DRILL-1131) Drill should ignore files starting with . or _

2017-04-27 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-1131:
---

Assignee: Pritesh  (was: Zelaine Fong)

> Drill should ignore files starting with . or _
> --
>
> Key: DRILL-1131
> URL: https://issues.apache.org/jira/browse/DRILL-1131
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Parquet
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Pritesh
> Fix For: Future
>
>
> Files whose names begin with . or _ are ignored by Hive and other tools, 
> since these are typically log and status files written out by tools like 
> MapReduce. Drill should not read them when querying a directory 
> containing a list of parquet files.
> Currently such a query fails with the error:
> message: "Failure while setting up Foreman. < AssertionError:[ Internal 
> error: Error while applying rule DrillPushProjIntoScan, args 
> [rel#78:ProjectRel.NONE.ANY([]).[](child=rel#15:Subset#1.ENUMERABLE.ANY([]).[],p_partkey=$1,p_type=$2),
>  rel#8:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[dfs, 
> drillTestDirDencTpchSF100, part])] ] < DrillRuntimeException:[ 
> java.io.IOException: Could not read footer: java.io.IOException: Could not 
> read footer for file com.mapr.fs.MapRFileStatus@99c9d45e ] < IOException:[ 
> Could not read footer: java.io.IOException: Could not read footer for file 
> com.mapr.fs.MapRFileStatus@99c9d45e ] < IOException:[ Could not read footer 
> for file com.mapr.fs.MapRFileStatus@99c9d45e ] < IOException:[ Open failed 
> for file: /drill/testdata/dencSF100/part/.impala_insert_staging, error: 
> Invalid argument (22) ]"
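The Hive/MapReduce hidden-file convention described above is a simple name test. A minimal sketch (illustrative only; the class and method names below are not Drill's actual scan code):

```java
// Illustrative sketch (not Drill's actual code): skip files whose names
// begin with '.' or '_', following the Hive/MapReduce hidden-file convention.
import java.util.List;
import java.util.stream.Collectors;

public class HiddenFileFilter {

    // Returns true for names like ".impala_insert_staging" or "_SUCCESS".
    public static boolean isHidden(String fileName) {
        return fileName.startsWith(".") || fileName.startsWith("_");
    }

    // Keep only the files a directory scan should actually read.
    public static List<String> visibleFiles(List<String> names) {
        return names.stream()
                .filter(n -> !isHidden(n))
                .collect(Collectors.toList());
    }
}
```

Note that a name like 1_0_0.parquet is kept: only a leading dot or underscore marks a file as hidden.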



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5450) Fix initcap function to convert upper case characters correctly

2017-04-27 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5450:

Reviewer: Vitalii Diravka

Assigned Reviewer to [~vitalii]

> Fix initcap function to convert upper case characters correctly
> ---
>
> Key: DRILL-5450
> URL: https://issues.apache.org/jira/browse/DRILL-5450
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>
> The initcap function incorrectly converts upper case characters that follow 
> the first character.
> {noformat}
> 0: jdbc:drill:zk=local> select initcap('aaa') from (values(1));
> +-+
> | EXPR$0  |
> +-+
> | Aaa |
> +-+
> 1 row selected (0.275 seconds)
> 0: jdbc:drill:zk=local> select initcap('AAA') from (values(1));
> +-+
> | EXPR$0  |
> +-+
> | A!! |
> +-+
> 1 row selected (0.27 seconds)
> 0: jdbc:drill:zk=local> select initcap('aAa') from (values(1));
> +-+
> | EXPR$0  |
> +-+
> | A!a |
> +-+
> 1 row selected (0.229 seconds)
> {noformat}
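For reference, the expected initcap semantics (upper-case the first letter of each word, lower-case every other letter) can be sketched as follows; this is an illustrative reference implementation, not Drill's generated function code:

```java
// Illustrative reference for the expected initcap semantics: upper-case the
// first character of each word and lower-case all other characters.
public class InitcapSketch {

    public static String initcap(String s) {
        StringBuilder out = new StringBuilder(s.length());
        boolean startOfWord = true;
        for (char c : s.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                out.append(startOfWord ? Character.toUpperCase(c)
                                       : Character.toLowerCase(c));
                startOfWord = false;
            } else {
                out.append(c);       // separators pass through unchanged
                startOfWord = true;  // the next letter starts a new word
            }
        }
        return out.toString();
    }
}
```

Under these semantics all three inputs from the report ('aaa', 'AAA', 'aAa') map to 'Aaa'.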





[jira] [Updated] (DRILL-5391) CTAS: make folder and file permission configurable

2017-04-26 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5391:

Reviewer: Paul Rogers

Assigned Reviewer to [~Paul.Rogers]

> CTAS: make folder and file permission configurable
> --
>
> Key: DRILL-5391
> URL: https://issues.apache.org/jira/browse/DRILL-5391
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.9.0, 1.10.0
> Environment: CentOS 7, HDP 2.4
>Reporter: Chua Tianxiang
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: doc-impacting
> Attachments: Drill-1-10.PNG, Drill-1-9.PNG
>
>
> In Drill 1.9, CREATE TABLE AS creates a folder with permissions 777, while on 
> Drill 1.10 the same command creates a folder with permission 775. Both Drill 
> instances are started as the root user, installed on the same servers, and 
> access the same HDFS.
> Scope:
> Added a new configuration option exec.persistent_table.umask, which defaults 
> to 002.
> The default directory permission will be 775 and the default file permission 
> 664.
> Users can modify this option at the session or system level.
> If the umask is set incorrectly, the default umask (002) will be used and an 
> error will be logged.
> For example, if a user wants to create a table with full access to folders 
> and files, he needs to update the umask:
> alter session set `exec.persistent_table.umask` = '000';
> In this case folders will be created with permission 777 and files with 666.
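The permissions above follow standard umask arithmetic: the final mode is the base mode (777 for directories, 666 for files, since files get no execute bit) with the umask bits cleared. A minimal illustrative sketch:

```java
// Illustrative umask arithmetic: final mode = base mode & ~umask.
// Directories start from 777, files from 666 (no execute bit).
public class UmaskSketch {

    public static int dirMode(int umask)  { return 0777 & ~umask; }
    public static int fileMode(int umask) { return 0666 & ~umask; }

    // Render a mode as the familiar three-digit octal string.
    public static String octal(int mode) {
        return String.format("%03o", mode);
    }
}
```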





[jira] [Updated] (DRILL-5419) Calculate return string length for literals & some string functions

2017-04-25 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5419:

Reviewer: Jinfeng Ni

Assigned Reviewer to [~jni]

> Calculate return string length for literals & some string functions
> ---
>
> Key: DRILL-5419
> URL: https://issues.apache.org/jira/browse/DRILL-5419
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Attachments: version_with_cast.JPG, version_without_cast.JPG
>
>
> Though Drill is schema-less and cannot determine in advance what the length 
> of a column should be, if a query has an explicit type/length specified, 
> Drill should return the correct column length.
> Currently the JDBC/ODBC driver ALWAYS returns 64K as the length of a varchar 
> or char, even if casts are applied.
> Changes:
> *LITERALS*
> A string literal's return length is the same as the actual literal length.
> Example: for 'aaa' the return length is 3.
> *CAST*
> The return length is the one indicated in the cast expression. This also 
> applies when a user has created a view where each string column was cast to 
> varchar with some specific length.
> This length will be returned to the user without the need to apply the cast 
> one more time. The functions mentioned below can leverage the underlying 
> varchar length to calculate the return length.
> *LOWER, UPPER, INITCAP, REVERSE, FIRST_VALUE, LAST_VALUE* 
> The return length is the underlying column length; if the column length is 
> known, that length will be returned.
> Example:
> lower(cast(col as varchar(30))) will return 30.
> lower(col) will return the max varchar length, since we don't know the 
> actual column length.
> *LPAD, RPAD*
> Pads the string to the length specified. Return length is this specified 
> length. 
> *CONCAT, CONCAT OPERATOR (||)*
> The return length is the sum of the underlying column lengths. If that sum 
> is greater than the varchar max length, the varchar max length is returned.
> *SUBSTR, SUBSTRING, LEFT, RIGHT*
> Calculates the return length according to each function's substring rules, 
> for example, taking into account how many characters should be subtracted.
> *IF EXPRESSIONS (CASE STATEMENT, COALESCE), UNION OPERATOR*
> When combining string columns with different lengths, the return length is 
> the maximum of the source column lengths.
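The rules above reduce to simple arithmetic over operand lengths, capped at the varchar maximum. A sketch of that arithmetic (the constant and method names are assumptions for illustration, not Drill's planner code):

```java
// Illustrative sketch of the return-length rules described above.
// MAX_VARCHAR and the method names are assumptions for this example.
public class ReturnLengthSketch {
    static final int MAX_VARCHAR = 65535;

    // LOWER / UPPER / INITCAP / REVERSE: pass through the operand's length.
    static int passThrough(int operandLen) { return operandLen; }

    // LPAD / RPAD: the specified pad length is the return length.
    static int pad(int padLen) { return Math.min(padLen, MAX_VARCHAR); }

    // CONCAT / ||: sum of operand lengths, capped at the varchar maximum.
    static int concat(int... lens) {
        long sum = 0;
        for (int l : lens) sum += l;
        return (int) Math.min(sum, MAX_VARCHAR);
    }

    // CASE / COALESCE / UNION: the maximum of the source column lengths.
    static int combine(int... lens) {
        int max = 0;
        for (int l : lens) max = Math.max(max, l);
        return max;
    }
}
```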





[jira] [Updated] (DRILL-5140) Fix CompileException in run-time generated code when record batch has large number of fields.

2017-04-24 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5140:

Reviewer: Jinfeng Ni

Assigned Reviewer to [~jni]

> Fix CompileException in run-time generated code when record batch has large 
> number of fields.
> -
>
> Key: DRILL-5140
> URL: https://issues.apache.org/jira/browse/DRILL-5140
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>Assignee: Volodymyr Vysotskyi
>Priority: Critical
> Attachments: drill_5117.q, manyColumns.csv
>
>
> CTAS that does SELECT over 5003 columns fails with CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject...
> Drill 1.9.0 git commit ID : 4c1b420b
> CTAS statement and CSV data file are attached.
> I ran the test with and without setting the system option below; the test 
> failed in both cases.
> alter system set `exec.java_compiler`='JDK';
> The sqlline session just closes with the message below after the failing 
> CTAS is executed.
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> Stack trace from drillbit.log
> {noformat}
> 2016-12-20 12:02:16,016 [27a6e241-99b1-1f2a-8a91-394f8166e969:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[ProjectorGen45.java]', 
> Line 11, Column 8: ProjectorGen45.java:11: error: too many constants
> public class ProjectorGen45 {
>^ (compiler.err.limit.pool)
> Fragment 0:0
> [Error Id: ced84dce-669d-47c2-b5d2-5e0559dbd9fd on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[ProjectorGen45.java]', 
> Line 11, Column 8: ProjectorGen45.java:11: error: too many constants
> public class ProjectorGen45 {
>^ (compiler.err.limit.pool)
> Fragment 0:0
> [Error Id: ced84dce-669d-47c2-b5d2-5e0559dbd9fd on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:293)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.9.0.jar:1.9.0]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_91]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_91]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: org.apache.drill.exec.exception.SchemaChangeException: Failure 
> while attempting to load generated class
> at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:487)
>  ~[drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78)
>  ~[drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135)
>  ~[drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  ~[drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  ~[drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135)
>  ~[drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  ~[drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  ~[drill-java-exec-1.9.0.jar:1.9.0]
>

[jira] [Assigned] (DRILL-5435) Using Limit causes Memory Leaked Error since 1.10

2017-04-13 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-5435:
---

Assignee: Parth Chandra

> Using Limit causes Memory Leaked Error since 1.10
> -
>
> Key: DRILL-5435
> URL: https://issues.apache.org/jira/browse/DRILL-5435
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
>Reporter: F Méthot
>Assignee: Parth Chandra
>
> Here are the details I can provide:
>   We migrated our production system from Drill 1.9 to 1.10 just 5 days ago 
> (220-node cluster).
> Our logs show some 900+ queries ran without problem in the first 4 days 
> (similar queries that never use the `limit` clause).
> Yesterday we started doing simple adhoc select * ... limit 10 queries (like 
> we often do; that was our first use of limit with 1.10)
> and we got the `Memory was leaked` exception below.
> Also, once we get the error, most subsequent user queries fail with a 
> Channel Closed Exception. We need to restart Drill to bring it back to 
> normal.
> A day later, I used a similar select * limit 10 query, and the same thing 
> happened; we had to restart Drill.
> The exception was referring to a file (1_0_0.parquet).
> I moved that file to a smaller test cluster (12 nodes) and got the error on 
> the first attempt, but I am no longer able to reproduce the issue on that 
> file. Between the 12- and 220-node clusters, a different Column name and Row 
> Group Start was listed in the error.
> The parquet file was generated by Drill 1.10.
> I tried the same file with a local drill-embedded 1.9 and 1.10 and had no 
> issue.
> Here is the error (manually typed); if you think of anything obvious, let us 
> know.
> AsyncPageReader - User Error Occurred: Exception occurred while reading from 
> disk (can not read class o.a.parquet.format.PageHeader: java.io.IOException: 
> input stream is closed.)
> File:/1_0_0.parquet
> Column: StringColXYZ
> Row Group Start: 115215476
> [Error Id: ]
>   at UserException.java:544)
>   at 
> o.a.d.exec.store.parquet.columnreaders.AsyncPageReader.handleAndThrowException(AsyncPageReader.java:199)
>   at 
> o.a.d.exec.store.parquet.columnreaders.AsyncPageReader.access(AsyncPageReader.java:81)
>   at 
> o.a.d.exec.store.parquet.columnreaders.AsyncPageReader.AsyncPageReaderTask.call(AsyncPageReader.java:483)
>   at 
> o.a.d.exec.store.parquet.columnreaders.AsyncPageReader.AsyncPageReaderTask.call(AsyncPageReader.java:392)
>   at 
> o.a.d.exec.store.parquet.columnreaders.AsyncPageReader.AsyncPageReaderTask.call(AsyncPageReader.java:392)
> ...
> Caused by: java.io.IOException: can not read class 
> org.apache.parquet.format.PageHeader: java.io.IOException: Input Stream is 
> closed.
>at o.a.parquet.format.Util.read(Util.java:216)
>at o.a.parquet.format.Util.readPageHeader(Util.java:65)
>at 
> o.a.drill.exec.store.parquet.columnreaders.AsyncPageReader(AsyncPageReaderTask:430)
> Caused by: parquet.org.apache.thrift.transport.TTransportException: Input 
> stream is closed
>at ...read(TIOStreamTransport.java:129)
>at TTransport.readAll(TTransport.java:84)
>at TCompactProtocol.readByte(TCompactProtocol.java:474)
>at TCompactProtocol.readFieldBegin(TCompactProtocol.java:481)
>at InterningProtocol.readFieldBegin(InterningProtocol.java:158)
>at o.a.parquet.format.PageHeader.read(PageHeader.java:828)
>at o.a.parquet.format.Util.read(Util.java:213)
> Fragment 0:0
> [Error id: ...]
> o.a.drill.common.exception.UserException: SYSTEM ERROR: 
> IllegalStateException: Memory was leaked by query. Memory leaked: (524288)
> Allocator(op:0:0:4:ParquetRowGroupScan) 100/524288/39919616/100
>   at o.a.d.common.exceptions.UserException (UserException.java:544)
>   at 
> o.a.d.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:293)
>   at o.a.d.exec.work.fragment.FragmentExecutor.cleanup( 
> FragmentExecutor.java:160)
>   at o.a.d.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262)
> ...
> Caused by: IllegalStateException: Memory was leaked by query. Memory leaked: 
> (524288)
>   at o.a.d.exec.memory.BaseAllocator.close(BaseAllocator.java:502)
>   at o.a.d.exec.ops.OperatorContextImpl(OperatorContextImpl.java:149)
>   at o.a.d.exec.ops.FragmentContext.suppressingClose(FragmentContext.java:422)
>   at o.a.d.exec.ops.FragmentContext.close(FragmentContext.java:411)
>   at 
> o.a.d.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:318)
>   at 
> o.a.d.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:155)
> This fixed the problem:
> alter  set `store.parquet.reader.pagereader.async`=false;




[jira] [Assigned] (DRILL-5429) Cache tableStats per query for MapR DB JSON Tables

2017-04-12 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-5429:
---

Assignee: Padma Penumarthy
Reviewer: Gautam Kumar Parai

Assigned Reviewer to [~gparai]

> Cache tableStats per query for MapR DB JSON Tables
> --
>
> Key: DRILL-5429
> URL: https://issues.apache.org/jira/browse/DRILL-5429
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - MapRDB
>Affects Versions: 1.10.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.11.0
>
>
> For MapR DB JSON Tables, cache (per query) and reuse tableStats. Getting 
> tableStats is an expensive operation. Saving it and reusing it helps reduce 
> query latency. 
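Per-query reuse of an expensive lookup is ordinary memoization keyed by table name. A minimal sketch, with names that are assumptions for illustration rather than the MapR DB plugin's actual classes:

```java
// Illustrative per-query stats cache: compute table stats at most once per
// table within a query and reuse the result. Names here are assumptions,
// not Drill's actual MapR DB plugin classes.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class QueryStatsCache {
    // One instance of this cache would live for the duration of one query.
    private final Map<String, Long> statsByTable = new ConcurrentHashMap<>();

    // fetcher stands in for the expensive tableStats call; it runs only on
    // the first request for a given table.
    public long rowCount(String table, Function<String, Long> fetcher) {
        return statsByTable.computeIfAbsent(table, fetcher);
    }
}
```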





[jira] [Updated] (DRILL-5428) submit_plan fails after Drill 1.8 script revisions

2017-04-11 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5428:

Reviewer: Chunhui Shi

Assigned Reviewer to [~cshi]

> submit_plan fails after Drill 1.8 script revisions
> --
>
> Key: DRILL-5428
> URL: https://issues.apache.org/jira/browse/DRILL-5428
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> Drill provides the script {{submit_plan}} to submit a logical or physical 
> plan for execution. It was used often early in Drill's history, but is seldom 
> used now. Its primary use is when testing, to submit a plan received from 
> {{EXPLAIN PLAN}}.
> The 1.8 release of Drill revised the launch scripts. That revision proposed 
> dropping {{submit_plan}} from the released distribution. (DRILL-4752). In the 
> end, we left the script in the distribution, but it was not tested.
> To make the script work, it needs a six-character tweak to reflect the 
> changes made in other scripts.





[jira] [Updated] (DRILL-5424) Fix IOBE for reverse function

2017-04-11 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5424:

Reviewer: Arina Ielchiieva

Assigned Reviewer to [~arina]

> Fix IOBE for reverse function
> -
>
> Key: DRILL-5424
> URL: https://issues.apache.org/jira/browse/DRILL-5424
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.9.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Minor
>
> Query with reverse function fails:
> {code:sql}
> 0: jdbc:drill:zk=local> select reverse(a) from dfs.`/tmp/test.json`;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 259, length: 1 
> (expected: range(0, 256))
> {code}
> for a table with several long varchars.
> {noformat}
> cat /tmp/test.json
> {"a": "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"}
> {"a": "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"}
> {"a": "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"}
> {"a": "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"}
> {"a": "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"}
> {noformat}
> The same query works for a table with fewer rows:
> {code:sql}
> 0: jdbc:drill:zk=local> select reverse(a) from dfs.`/tmp/test2.json`;
> +---+
> |EXPR$0 |
> +---+
> | zyxwvutsrqponmlkjihgfedcbazyxwvutsrqponmlkjihgfedcba  |
> | zyxwvutsrqponmlkjihgfedcbazyxwvutsrqponmlkjihgfedcba  |
> | zyxwvutsrqponmlkjihgfedcbazyxwvutsrqponmlkjihgfedcba  |
> | zyxwvutsrqponmlkjihgfedcbazyxwvutsrqponmlkjihgfedcba  |
> +---+
> {code}
> {noformat}
> cat /tmp/test2.json
> {"a": "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"}
> {"a": "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"}
> {"a": "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"}
> {"a": "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"}
> {noformat}
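For reference, the expected reverse() semantics can be sketched as below; the actual bug was in buffer index handling in Drill's generated vector code, which this illustrative sketch does not model:

```java
// Illustrative sketch of the expected reverse() semantics. Real vector code
// must also manage value-buffer offsets, which is where the
// IndexOutOfBoundsException arose.
public class ReverseSketch {

    public static String reverse(String s) {
        // Reverse by Unicode code point so surrogate pairs stay intact.
        int[] cps = s.codePoints().toArray();
        StringBuilder out = new StringBuilder(s.length());
        for (int i = cps.length - 1; i >= 0; i--) {
            out.appendCodePoint(cps[i]);
        }
        return out.toString();
    }
}
```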





[jira] [Assigned] (DRILL-5213) Prepared statement for actual query is missing the query text

2017-04-10 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-5213:
---

Assignee: Vitalii Diravka
Reviewer: Arina Ielchiieva

Assigned Reviewer to [~arina]

> Prepared statement for actual query is missing the query text
> -
>
> Key: DRILL-5213
> URL: https://issues.apache.org/jira/browse/DRILL-5213
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.10.0
>Reporter: Krystal
>Assignee: Vitalii Diravka
>
> Prepared statement for actual query is missing the query text in the query's 
> profile.  As a result, there is no link for the query profile from the UI.  





[jira] [Updated] (DRILL-5125) Provide option to use generic code for sv remover

2017-04-06 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5125:

Labels:   (was: ready-to-commit)

> Provide option to use generic code for sv remover
> -
>
> Key: DRILL-5125
> URL: https://issues.apache.org/jira/browse/DRILL-5125
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Consider an atypical Drill query: one with only 6000 rows but 243 fields, 
> such as this query:
> {code}
> select * from (select *, row_number() over(order by somedate) as rn from 
> dfs.`/some/path/data.json`) where rn=10
> {code}
> This produces a query with the following structure:
> {code}
> 00-00Screen
> 00-01  ProjectAllowDup(*=[$0], rn=[$1])
> 00-02Project(T0¦¦*=[$0], w0$o0=[$2])
> 00-03  SelectionVectorRemover
> 00-04Filter(condition=[=($2, 10)])
> 00-05  Window(window#0=[window(partition {} order by [1] rows 
> between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
> 00-06SelectionVectorRemover
> 00-07  Sort(sort0=[$1], dir0=[ASC])
> 00-08Project(T0¦¦*=[$0], validitydate=[$1])
> 00-09  Scan(groupscan=...)
> {code}
> Instrumenting the code to measure compile time, two “long poles” stood out:
> {code}
> Compile Time for org.apache.drill.exec.test.generated.CopierGen3: 500
> Compile Time for org.apache.drill.exec.test.generated.CopierGen8: 1659
> {code}
> Much of the initial run time of 5578 ms is taken up in compiling two classes 
> (2159 ms).
> The classes themselves are very simple: create member variables for 486 
> vectors (2 x column count), and call a method on each to do the copy. The 
> only type-specific work is the member variable and call to the (non-virtual) 
> CopyFrom or CopyFromSafe methods. The generated class can easily be replaced 
> by a “generic” class and virtual functions in the vector classes to choose 
> the correct copy method.
> Clearly, avoiding code gen means avoiding the compile times, giving a 
> first-run savings. Here are the last 8 runs (out of 10), with the code cache 
> turned off (forcing a compile on each query run), with and without the 
> generic versions:
> * Original (no code cache): 1832 ms / run
> * Generic (no code cache): 1317 ms / run
> This demonstrates the expected outcome: avoiding compilation of generated 
> code saves ~500 ms per run (or 28%). (Note: the numbers above were obtained 
> on a version of the code that already had various optimizations described in 
> other JIRA entries.)
> The reason for generating code is that one would expect 243 in-line 
> statements (an unrolled loop) to be faster than a loop with 243 iterations. 
> In addition, the generic version uses an array in place of ~500 variables, 
> and a virtual function call rather than in-line, type-specific calls. One 
> would expect the unrolled loop to be faster.
> Repeat the exercise, this time with the code cache turned on so that no 
> compile cost is paid for either code path (because the test excludes the 
> first two runs, in which the generated code is compiled).
> * Original: 1302 ms / run
> * Generic version: 1040 ms / run
> Contrary to expectations, the loop is faster than the in-line statements. In 
> this instance, the array/loop/virtual function version is ~260 ms faster 
> (20%).
> The test shows that the code can be simplified, a costly code-gen and 
> compile step can be skipped, and this query will go faster. Plus, since the 
> change removes generated classes from the code cache, there is more room for 
> the remaining classes, which may improve the hit rate.
> This ticket offers the performance improvement as an option, described in 
> comments.
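The generic alternative described above replaces hundreds of generated member variables with an array and a virtual copy call. A minimal sketch (the Copyable interface is an assumption for illustration, not Drill's ValueVector API):

```java
// Illustrative sketch of the "generic copier" idea: one loop over an array
// of vectors with a virtual copy call, instead of N generated per-column
// statements. The Copyable interface is an assumption for this example.
public class GenericCopierSketch {

    public interface Copyable {
        void copyEntry(int from, int to);
    }

    private final Copyable[] vectors;

    public GenericCopierSketch(Copyable[] vectors) {
        this.vectors = vectors;
    }

    // Copy one (possibly SV2-remapped) row across every column. Generated
    // code would instead emit one statement per column.
    public void copyRow(int from, int to) {
        for (Copyable v : vectors) {
            v.copyEntry(from, to);
        }
    }
}
```

Nothing here needs runtime compilation, which is the source of the first-run savings measured above.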





[jira] [Commented] (DRILL-4847) Window function query results in OOM Exception.

2017-04-06 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959354#comment-15959354
 ] 

Zelaine Fong commented on DRILL-4847:
-

[~khfaraaz] - I would keep the issue open until the new error is resolved, so 
we have a way of tracking that new error.  Or you could close this issue and 
open a new one to track the new error.

> Window function query results in OOM Exception.
> ---
>
> Key: DRILL-4847
> URL: https://issues.apache.org/jira/browse/DRILL-4847
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Paul Rogers
>Priority: Critical
>  Labels: window_function
> Attachments: drillbit.log
>
>
> Window function query results in OOM Exception.
> Drill version 1.8.0-SNAPSHOT git commit ID: 38ce31ca
> MapRBuildVersion 5.1.0.37549.GA
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT clientname, audiencekey, spendprofileid, 
> postalcd, provincecd, provincename, postalcode_json, country_json, 
> province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER (PARTITION BY 
> spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 ELSE 0 END) ASC, 
> provincecd ASC) as rn FROM `MD593.parquet` limit 3;
> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing 
> the query.
> Failure while allocating buffer.
> Fragment 0:0
> [Error Id: 2287fe71-f0cb-469a-a563-11580fceb1c5 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-08-16 07:25:44,590 [284d4006-9f9d-b893-9352-4f54f9b1d52a:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 284d4006-9f9d-b893-9352-4f54f9b1d52a: SELECT clientname, audiencekey, 
> spendprofileid, postalcd, provincecd, provincename, postalcode_json, 
> country_json, province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER 
> (PARTITION BY spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 
> ELSE 0 END) ASC, provincecd ASC) as rn FROM `MD593.parquet` limit 3
> ...
> 2016-08-16 07:25:46,273 [284d4006-9f9d-b893-9352-4f54f9b1d52a:frag:0:0] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/284d4006-9f9d-b893-9352-4f54f9b1d52a_majorfragment0_minorfragment0_operator8/2
> 2016-08-16 07:25:46,283 [284d4006-9f9d-b893-9352-4f54f9b1d52a:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.
> Failure while allocating buffer.
> [Error Id: 2287fe71-f0cb-469a-a563-11580fceb1c5 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:242)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Failure 
> while allocating buffer.
> at 
> org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:187)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.(RepeatedMapVector.java:331)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.(RepeatedMapVector.java:307)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector.getTransferPair(RepeatedMapVector.java:161)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.SimpleVectorWrapper.cloneAndTransfer(SimpleVectorWrapper.java:66)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.VectorContainer.cloneAndTransfer(VectorContainer.java:204)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.VectorContainer.getTransferClone(VectorContainer.java:157)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> 

[jira] [Updated] (DRILL-5385) Vector serializer fails to read saved SV2

2017-04-05 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5385:

Labels:   (was: ready-to-commit)

> Vector serializer fails to read saved SV2
> -
>
> Key: DRILL-5385
> URL: https://issues.apache.org/jira/browse/DRILL-5385
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> Drill provides the {{VectorAccessibleSerializable}} class to write a record 
> batch to a stream, and to read that batch from a stream. Record batches can 
> carry an indirection vector (a so-called selection vector 2 or SV2).
> The code to write batches writes the SV2 to the stream. But, the code to 
> deserialize batches initializes, but does not read, the SV2 from the stream.
> The result is that vector deserialization reads the wrong bytes and the saved 
> values are corrupted on read.
> Note that this issue was found via unit testing. At present, the only 
> production use of this code is in the external sort, which serializes batches 
> without an indirection vector.
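The invariant is that deserialization must mirror serialization: if the writer emits the SV2 before the vector data, the reader must consume it from the same position before reading the vectors. A minimal round-trip sketch (illustrative, not the VectorAccessibleSerializable code):

```java
// Illustrative sketch: serialization must be symmetric. If the writer puts
// the SV2 indices before the batch body, the reader must consume them first;
// skipping that read leaves the buffer position misaligned and corrupts
// everything deserialized after it.
import java.nio.ByteBuffer;

public class Sv2RoundTrip {

    public static ByteBuffer write(int[] sv2, byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 * sv2.length + 4 + body.length);
        buf.putInt(sv2.length);
        for (int idx : sv2) buf.putInt(idx);  // SV2 first...
        buf.putInt(body.length);
        buf.put(body);                        // ...then the vector data
        buf.flip();
        return buf;
    }

    public static int[] readSv2(ByteBuffer buf) {
        int[] sv2 = new int[buf.getInt()];
        for (int i = 0; i < sv2.length; i++) {
            sv2[i] = buf.getInt();            // this read must not be skipped
        }
        return sv2;
    }

    public static byte[] readBody(ByteBuffer buf) {
        byte[] body = new byte[buf.getInt()];
        buf.get(body);
        return body;
    }
}
```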





[jira] [Updated] (DRILL-5413) DrillConnectionImpl.isReadOnly() throws NullPointerException

2017-04-05 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5413:

Reviewer: Jinfeng Ni

Assigned Reviewer to [~jni]

> DrillConnectionImpl.isReadOnly() throws NullPointerException
> 
>
> Key: DRILL-5413
> URL: https://issues.apache.org/jira/browse/DRILL-5413
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.10.0
> Environment: jboss 7.0.1 final version
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.11.0
>
>
> According to 
> [CALCITE-843|https://issues.apache.org/jira/browse/CALCITE-843], every call 
> to {{isReadOnly()}} throws a NullPointerException. 
> For example, JBoss uses DrillConnectionImpl.isReadOnly() method in the 
> process of connection to the Drill as a datasource.
> The fix for CALCITE-843 should be added to the Drill Calcite fork.
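The failure mode can be sketched as follows. This is an illustration of the pattern, not Avatica's or Drill's actual code: `isReadOnly()` dereferences a lazily populated state object, so any call made before that state exists (as JBoss does during datasource setup) throws, and the fix is to guard the access with a sensible default.

```java
// Illustrative sketch (not the real Avatica code): isReadOnly() touches a
// field that may legitimately still be null at connection-setup time.
class ConnectionProps {
    boolean readOnly;
}

class SketchConnection {
    private ConnectionProps props;              // lazily populated; may be null

    boolean isReadOnlyBuggy() {
        return props.readOnly;                  // NPE whenever props == null
    }

    boolean isReadOnlyFixed() {
        // Guarded access: report the default (read-write) until props exist.
        return props != null && props.readOnly;
    }
}
```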





[jira] [Commented] (DRILL-4847) Window function query results in OOM Exception.

2017-04-05 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957024#comment-15957024
 ] 

Zelaine Fong commented on DRILL-4847:
-

[~khfaraaz] - those lines, I believe, are still from the current sort.  
[~paul-rogers] - any idea why the old sort is still being used even though 
Khurram is using the new, managed sort?  Are there places where the planner is 
still generating plans using the old sort even with the new setting?

> Window function query results in OOM Exception.
> ---
>
> Key: DRILL-4847
> URL: https://issues.apache.org/jira/browse/DRILL-4847
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Paul Rogers
>Priority: Critical
>  Labels: window_function
> Attachments: drillbit.log
>
>
> Window function query results in OOM Exception.
> Drill version 1.8.0-SNAPSHOT git commit ID: 38ce31ca
> MapRBuildVersion 5.1.0.37549.GA
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT clientname, audiencekey, spendprofileid, 
> postalcd, provincecd, provincename, postalcode_json, country_json, 
> province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER (PARTITION BY 
> spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 ELSE 0 END) ASC, 
> provincecd ASC) as rn FROM `MD593.parquet` limit 3;
> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing 
> the query.
> Failure while allocating buffer.
> Fragment 0:0
> [Error Id: 2287fe71-f0cb-469a-a563-11580fceb1c5 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-08-16 07:25:44,590 [284d4006-9f9d-b893-9352-4f54f9b1d52a:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 284d4006-9f9d-b893-9352-4f54f9b1d52a: SELECT clientname, audiencekey, 
> spendprofileid, postalcd, provincecd, provincename, postalcode_json, 
> country_json, province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER 
> (PARTITION BY spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 
> ELSE 0 END) ASC, provincecd ASC) as rn FROM `MD593.parquet` limit 3
> ...
> 2016-08-16 07:25:46,273 [284d4006-9f9d-b893-9352-4f54f9b1d52a:frag:0:0] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/284d4006-9f9d-b893-9352-4f54f9b1d52a_majorfragment0_minorfragment0_operator8/2
> 2016-08-16 07:25:46,283 [284d4006-9f9d-b893-9352-4f54f9b1d52a:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.
> Failure while allocating buffer.
> [Error Id: 2287fe71-f0cb-469a-a563-11580fceb1c5 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:242)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Failure 
> while allocating buffer.
> at 
> org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:187)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.<init>(RepeatedMapVector.java:331)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.<init>(RepeatedMapVector.java:307)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector.getTransferPair(RepeatedMapVector.java:161)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.SimpleVectorWrapper.cloneAndTransfer(SimpleVectorWrapper.java:66)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.VectorContainer.cloneAndTransfer(VectorContainer.java:204)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.VectorContainer.getTransferClone(VectorContainer.java:157)
>  

[jira] [Assigned] (DRILL-5327) Hash aggregate can return empty batch which can cause schema change exception

2017-04-04 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-5327:
---

Assignee: Jinfeng Ni  (was: Boaz Ben-Zvi)

> Hash aggregate can return empty batch which can cause schema change exception
> -
>
> Key: DRILL-5327
> URL: https://issues.apache.org/jira/browse/DRILL-5327
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.10.0
>Reporter: Chun Chang
>Assignee: Jinfeng Ni
>
> Hash aggregate can return empty batches, which cause Drill to throw a schema 
> change exception (Drill does not handle this type of schema change). This is 
> not a new bug, but a recent hash function change (a theoretically correct 
> change) may have increased the chance of hitting it. I don't have scientific 
> data to support that claim (in fact I don't believe it's the case), but a 
> regression run that used to pass now fails due to this bug. My concern is 
> that existing Drill users may have queries that used to work but fail now, 
> and it will be difficult to explain why the new release is better for them. 
> I marked this bug as a blocker so we can discuss it before releasing 1.10.
> {noformat}
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf1/original/text/query66.sql
> Query: 
> -- start query 66 in stream 0 using template query66.tpl 
> SELECT w_warehouse_name, 
>w_warehouse_sq_ft, 
>w_city, 
>w_county, 
>w_state, 
>w_country, 
>ship_carriers, 
>year1,
>Sum(jan_sales) AS jan_sales, 
>Sum(feb_sales) AS feb_sales, 
>Sum(mar_sales) AS mar_sales, 
>Sum(apr_sales) AS apr_sales, 
>Sum(may_sales) AS may_sales, 
>Sum(jun_sales) AS jun_sales, 
>Sum(jul_sales) AS jul_sales, 
>Sum(aug_sales) AS aug_sales, 
>Sum(sep_sales) AS sep_sales, 
>Sum(oct_sales) AS oct_sales, 
>Sum(nov_sales) AS nov_sales, 
>Sum(dec_sales) AS dec_sales, 
>Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot, 
>Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot, 
>Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot, 
>Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot, 
>Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot, 
>Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot, 
>Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot, 
>Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot, 
>Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot, 
>Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot, 
>Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot, 
>Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot, 
>Sum(jan_net)   AS jan_net, 
>Sum(feb_net)   AS feb_net, 
>Sum(mar_net)   AS mar_net, 
>Sum(apr_net)   AS apr_net, 
>Sum(may_net)   AS may_net, 
>Sum(jun_net)   AS jun_net, 
>Sum(jul_net)   AS jul_net, 
>Sum(aug_net)   AS aug_net, 
>Sum(sep_net)   AS sep_net, 
>Sum(oct_net)   AS oct_net, 
>Sum(nov_net)   AS nov_net, 
>Sum(dec_net)   AS dec_net 
> FROM   (SELECT w_warehouse_name, 
>w_warehouse_sq_ft, 
>w_city, 
>w_county, 
>w_state, 
>w_country, 
>'ZOUROS' 
>|| ',' 
>|| 'ZHOU' AS ship_carriers, 
>d_yearAS year1, 
>Sum(CASE 
>  WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity 
>  ELSE 0 
>END)  AS jan_sales, 
>Sum(CASE 
>  WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity 
> 
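The schema-change failure described above can be sketched without Drill's operator API. This is an illustration under assumed names: a downstream operator rejects any batch whose schema differs from the previous one, so an aggregate that emits an empty batch with a placeholder schema triggers a spurious schema-change error; the guard is simply not to send empty batches.

```java
import java.util.*;

// Illustrative sketch (names are not Drill's real operator API): the receiver
// treats any schema mismatch between consecutive batches as a fatal schema
// change, so an empty batch carrying a default/empty schema breaks the query.
class Downstream {
    private List<String> schema;

    void consume(List<String> batchSchema, int rowCount) {
        if (schema != null && !schema.equals(batchSchema)) {
            throw new IllegalStateException("schema change not supported");
        }
        schema = batchSchema;
    }
}

class AggSender {
    // Guarded send: suppress empty batches instead of forwarding them with a
    // placeholder schema the receiver cannot reconcile.
    static void send(Downstream d, List<String> schema, int rows) {
        if (rows == 0) {
            return;                             // the guard: drop empty batches
        }
        d.consume(schema, rows);
    }
}
```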

[jira] [Commented] (DRILL-4847) Window function query results in OOM Exception.

2017-04-04 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955512#comment-15955512
 ] 

Zelaine Fong commented on DRILL-4847:
-

[~khfaraaz] - There should be a stack trace, and the stack trace should tell us 
where the OOM is being thrown.  You only included one line in your comment.  
Surely, there's more information in the drillbit log?

> Window function query results in OOM Exception.
> ---
>
> Key: DRILL-4847
> URL: https://issues.apache.org/jira/browse/DRILL-4847
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Paul Rogers
>Priority: Critical
>  Labels: window_function
> Attachments: drillbit.log
>
>
> Window function query results in OOM Exception.
> Drill version 1.8.0-SNAPSHOT git commit ID: 38ce31ca
> MapRBuildVersion 5.1.0.37549.GA
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT clientname, audiencekey, spendprofileid, 
> postalcd, provincecd, provincename, postalcode_json, country_json, 
> province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER (PARTITION BY 
> spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 ELSE 0 END) ASC, 
> provincecd ASC) as rn FROM `MD593.parquet` limit 3;
> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing 
> the query.
> Failure while allocating buffer.
> Fragment 0:0
> [Error Id: 2287fe71-f0cb-469a-a563-11580fceb1c5 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-08-16 07:25:44,590 [284d4006-9f9d-b893-9352-4f54f9b1d52a:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 284d4006-9f9d-b893-9352-4f54f9b1d52a: SELECT clientname, audiencekey, 
> spendprofileid, postalcd, provincecd, provincename, postalcode_json, 
> country_json, province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER 
> (PARTITION BY spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 
> ELSE 0 END) ASC, provincecd ASC) as rn FROM `MD593.parquet` limit 3
> ...
> 2016-08-16 07:25:46,273 [284d4006-9f9d-b893-9352-4f54f9b1d52a:frag:0:0] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/284d4006-9f9d-b893-9352-4f54f9b1d52a_majorfragment0_minorfragment0_operator8/2
> 2016-08-16 07:25:46,283 [284d4006-9f9d-b893-9352-4f54f9b1d52a:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.
> Failure while allocating buffer.
> [Error Id: 2287fe71-f0cb-469a-a563-11580fceb1c5 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:242)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Failure 
> while allocating buffer.
> at 
> org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:187)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.<init>(RepeatedMapVector.java:331)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.<init>(RepeatedMapVector.java:307)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector.getTransferPair(RepeatedMapVector.java:161)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.SimpleVectorWrapper.cloneAndTransfer(SimpleVectorWrapper.java:66)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.VectorContainer.cloneAndTransfer(VectorContainer.java:204)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.VectorContainer.getTransferClone(VectorContainer.java:157)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> 

[jira] [Commented] (DRILL-4847) Window function query results in OOM Exception.

2017-04-04 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955225#comment-15955225
 ] 

Zelaine Fong commented on DRILL-4847:
-

[~khfaraaz]- any luck trying to reproduce this with the new, managed sort?

> Window function query results in OOM Exception.
> ---
>
> Key: DRILL-4847
> URL: https://issues.apache.org/jira/browse/DRILL-4847
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Paul Rogers
>Priority: Critical
>  Labels: window_function
> Attachments: drillbit.log
>
>
> Window function query results in OOM Exception.
> Drill version 1.8.0-SNAPSHOT git commit ID: 38ce31ca
> MapRBuildVersion 5.1.0.37549.GA
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT clientname, audiencekey, spendprofileid, 
> postalcd, provincecd, provincename, postalcode_json, country_json, 
> province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER (PARTITION BY 
> spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 ELSE 0 END) ASC, 
> provincecd ASC) as rn FROM `MD593.parquet` limit 3;
> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing 
> the query.
> Failure while allocating buffer.
> Fragment 0:0
> [Error Id: 2287fe71-f0cb-469a-a563-11580fceb1c5 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-08-16 07:25:44,590 [284d4006-9f9d-b893-9352-4f54f9b1d52a:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 284d4006-9f9d-b893-9352-4f54f9b1d52a: SELECT clientname, audiencekey, 
> spendprofileid, postalcd, provincecd, provincename, postalcode_json, 
> country_json, province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER 
> (PARTITION BY spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 
> ELSE 0 END) ASC, provincecd ASC) as rn FROM `MD593.parquet` limit 3
> ...
> 2016-08-16 07:25:46,273 [284d4006-9f9d-b893-9352-4f54f9b1d52a:frag:0:0] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/284d4006-9f9d-b893-9352-4f54f9b1d52a_majorfragment0_minorfragment0_operator8/2
> 2016-08-16 07:25:46,283 [284d4006-9f9d-b893-9352-4f54f9b1d52a:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.
> Failure while allocating buffer.
> [Error Id: 2287fe71-f0cb-469a-a563-11580fceb1c5 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:242)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Failure 
> while allocating buffer.
> at 
> org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:187)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.<init>(RepeatedMapVector.java:331)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.<init>(RepeatedMapVector.java:307)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector.getTransferPair(RepeatedMapVector.java:161)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.SimpleVectorWrapper.cloneAndTransfer(SimpleVectorWrapper.java:66)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.VectorContainer.cloneAndTransfer(VectorContainer.java:204)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.VectorContainer.getTransferClone(VectorContainer.java:157)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:569)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]

[jira] [Updated] (DRILL-4139) Exception while trying to prune partition. java.lang.UnsupportedOperationException: Unsupported type: BIT & Interval

2017-04-03 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-4139:

Reviewer: Jinfeng Ni

[~jni] - can you review?  Thanks.

> Exception while trying to prune partition. 
> java.lang.UnsupportedOperationException: Unsupported type: BIT & Interval
> 
>
> Key: DRILL-4139
> URL: https://issues.apache.org/jira/browse/DRILL-4139
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.3.0
> Environment: 4 node cluster on CentOS
>Reporter: Khurram Faraaz
>Assignee: Volodymyr Vysotskyi
>
> Exception while trying to prune partition.
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> is seen in drillbit.log after a Functional run on a 4-node cluster.
> Drill 1.3.0 sys.version => d61bb83a8
> {code}
> 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2
> 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 0 ms
> 2015-11-27 03:12:19,810 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] WARN  
> o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune 
> partition.
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.populatePruningVector(ParquetGroupScan.java:479)
>  ~[drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.ParquetPartitionDescriptor.populatePartitionVectors(ParquetPartitionDescriptor.java:96)
>  ~[drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:235)
>  ~[drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
>  [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808)
>  [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) 
> [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:303) 
> [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.logicalPlanningVolcanoAndLopt(DefaultSqlHandler.java:545)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:213)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:248)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:184)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:905) 
> [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:244) 
> [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> {code}
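The pruning failure above comes from type-dispatch code (a `populatePruningVector`-style switch) that throws for any partition-column type it has no case for. A minimal sketch of the more robust alternative, with illustrative enum and method names rather than Drill's real ones, is to decline pruning on unhandled types instead of failing the whole rule:

```java
// Illustrative sketch (enum and names are not Drill's real types): dispatch
// on the partition column's minor type and, for types such as BIT or INTERVAL
// that have no pruning-vector support, decline to prune rather than throw
// UnsupportedOperationException and abort planning of the rule.
enum MinorType { INT, VARCHAR, BIT, INTERVAL }

class Pruner {
    static boolean canPrune(MinorType t) {
        switch (t) {
            case INT:
            case VARCHAR:
                return true;                    // pruning vector supported
            default:
                return false;                   // BIT, INTERVAL: skip pruning
        }
    }
}
```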





[jira] [Assigned] (DRILL-5405) Add missing operator types

2017-03-31 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-5405:
---

Assignee: Arina Ielchiieva  (was: Zelaine Fong)

> Add missing operator types
> --
>
> Key: DRILL-5405
> URL: https://issues.apache.org/jira/browse/DRILL-5405
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Attachments: maprdb_sub_scan.JPG, unknown_operator.JPG
>
>
> Add missing operator types: FLATTEN, MONGO_SUB_SCAN, MAPRDB_SUB_SCAN so they 
> won't be displayed on Web UI as UNKNOWN_OPERATOR.
> Example:
> before the fix -> unknown_operator.JPG
> after the fix -> maprdb_sub_scan.JPG
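The reason missing types render as UNKNOWN_OPERATOR can be sketched as a lookup with a fallback label. The ids and registration mechanism below are illustrative, not Drill's actual operator-type table; the fix amounts to registering the missing names:

```java
import java.util.*;

// Illustrative sketch (ids and registry are not Drill's real code): the Web UI
// resolves a numeric operator id to a display name and falls back to a generic
// label for any id it does not know, which is why unregistered operators such
// as FLATTEN or MAPRDB_SUB_SCAN showed up as UNKNOWN_OPERATOR.
class OperatorNames {
    private static final Map<Integer, String> NAMES = new HashMap<>();
    static {
        NAMES.put(1, "HASH_AGGREGATE");
        NAMES.put(2, "EXTERNAL_SORT");
        NAMES.put(3, "FLATTEN");                // entries added by the fix
        NAMES.put(4, "MONGO_SUB_SCAN");
        NAMES.put(5, "MAPRDB_SUB_SCAN");
    }

    static String name(int id) {
        return NAMES.getOrDefault(id, "UNKNOWN_OPERATOR");
    }
}
```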





[jira] [Assigned] (DRILL-5405) Add missing operator types

2017-03-31 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-5405:
---

Assignee: Zelaine Fong  (was: Arina Ielchiieva)

> Add missing operator types
> --
>
> Key: DRILL-5405
> URL: https://issues.apache.org/jira/browse/DRILL-5405
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Zelaine Fong
>Priority: Minor
> Attachments: maprdb_sub_scan.JPG, unknown_operator.JPG
>
>
> Add missing operator types: FLATTEN, MONGO_SUB_SCAN, MAPRDB_SUB_SCAN so they 
> won't be displayed on Web UI as UNKNOWN_OPERATOR.
> Example:
> before the fix -> unknown_operator.JPG
> after the fix -> maprdb_sub_scan.JPG





[jira] [Updated] (DRILL-5405) Add missing operator types

2017-03-31 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5405:

Reviewer: Karthikeyan Manivannan

Assigned Reviewer to [~karthikm]

> Add missing operator types
> --
>
> Key: DRILL-5405
> URL: https://issues.apache.org/jira/browse/DRILL-5405
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Attachments: maprdb_sub_scan.JPG, unknown_operator.JPG
>
>
> Add missing operator types: FLATTEN, MONGO_SUB_SCAN, MAPRDB_SUB_SCAN so they 
> won't be displayed on Web UI as UNKNOWN_OPERATOR.
> Example:
> before the fix -> unknown_operator.JPG
> after the fix -> maprdb_sub_scan.JPG





[jira] [Assigned] (DRILL-4675) Root allocator should prevent allocating more than the available direct memory

2017-03-29 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-4675:
---

Assignee: Boaz Ben-Zvi  (was: Karthikeyan Manivannan)

[~ben-zvi] - assigning to you since you're working on spill to disk for hash 
agg.  The stack trace in the issue indicates that the OOM is occurring in hash 
agg, which applies to the first query, since it has a group by.  It would be 
good to try this query with your spill-to-disk hash agg code to see if it 
addresses that problem.

> Root allocator should prevent allocating more than the available direct memory
> --
>
> Key: DRILL-4675
> URL: https://issues.apache.org/jira/browse/DRILL-4675
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow, Execution - Monitoring
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Boaz Ben-Zvi
> Attachments: error.log
>
>
> git commit # : 09b262776e965ea17a6a863801f7e1ee3e5b3d5a
> I ran the below 2 queries (each query duplicated 10 times, so 20 queries 
> total) using 10 different clients on an 8-node cluster. The drillbit on one 
> of the nodes hit an OOM error. The allocator should have caught this earlier.
> Query 1:
> {code}
> select count(*) 
> from (
> select l_orderkey, l_partkey, l_suppkey 
> from lineitem_nocompression_256
> group by l_orderkey, l_partkey, l_suppkey
> ) s
> {code} 
> Query 2 :
> {code}
> select count(*) from
> dfs.concurrency.customer_nocompression_256_filtered c,
> dfs.concurrency.orders_nocompression_256 o,
> dfs.concurrency.lineitem_nocompression_256 l
> where
> c.c_custkey = o.o_custkey
> and l.l_orderkey = o.o_orderkey
> {code}
> Exception from the logs 
> {code}
> Failure allocating buffer.
> [Error Id: cd71a6a0-7f41-4fe4-8bbb-294119adfebf ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267)
>  [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_51]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_51]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Failure 
> allocating buffer.
> at 
> io.netty.buffer.PooledByteBufAllocatorL.allocate(PooledByteBufAllocatorL.java:64)
>  ~[drill-memory-base-1.7.0-SNAPSHOT.jar:4.0.27.Final]
> at 
> org.apache.drill.exec.memory.AllocationManager.<init>(AllocationManager.java:80)
>  ~[drill-memory-base-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation(BaseAllocator.java:239)
>  ~[drill-memory-base-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:221) 
> ~[drill-memory-base-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:191) 
> ~[drill-memory-base-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.IntVector.allocateBytes(IntVector.java:200) 
> ~[vector-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.IntVector.allocateNew(IntVector.java:182) 
> ~[vector-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashTableGen54.allocMetadataVector(HashTableTemplate.java:757)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashTableGen54.resizeAndRehashIfNeeded(HashTableTemplate.java:722)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashTableGen54.insertEntry(HashTableTemplate.java:631)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashTableGen54.put(HashTableTemplate.java:609)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashTableGen54.put(HashTableTemplate.java:542)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen52.checkGroupAndAggrValues(HashAggTemplate.java:542)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen52.doWork(HashAggTemplate.java:300)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:133)
>  ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> 
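The behavior requested in the issue title can be sketched as a root allocator that tracks a hard limit and rejects a request before touching memory. This is a minimal illustration, not Drill's `BaseAllocator`/`RootAllocator` implementation; the point is that the failure becomes a controlled allocator error rather than a JVM-level direct-memory failure deep inside netty:

```java
// Illustrative sketch (not Drill's real allocator): enforce a hard cap on
// total outstanding allocations so a request that would exceed available
// direct memory fails fast with a controlled error.
class RootAllocator {
    private final long limit;                   // e.g. available direct memory
    private long allocated;

    RootAllocator(long limit) {
        this.limit = limit;
    }

    long buffer(long size) {
        if (allocated + size > limit) {         // check before allocating
            throw new IllegalStateException(
                "allocation of " + size + " would exceed limit " + limit);
        }
        allocated += size;
        return size;                            // stand-in for the real buffer
    }

    void release(long size) {
        allocated -= size;
    }
}
```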

[jira] [Updated] (DRILL-5395) Query on MapR-DB table fails with NPE due to an issue with assignment logic

2017-03-29 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5395:

Reviewer: Chunhui Shi

Assigned Reviewer to [~cshi]

> Query on MapR-DB table fails with NPE due to an issue with assignment logic
> ---
>
> Key: DRILL-5395
> URL: https://issues.apache.org/jira/browse/DRILL-5395
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - MapRDB
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Abhishek Girish
>Assignee: Padma Penumarthy
>  Labels: MapR-DB-Binary
> Fix For: 1.11.0
>
> Attachments: drillbit.log.txt
>
>
> We uncovered this issue when working on DRILL-5394. 
> The MapR-DB table in question had 5 tablets with skewed data distribution (~6 
> million rows). A partial WIP fix for DRILL-5394 caused the number of rows to 
> be reported incorrectly (~300,000). 2 minor fragments were created (due to 
> filter selectivity) for scanning the 5 tablets. This exposed an NPE, 
> possibly caused by an issue in the assignment logic. 
> Representative query:
> {code}
> SELECT Convert_from(avail.customer, 'UTF8') AS ABC, 
>Convert_from(prop.customer, 'UTF8')  AS PQR 
> FROM   (SELECT Convert_from(a.row_key, 'UTF8') 
>AS customer, 
>Cast(Convert_from(a.data .` l_discount ` , 'double_be') AS 
> FLOAT) 
>AS availability 
> FROM   db.tpch_maprdb.lineitem_1 a 
> WHERE  Convert_from(a.row_key, 'UTF8') = '%004%') AS avail 
>join 
>   (SELECT Convert_from(b.row_key, 'UTF8') 
>   AS customer, 
>Cast( 
>Convert_from(b.data .` l_discount ` , 'double_be') AS FLOAT) AS 
>availability 
> FROM   db.tpch_maprdb.lineitem_1 b 
> WHERE  Convert_from(b.row_key, 'UTF8') LIKE '%003%') AS prop 
>  ON avail.customer = prop.customer; 
> {code}
> Error:
> {code}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> NullPointerException
> {code}
> Log attached. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5394) Optimize query planning for MapR-DB tables by caching row counts

2017-03-29 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5394:

Reviewer: Gautam Kumar Parai

Assigned Reviewer to [~gparai]

> Optimize query planning for MapR-DB tables by caching row counts
> 
>
> Key: DRILL-5394
> URL: https://issues.apache.org/jira/browse/DRILL-5394
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization, Storage - MapRDB
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Abhishek Girish
>Assignee: Padma Penumarthy
>  Labels: MapR-DB-Binary
> Fix For: 1.11.0
>
>
> On large MapR-DB tables, it was observed that the query planning time was 
> longer than expected. With DEBUG logs, it was understood that there were 
> multiple calls being made to get MapR-DB region locations and to fetch total 
> row count for tables.
> {code}
> 2017-02-23 13:59:55,246 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:05,006 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Function
> ...
> 2017-02-23 14:00:05,031 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:16,438 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:16,439 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:28,479 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:28,480 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:42,396 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:42,397 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:54,609 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Physical Planning (49588ms):
> {code}
> We should cache these stats and reuse them wherever required during query 
> planning. This should help reduce query planning time.
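The caching idea described above can be sketched as a simple per-planning-session memo: fetch each table's stats once, then reuse the value on later rule firings. The class and method names below are hypothetical illustrations, not Drill's actual planner code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: memoize expensive per-table stats (e.g. MapR-DB row
// counts) so repeated rule firings during planning reuse the first fetch.
class PlanningStatsCache {
    private final Map<String, Long> rowCounts = new HashMap<>();
    private final Function<String, Long> fetcher;  // stands in for the RPC
    int fetches = 0;                               // exposed for illustration

    PlanningStatsCache(Function<String, Long> fetcher) {
        this.fetcher = fetcher;
    }

    long rowCount(String table) {
        // Only the first call per table pays the fetch cost.
        return rowCounts.computeIfAbsent(table, t -> {
            fetches++;
            return fetcher.apply(t);
        });
    }
}
```

With repeated lookups for the same table, only one fetch is issued, which is the behavior the issue asks for.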





[jira] [Updated] (DRILL-4253) Some functional tests are failing because sort limit is too low

2017-03-29 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-4253:

Fix Version/s: (was: 1.10.0)
   1.11.0

> Some functional tests are failing because sort limit is too low
> ---
>
> Key: DRILL-4253
> URL: https://issues.apache.org/jira/browse/DRILL-4253
> Project: Apache Drill
>  Issue Type: Test
>  Components: Tools, Build & Test
>Affects Versions: 1.5.0
> Environment: 4 nodes cluster, 32 cores each
>Reporter: Deneche A. Hakim
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> The following tests are running out of memory:
> {noformat}
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/q174.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/q171.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/q168_DRILL-2046.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/q162_DRILL-1985.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/q165.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/q177_DRILL-2046.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/q159_DRILL-2046.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/large/q157_DRILL-1985.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/large/q175_DRILL-1985.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/q160_DRILL-1985.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/q163_DRILL-2046.q
> {noformat}
> With errors similar to the following:
> {noformat}
> java.sql.SQLException: SYSTEM ERROR: DrillRuntimeException: Failed to 
> pre-allocate memory for SV. Existing recordCount*4 = 0, incoming batch 
> recordCount*4 = 696
> {noformat}
> {noformat}
> Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.
> {noformat}
> Those queries operate on wide tables and the sort limit is too low when using 
> the default value for {{planner.memory.max_query_memory_per_node}}.
> We should update those tests to set {{planner.memory.max_query_memory_per_node}} 
> to a higher limit (4 GB worked well for me).





[jira] [Updated] (DRILL-5310) Memory leak in managed sort if OOM during sv2 allocation

2017-03-29 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5310:

Fix Version/s: (was: 1.10.0)
   1.11.0

> Memory leak in managed sort if OOM during sv2 allocation
> 
>
> Key: DRILL-5310
> URL: https://issues.apache.org/jira/browse/DRILL-5310
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> See the "identical1" test case in DRILL-5266. Due to misconfiguration, the 
> sort was given too little memory to make progress. An OOM error occurred when 
> allocating an SV2.
> In this scenario, the "converted" record batch is leaked.
> Normally, a converted batch is added to the list of in-memory batches, then 
> released on {{close()}}. But, in this case, the batch is only a local 
> variable, and so leaks.
> The code must release this batch in this condition.
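The fix pattern the description calls for can be sketched as follows: the converted batch must be released in the failure path because, as a local variable, nothing else will ever free it. All names here are illustrative stand-ins, not Drill's actual classes.

```java
import java.util.List;

// Illustrative sketch of the release-on-failure pattern. Batch stands in for
// a Drill record batch whose buffers must be freed exactly once.
class SortSpillStep {
    interface Batch { void release(); }

    static void addBatch(Batch converted, List<Batch> inMemory, Runnable allocateSv2) {
        try {
            allocateSv2.run();        // may throw an OOM-style runtime exception
            inMemory.add(converted);  // from here on, close() will release it
        } catch (RuntimeException e) {
            converted.release();      // local-only reference: free it here
            throw e;
        }
    }
}
```

If the sv2 allocation throws, the batch is released before the error propagates, so no buffers leak.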





[jira] [Updated] (DRILL-5164) Equi-join query results in CompileException when inputs have large number of columns

2017-03-29 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5164:

Fix Version/s: (was: 1.10.0)
   1.11.0

> Equi-join query results in CompileException when inputs have large number of 
> columns
> 
>
> Key: DRILL-5164
> URL: https://issues.apache.org/jira/browse/DRILL-5164
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>Assignee: Volodymyr Vysotskyi
>Priority: Critical
> Fix For: 1.11.0
>
> Attachments: manyColsInJson.json
>
>
> Drill 1.9.0 
> git commit ID : 4c1b420b
> 4 node CentOS cluster
> JSON file has 4095 keys (columns)
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select * from `manyColsInJson.json` t1, 
> `manyColsInJson.json` t2 where t1.key2000 = t2.key2000;
> Error: SYSTEM ERROR: CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen294.java]',
>  Line 16397, Column 17: HashJoinProbeGen294.java:16397: error: code too large
> public void doSetup(FragmentContext context, VectorContainer buildBatch, 
> RecordBatch probeBatch, RecordBatch outgoing)
> ^ (compiler.err.limit.code)
> Fragment 0:0
> [Error Id: 7d0efa7e-e183-4c40-939a-4908699f94bf on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-12-26 09:52:11,321 [279f17fd-c8f0-5d18-1124-76099f0a5cc8:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen294.java]',
>  Line 16397, Column 17: HashJoinProbeGen294.java:16397: error: code too large
> public void doSetup(FragmentContext context, VectorContainer buildBatch, 
> RecordBatch probeBatch, RecordBatch outgoing)
> ^ (compiler.err.limit.code)
> Fragment 0:0
> [Error Id: 7d0efa7e-e183-4c40-939a-4908699f94bf on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen294.java]',
>  Line 16397, Column 17: HashJoinProbeGen294.java:16397: error: code too large
> public void doSetup(FragmentContext context, VectorContainer buildBatch, 
> RecordBatch probeBatch, RecordBatch outgoing)
> ^ (compiler.err.limit.code)
> Fragment 0:0
> [Error Id: 7d0efa7e-e183-4c40-939a-4908699f94bf on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:293)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.9.0.jar:1.9.0]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_91]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_91]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: 
> org.apache.drill.exec.exception.SchemaChangeException: 
> org.apache.drill.exec.exception.ClassTransformationException: 
> java.util.concurrent.ExecutionException: 
> org.apache.drill.exec.exception.ClassTransformationException: Failure 
> generating transformation classes for value:
> package org.apache.drill.exec.test.generated;
> ...
> public class HashJoinProbeGen294 {
> NullableVarCharVector[] vv0;
> NullableVarCharVector vv3;
> NullableVarCharVector[] vv6;
> ...
> vv49137 .copyFromSafe((probeIndex), (outIndex), vv49134);
> vv49143 .copyFromSafe((probeIndex), (outIndex), vv49140);
> vv49149 .copyFromSafe((probeIndex), (outIndex), vv49146);
> }
> }
> 
> public void __DRILL_INIT__()
> throws SchemaChangeException
> {
> }
> }
> at 
> org.apache.drill.exec.compile.ClassTransformer.getImplementationClass(ClassTransformer.java:302)
>  ~[drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:78) 
> ~[drill-java-exec-1.9.0.jar:1.9.0]
> at 
> 
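The "code too large" error above is the JVM's hard limit of 64 KB of bytecode per method: a single generated `doSetup` assigning tens of thousands of vectors cannot compile. The standard remedy is for the code generator to split the body across helper methods. A toy sketch of that chunking idea (hypothetical generator, not Drill's actual codegen):

```java
// Toy generator: emit a doSetup() that delegates to helper methods, each
// handling at most `perMethod` columns, so no single method grows too large.
class ChunkedCodeGen {
    static String generate(int columns, int perMethod) {
        int methods = (columns + perMethod - 1) / perMethod;
        StringBuilder sb = new StringBuilder("void doSetup() {\n");
        for (int m = 0; m < methods; m++) {
            sb.append("  setup").append(m).append("();\n");
        }
        sb.append("}\n");
        for (int m = 0; m < methods; m++) {
            sb.append("void setup").append(m).append("() {\n");
            int end = Math.min(columns, (m + 1) * perMethod);
            for (int c = m * perMethod; c < end; c++) {
                sb.append("  vv").append(c).append(" = lookup(").append(c).append(");\n");
            }
            sb.append("}\n");
        }
        return sb.toString();
    }
}
```

Each helper stays bounded in size no matter how many columns the join carries, which is what keeps the generated class compilable.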

[jira] [Updated] (DRILL-5270) Improve loading of profiles listing in the WebUI

2017-03-29 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5270:

Fix Version/s: (was: 1.10.0)
   1.11.0

> Improve loading of profiles listing in the WebUI
> 
>
> Key: DRILL-5270
> URL: https://issues.apache.org/jira/browse/DRILL-5270
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Fix For: 1.11.0
>
>
> Currently, as the number of profiles increases, we reload the same list of 
> profiles from the FS.
> An ideal improvement would be to detect whether there are any new profiles and 
> only reload from disk then. Otherwise, a cached list is sufficient.
> For a directory of 280K profiles, the load time is close to 6 seconds on a 
> 32-core server. With caching, we can get it down to a few milliseconds.
> To invalidate the cache, we inspect the last modified time of the 
> directory to determine whether a reload is needed. 
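The invalidation scheme described above, reloading only when the directory's last-modified time changes, can be sketched like this (hypothetical names, not the actual WebUI code):

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: serve a cached directory listing and reload it only
// when the directory's last-modified time has changed.
class ProfileListingCache {
    private final File profileDir;
    private long cachedMtime = -1;
    private List<String> cachedNames = Collections.emptyList();

    ProfileListingCache(File profileDir) {
        this.profileDir = profileDir;
    }

    synchronized List<String> list() {
        long mtime = profileDir.lastModified();
        if (mtime != cachedMtime) {
            // Directory changed (or first call): reload from the file system.
            String[] names = profileDir.list();
            cachedNames = names == null ? Collections.emptyList() : Arrays.asList(names);
            cachedMtime = mtime;
        }
        return cachedNames;  // otherwise the cached listing is sufficient
    }
}
```

Steady-state calls cost one `lastModified()` stat instead of a full listing of 280K entries.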





[jira] [Updated] (DRILL-5312) "Record batch sizer" does not include overhead for variable-sized vectors

2017-03-29 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5312:

Fix Version/s: (was: 1.10.0)
   1.11.0

> "Record batch sizer" does not include overhead for variable-sized vectors
> -
>
> Key: DRILL-5312
> URL: https://issues.apache.org/jira/browse/DRILL-5312
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> The new "record batch sizer" computes the actual data size of a record given 
> a batch of vectors. For most purposes, the record width must include the 
> overhead of the offset vectors for variable-sized vectors. The initial code 
> drop included only the character data, but not the offset vector size when 
> computing row width.
> Since the "managed" external sort relies on the computed row size to 
> determine memory usage, underestimating the row width can cause an 
> OOM under certain low-memory conditions.
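Concretely, the per-row width of a variable-width column is the average character data per row plus the 4-byte entry in its offset vector; the offset entry is the overhead the initial sizer code omitted. A minimal illustration (hypothetical helper, not the actual record batch sizer):

```java
// Hypothetical helper: per-row width estimates for batch memory accounting.
class RowWidthEstimator {
    static final int OFFSET_VECTOR_ENTRY = 4;  // one 4-byte offset per value

    // Variable-width column: average data bytes per row + offset entry.
    static int varWidthColumn(int totalDataBytes, int rowCount) {
        return (totalDataBytes + rowCount - 1) / rowCount + OFFSET_VECTOR_ENTRY;
    }

    // Fixed-width column: just the value's byte width.
    static int fixedWidthColumn(int byteWidth) {
        return byteWidth;
    }
}
```

For a batch of 10 rows holding 100 bytes of character data, the column costs 14 bytes per row, not 10.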





[jira] [Updated] (DRILL-5324) Provide simplified column reader/writer for use in tests

2017-03-28 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5324:

Reviewer: Karthikeyan Manivannan  (was: Gautam Kumar Parai)

Reassigned Reviewer to [~karthikm]

> Provide simplified column reader/writer for use in tests
> 
>
> Key: DRILL-5324
> URL: https://issues.apache.org/jira/browse/DRILL-5324
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> In support of DRILL-5323, we wish to provide a very easy way to work with row 
> sets. See the comment section for examples of the target API.
> Drill provides over 100 different value vectors, any of which may be required 
> to perform a specific unit test. Creating these vectors, populating them, and 
> retrieving values is very tedious. The work is so complex that it 
> discourages developers from writing such tests.
> To simplify the task, we wish to provide a simplified row set reader and 
> writer. To do that, we need to generate the corresponding column reader and 
> writer for each value vector. This ticket focuses on the column-level readers 
> and writers, and the required code generation.
> Drill already provides vector readers and writers derived from 
> {{FieldReader}}. However, these readers do not provide a uniform get/set 
> interface that is type independent on the application side. Instead, 
> application code must be aware of the type of the vector, something we seek 
> to avoid for test code.
> The reader and writer classes are designed to be used in many contexts, not 
> just for testing. As a result, their implementation makes no assumptions 
> about the broader row reader and writer, other than that a row index and the 
> required value vector are both available. 
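The kind of uniform, type-independent get/set interface the ticket asks for can be illustrated with a tiny sketch; the names and array-backed "vectors" are stand-ins, not Drill's generated classes:

```java
// Illustrative sketch: one set(Object) entry point regardless of vector type,
// so test code never needs to know whether a column is INT, VARCHAR, etc.
interface ColumnWriter {
    void set(Object value);
}

class IntColumnWriter implements ColumnWriter {
    private final int[] vector;   // stands in for an IntVector
    private int index;
    IntColumnWriter(int[] vector) { this.vector = vector; }
    public void set(Object value) { vector[index++] = ((Number) value).intValue(); }
}

class VarCharColumnWriter implements ColumnWriter {
    private final String[] vector;  // stands in for a VarCharVector
    private int index;
    VarCharColumnWriter(String[] vector) { this.vector = vector; }
    public void set(Object value) { vector[index++] = value.toString(); }
}
```

Test code can then walk a row as an array of `ColumnWriter`s and call `set` uniformly, which is the simplification the ticket targets.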





[jira] [Commented] (DRILL-5391) CTAS: folder and file permission should be configurable

2017-03-28 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945522#comment-15945522
 ] 

Zelaine Fong commented on DRILL-5391:
-

OK, last question :).  Was there a reason you chose to use 775/664 vs using the 
default set by the underlying storage system?

> CTAS: folder and file permission should be configurable
> ---
>
> Key: DRILL-5391
> URL: https://issues.apache.org/jira/browse/DRILL-5391
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.0
> Environment: CentOS 7, HDP 2.4
>Reporter: Chua Tianxiang
>Priority: Minor
> Attachments: Drill-1-10.PNG, Drill-1-9.PNG
>
>
> In Drill 1.9, CREATE TABLE AS creates a folder with permissions 777, while on 
> Drill 1.10 the same command creates a folder with permissions 775. Both Drill 
> instances are started as the root user, are installed on the same servers, and 
> access the same HDFS.





[jira] [Commented] (DRILL-5391) CTAS: folder and file permission should be configurable

2017-03-28 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945476#comment-15945476
 ] 

Zelaine Fong commented on DRILL-5391:
-

Yes, I can see how a default of 775/664 makes more sense, and from what you're 
saying, that matches Hive's defaults.

But I'm still not clear as to why Drill previously was using a default of 
777/666.  You indicate it's the default dictated by the file system.  So, does 
that mean that in the case of HDFS, the default is 777/666?

> CTAS: folder and file permission should be configurable
> ---
>
> Key: DRILL-5391
> URL: https://issues.apache.org/jira/browse/DRILL-5391
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.0
> Environment: CentOS 7, HDP 2.4
>Reporter: Chua Tianxiang
>Priority: Minor
> Attachments: Drill-1-10.PNG, Drill-1-9.PNG
>
>
> In Drill 1.9, CREATE TABLE AS creates a folder with permissions 777, while on 
> Drill 1.10 the same command creates a folder with permissions 775. Both Drill 
> instances are started as the root user, are installed on the same servers, and 
> access the same HDFS.





[jira] [Assigned] (DRILL-5393) ALTER SESSION documentation page broken link

2017-03-28 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-5393:
---

Assignee: Bridget Bevens

> ALTER SESSION documentation page broken link
> 
>
> Key: DRILL-5393
> URL: https://issues.apache.org/jira/browse/DRILL-5393
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Muhammad Gelbana
>Assignee: Bridget Bevens
>
> On [this 
> page|https://drill.apache.org/docs/modifying-query-planning-options/], there 
> is a link to the ALTER SESSION documentation page which points to this broken 
> link: https://drill.apache.org/docs/alter-session/
> I believe the correct link should be: https://drill.apache.org/docs/set/





[jira] [Commented] (DRILL-5391) CTAS: folder and file permission should be configurable

2017-03-28 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945305#comment-15945305
 ] 

Zelaine Fong commented on DRILL-5391:
-

[~arina] - was there a specific reason you decided to make the permissions more 
restrictive?  

> CTAS: folder and file permission should be configurable
> ---
>
> Key: DRILL-5391
> URL: https://issues.apache.org/jira/browse/DRILL-5391
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.0
> Environment: CentOS 7, HDP 2.4
>Reporter: Chua Tianxiang
>Priority: Minor
> Attachments: Drill-1-10.PNG, Drill-1-9.PNG
>
>
> In Drill 1.9, CREATE TABLE AS creates a folder with permissions 777, while on 
> Drill 1.10 the same command creates a folder with permissions 775. Both Drill 
> instances are started as the root user, are installed on the same servers, and 
> access the same HDFS.





[jira] [Assigned] (DRILL-4847) Window function query results in OOM Exception.

2017-03-28 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-4847:
---

Assignee: Paul Rogers

Reassigning to [~Paul.Rogers] based on [~khfaraaz]'s finding that with the new 
external sort, the test case still fails with an OOM in the external sort.

> Window function query results in OOM Exception.
> ---
>
> Key: DRILL-4847
> URL: https://issues.apache.org/jira/browse/DRILL-4847
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Paul Rogers
>Priority: Critical
>  Labels: window_function
> Attachments: drillbit.log
>
>
> Window function query results in OOM Exception.
> Drill version 1.8.0-SNAPSHOT git commit ID: 38ce31ca
> MapRBuildVersion 5.1.0.37549.GA
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT clientname, audiencekey, spendprofileid, 
> postalcd, provincecd, provincename, postalcode_json, country_json, 
> province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER (PARTITION BY 
> spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 ELSE 0 END) ASC, 
> provincecd ASC) as rn FROM `MD593.parquet` limit 3;
> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing 
> the query.
> Failure while allocating buffer.
> Fragment 0:0
> [Error Id: 2287fe71-f0cb-469a-a563-11580fceb1c5 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-08-16 07:25:44,590 [284d4006-9f9d-b893-9352-4f54f9b1d52a:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 284d4006-9f9d-b893-9352-4f54f9b1d52a: SELECT clientname, audiencekey, 
> spendprofileid, postalcd, provincecd, provincename, postalcode_json, 
> country_json, province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER 
> (PARTITION BY spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 
> ELSE 0 END) ASC, provincecd ASC) as rn FROM `MD593.parquet` limit 3
> ...
> 2016-08-16 07:25:46,273 [284d4006-9f9d-b893-9352-4f54f9b1d52a:frag:0:0] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/284d4006-9f9d-b893-9352-4f54f9b1d52a_majorfragment0_minorfragment0_operator8/2
> 2016-08-16 07:25:46,283 [284d4006-9f9d-b893-9352-4f54f9b1d52a:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.
> Failure while allocating buffer.
> [Error Id: 2287fe71-f0cb-469a-a563-11580fceb1c5 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:242)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Failure 
> while allocating buffer.
> at 
> org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:187)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.(RepeatedMapVector.java:331)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.(RepeatedMapVector.java:307)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector.getTransferPair(RepeatedMapVector.java:161)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.SimpleVectorWrapper.cloneAndTransfer(SimpleVectorWrapper.java:66)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.VectorContainer.cloneAndTransfer(VectorContainer.java:204)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.VectorContainer.getTransferClone(VectorContainer.java:157)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> 

[jira] [Assigned] (DRILL-5389) select 2 int96 using convert_from(col, 'TIMESTAMP_IMPALA') function fails

2017-03-27 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-5389:
---

Assignee: Vitalii Diravka

> select 2 int96 using convert_from(col, 'TIMESTAMP_IMPALA') function fails
> -
>
> Key: DRILL-5389
> URL: https://issues.apache.org/jira/browse/DRILL-5389
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Krystal
>Assignee: Vitalii Diravka
>
> I have a table containing two INT96 timestamp columns. If I select one column 
> at a time, it works.
> select convert_from(create_timestamp1, 'TIMESTAMP_IMPALA') from 
> dfs.`/user/hive/warehouse/hive1_parquet` where voter_id=3;
> +------------------------+
> |         EXPR$0         |
> +------------------------+
> | 2017-04-14 02:27:55.0  |
> +------------------------+
> select convert_from(create_timestamp2, 'TIMESTAMP_IMPALA') from 
> dfs.`/user/hive/warehouse/hive1_parquet` where voter_id=3;
> +------------------------+
> |         EXPR$0         |
> +------------------------+
> | 2017-05-30 19:30:11.0  |
> +------------------------+
> However, if I include both columns on the same select, it fails:
> select convert_from(create_timestamp1, 'TIMESTAMP_IMPALA'), 
> convert_from(create_timestamp2, 'TIMESTAMP_IMPALA') from 
> dfs.`/user/hive/warehouse/hive1_parquet` where voter_id=3;
> Error: SYSTEM ERROR: ArrayIndexOutOfBoundsException: 0
> This is reproducible in drill-1.9 also.
> In drill-1.10, setting `store.parquet.reader.int96_as_timestamp` = true, the 
> same query works fine.
> select create_timestamp1,create_timestamp2 from 
> dfs.`/user/hive/warehouse/hive1_parquet` where voter_id=3;
> +------------------------+------------------------+
> |   create_timestamp1    |   create_timestamp2    |
> +------------------------+------------------------+
> | 2017-04-14 02:27:55.0  | 2017-05-30 19:30:11.0  |
> +------------------------+------------------------+





[jira] [Updated] (DRILL-5378) Put more information into SchemaChangeException when HashJoin hit SchemaChangeException

2017-03-27 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5378:

Reviewer: Aman Sinha

> Put more information into SchemaChangeException when HashJoin hit 
> SchemaChangeException
> ---
>
> Key: DRILL-5378
> URL: https://issues.apache.org/jira/browse/DRILL-5378
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>Priority: Minor
>
> HashJoin currently does not allow schema change on either the build side or the 
> probe side. When HashJoin hits a SchemaChangeException in the middle of 
> execution, Drill reports a brief error message about the exception without 
> saying what schemas are in the incoming batches. That makes it hard to 
> analyze the error and understand what's going on. 
> It probably makes sense to put the two differing schemas in the error 
> message, so that the user could get a better idea about the schema change. 
> Until Drill can support schema change in HashJoin, the detailed 
> error message would help users debug such errors. 
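A sketch of the improved message, carrying both schemas so the user can see exactly what changed (illustrative only; the real fix would format Drill's BatchSchema objects rather than plain strings):

```java
// Hypothetical message builder: name both schemas in the error text instead
// of reporting only that a SchemaChangeException occurred.
class SchemaChangeError {
    static String message(String priorSchema, String newSchema) {
        return "Hash join does not support schema changes."
             + " Prior schema: " + priorSchema + "."
             + " New schema: " + newSchema + ".";
    }
}
```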





[jira] [Commented] (DRILL-4847) Window function query results in OOM Exception.

2017-03-27 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944225#comment-15944225
 ] 

Zelaine Fong commented on DRILL-4847:
-

The OOM is coming from the external sort, not the window function.

[~khfaraaz] - can you try this with the new external sort to see if this is 
still an issue?

To enable the new sort, run

ALTER SESSION SET `exec.sort.disable_managed` = false;

> Window function query results in OOM Exception.
> ---
>
> Key: DRILL-4847
> URL: https://issues.apache.org/jira/browse/DRILL-4847
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Priority: Critical
>  Labels: window_function
> Attachments: drillbit.log
>
>
> Window function query results in OOM Exception.
> Drill version 1.8.0-SNAPSHOT git commit ID: 38ce31ca
> MapRBuildVersion 5.1.0.37549.GA
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT clientname, audiencekey, spendprofileid, 
> postalcd, provincecd, provincename, postalcode_json, country_json, 
> province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER (PARTITION BY 
> spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 ELSE 0 END) ASC, 
> provincecd ASC) as rn FROM `MD593.parquet` limit 3;
> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing 
> the query.
> Failure while allocating buffer.
> Fragment 0:0
> [Error Id: 2287fe71-f0cb-469a-a563-11580fceb1c5 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-08-16 07:25:44,590 [284d4006-9f9d-b893-9352-4f54f9b1d52a:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 284d4006-9f9d-b893-9352-4f54f9b1d52a: SELECT clientname, audiencekey, 
> spendprofileid, postalcd, provincecd, provincename, postalcode_json, 
> country_json, province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER 
> (PARTITION BY spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 
> ELSE 0 END) ASC, provincecd ASC) as rn FROM `MD593.parquet` limit 3
> ...
> 2016-08-16 07:25:46,273 [284d4006-9f9d-b893-9352-4f54f9b1d52a:frag:0:0] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/284d4006-9f9d-b893-9352-4f54f9b1d52a_majorfragment0_minorfragment0_operator8/2
> 2016-08-16 07:25:46,283 [284d4006-9f9d-b893-9352-4f54f9b1d52a:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.
> Failure while allocating buffer.
> [Error Id: 2287fe71-f0cb-469a-a563-11580fceb1c5 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:242)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Failure 
> while allocating buffer.
> at 
> org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:187)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.(RepeatedMapVector.java:331)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.(RepeatedMapVector.java:307)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector.getTransferPair(RepeatedMapVector.java:161)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.SimpleVectorWrapper.cloneAndTransfer(SimpleVectorWrapper.java:66)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.VectorContainer.cloneAndTransfer(VectorContainer.java:204)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.VectorContainer.getTransferClone(VectorContainer.java:157)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]

[jira] [Updated] (DRILL-5286) When rel and target candidate set is the same, planner should not need to do convert for the relNode since it must have been done

2017-03-26 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5286:

Reviewer: Paul Rogers

> When rel and target candidate set is the same, planner should not need to do 
> convert for the relNode since it must have been done
> -
>
> Key: DRILL-5286
> URL: https://issues.apache.org/jira/browse/DRILL-5286
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5351) Excessive bounds checking in the Parquet reader

2017-03-26 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5351:

Labels: ready-to-commit  (was: )

> Excessive bounds checking in the Parquet reader 
> 
>
> Key: DRILL-5351
> URL: https://issues.apache.org/jira/browse/DRILL-5351
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>  Labels: ready-to-commit
>
> In profiling the Parquet reader, the variable-length decoding appears to be a 
> major bottleneck, making the reader CPU-bound rather than disk-bound.
> A YourKit profile identifies the following methods as severe bottlenecks -
> VarLenBinaryReader.determineSizeSerial(long)
>   NullableVarBinaryVector$Mutator.setSafe(int, int, int, int, DrillBuf)
>   DrillBuf.chk(int, int)
>   NullableVarBinaryVector$Mutator.fillEmpties()
> The problem is that each of these methods does some form of bounds checking, 
> and, of course, the actual write to the ByteBuf is ultimately bounds-checked 
> as well.
> DrillBuf.chk can be disabled by a configuration setting. Disabling it does 
> improve performance of TPCH queries, and all regression, unit, and 
> TPCH-SF100 tests still pass. 
> I would recommend we allow users to turn this check off for 
> performance-critical queries.
> Removing the bounds checking at every level is going to be a fair amount of 
> work. In the meantime, it appears that a few simple changes to variable 
> length vectors improves query performance by about 10% across the board. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-03-24 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5089:

Reviewer: Padma Penumarthy

Assigned Reviewer to [~ppenumarthy]

> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, an attempt is made to initialize each enabled storage 
> plugin while building the schema tree. This is done regardless of which 
> plugins are actually involved in the query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> down - the overall query time increases drastically, most likely due to the 
> attempt to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 
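The proposed behavior amounts to initializing a plugin's schema lazily, on first use. A hypothetical Java sketch (names like `LazySchemaRegistry` are invented for illustration, not Drill's classes):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: only run a storage plugin's (possibly slow) schema
// initialization when a query actually references it.
public class LazySchemaRegistry {
    private final Map<String, Supplier<String>> factories = new ConcurrentHashMap<>();
    private final Map<String, String> initialized = new ConcurrentHashMap<>();

    // Register a factory; the init function does not run yet.
    public void enablePlugin(String name, Supplier<String> initFn) {
        factories.put(name, initFn);
    }

    // Initialize on first use only, memoizing the result.
    public String schemaFor(String name) {
        return initialized.computeIfAbsent(name, n -> factories.get(n).get());
    }

    public int initializedCount() {
        return initialized.size();
    }

    public static void main(String[] args) {
        LazySchemaRegistry reg = new LazySchemaRegistry();
        reg.enablePlugin("dfs", () -> "dfs-schema");
        reg.enablePlugin("jdbc", () -> {
            // Stands in for a misconfigured or down datasource.
            throw new IllegalStateException("SQL Server is down");
        });
        // A query touching only dfs never pays for the broken jdbc plugin.
        System.out.println(reg.schemaFor("dfs"));      // dfs-schema
        System.out.println(reg.initializedCount());    // 1
    }
}
```

With eager registration, the `jdbc` factory would run (and stall or fail) for every query; with the lazy shape above, only queries that reference the faulty plugin pay its cost.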



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-4971) Query encounters system error, when there aren't eval subexpressions of any function in boolean and/or expressions

2017-03-24 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-4971:

Labels: ready-to-commit  (was: )

> Query encounters system error, when there aren't eval subexpressions of any 
> function in boolean and/or expressions
> --
>
> Key: DRILL-4971
> URL: https://issues.apache.org/jira/browse/DRILL-4971
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
>  Labels: ready-to-commit
> Attachments: low_table, medium_table
>
>
> This query returns an error.  The stack trace suggests it might be a schema 
> change issue, but there is no schema change in this table.  Many other 
> queries are succeeding.
> select count(\*) from test where ((int_id > 3060 and int_id < 6002) or 
> (int_id > 9025 and int_id < 11976)) and ((int_id > 9025 and int_id < 11976) 
> or (int_id > 3060 and int_id < 6002)) and (int_id > 3060 and int_id < 6002);
> Error: SYSTEM ERROR: CompileException: Line 232, Column 30: Statement "break 
> AndOP3" is not enclosed by a breakable statement with label "AndOP3"
> [Error Id: 254d093b-79a1-4425-802c-ade08db293e4 on qa-node211:31010]^M
> ^M
>   (org.apache.drill.exec.exception.SchemaChangeException) Failure while 
> attempting to load generated class^M
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():198^M
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():107^M
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78^M
> There are two partitions to the test table.  One covers the range 3061 - 6001 
> and the other covers the range 9026 - 11975.
> This second query returns a different, but possibly related, error.  
> select count(\*) from orders_parts where (((int_id > -3025 and int_id < -4) 
> or (int_id > -5 and int_id < 3061) or (int_id > 3060 and int_id < 6002)) and 
> (int_id > -5 and int_id < 3061)) and (((int_id > -5 and int_id < 3061) or 
> (int_id > 9025 and int_id < 11976)) and (int_id > -5 and int_id < 3061))^M
> Failed with exception^M
> java.sql.SQLException: SYSTEM ERROR: CompileException: Line 447, Column 30: 
> Statement "break AndOP6" is not enclosed by a breakable statement with label 
> "AndOP6"^M
> ^M
> Fragment 0:0^M
> ^M
> [Error Id: ac09187e-d3a2-41a7-a659-b287aca6039c on qa-node209:31010]^M
> ^M
>   (org.apache.drill.exec.exception.SchemaChangeException) Failure while 
> attempting to load generated class^M
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():198^M
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():107^M
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78^M
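For context on the CompileException text: in valid Java, a labeled `break` must be lexically enclosed by the statement carrying that label, which is exactly the invariant the generated filter code violated. A minimal legal sketch (illustrative only, not Drill's generated code) of the short-circuit AND pattern the `AndOP` labels suggest:

```java
// Illustrative only. The error above means codegen emitted "break AndOP3"
// outside the labeled block it targets; the legal form looks like this:
public class LabeledBreakSketch {

    // Short-circuit AND over boolean operands: jump out of the labeled
    // block as soon as one operand is false.
    static boolean evalAnd(boolean[] operands) {
        boolean result = true;
        AndOP: {                      // the label encloses the break
            for (boolean op : operands) {
                if (!op) {
                    result = false;
                    break AndOP;      // legal: enclosed by AndOP
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(evalAnd(new boolean[]{true, true}));   // true
        System.out.println(evalAnd(new boolean[]{true, false}));  // false
    }
}
```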



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4274) ExternalSort doesn't always handle low memory condition well, failing execution instead of spilling in some cases

2017-03-24 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940865#comment-15940865
 ] 

Zelaine Fong commented on DRILL-4274:
-

[~Paul.Rogers], [~rkins] - do you guys know if this still is an issue with the 
new external sort?

> ExternalSort doesn't always handle low memory condition well, failing 
> execution instead of spilling in some cases
> -
>
> Key: DRILL-4274
> URL: https://issues.apache.org/jira/browse/DRILL-4274
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Jacques Nadeau
> Fix For: Future
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release

2017-03-24 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940846#comment-15940846
 ] 

Zelaine Fong commented on DRILL-4347:
-

The fix for DRILL-4678 should address this.

> Planning time for query64 from TPCDS test suite has increased 10 times 
> compared to 1.4 release
> --
>
> Key: DRILL-4347
> URL: https://issues.apache.org/jira/browse/DRILL-4347
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Aman Sinha
> Fix For: Future
>
> Attachments: 294e9fb9-cdda-a89f-d1a7-b852878926a1.sys.drill_1.4.0, 
> 294ea418-9fb8-3082-1725-74e3cfe38fe9.sys.drill_1.5.0, drill4347_jstack.txt
>
>
> mapr-drill-1.5.0.201602012001-1.noarch.rpm
> {code}
> 0: jdbc:drill:schema=dfs> WITH cs_ui
> . . . . . . . . . . . . >  AS (SELECT cs_item_sk,
> . . . . . . . . . . . . > Sum(cs_ext_list_price) AS sale,
> . . . . . . . . . . . . > Sum(cr_refunded_cash + 
> cr_reversed_charge
> . . . . . . . . . . . . > + cr_store_credit) AS refund
> . . . . . . . . . . . . >  FROM   catalog_sales,
> . . . . . . . . . . . . > catalog_returns
> . . . . . . . . . . . . >  WHERE  cs_item_sk = cr_item_sk
> . . . . . . . . . . . . > AND cs_order_number = 
> cr_order_number
> . . . . . . . . . . . . >  GROUP  BY cs_item_sk
> . . . . . . . . . . . . >  HAVING Sum(cs_ext_list_price) > 2 * Sum(
> . . . . . . . . . . . . > cr_refunded_cash + 
> cr_reversed_charge
> . . . . . . . . . . . . > + cr_store_credit)),
> . . . . . . . . . . . . >  cross_sales
> . . . . . . . . . . . . >  AS (SELECT i_product_name product_name,
> . . . . . . . . . . . . > i_item_sk  item_sk,
> . . . . . . . . . . . . > s_store_name   store_name,
> . . . . . . . . . . . . > s_zip  store_zip,
> . . . . . . . . . . . . > ad1.ca_street_number   
> b_street_number,
> . . . . . . . . . . . . > ad1.ca_street_name 
> b_streen_name,
> . . . . . . . . . . . . > ad1.ca_cityb_city,
> . . . . . . . . . . . . > ad1.ca_zip b_zip,
> . . . . . . . . . . . . > ad2.ca_street_number   
> c_street_number,
> . . . . . . . . . . . . > ad2.ca_street_name 
> c_street_name,
> . . . . . . . . . . . . > ad2.ca_cityc_city,
> . . . . . . . . . . . . > ad2.ca_zip c_zip,
> . . . . . . . . . . . . > d1.d_year  AS syear,
> . . . . . . . . . . . . > d2.d_year  AS fsyear,
> . . . . . . . . . . . . > d3.d_year  s2year,
> . . . . . . . . . . . . > Count(*)   cnt,
> . . . . . . . . . . . . > Sum(ss_wholesale_cost) s1,
> . . . . . . . . . . . . > Sum(ss_list_price) s2,
> . . . . . . . . . . . . > Sum(ss_coupon_amt) s3
> . . . . . . . . . . . . >  FROM   store_sales,
> . . . . . . . . . . . . > store_returns,
> . . . . . . . . . . . . > cs_ui,
> . . . . . . . . . . . . > date_dim d1,
> . . . . . . . . . . . . > date_dim d2,
> . . . . . . . . . . . . > date_dim d3,
> . . . . . . . . . . . . > store,
> . . . . . . . . . . . . > customer,
> . . . . . . . . . . . . > customer_demographics cd1,
> . . . . . . . . . . . . > customer_demographics cd2,
> . . . . . . . . . . . . > promotion,
> . . . . . . . . . . . . > household_demographics hd1,
> . . . . . . . . . . . . > household_demographics hd2,
> . . . . . . . . . . . . > customer_address ad1,
> . . . . . . . . . . . . > customer_address ad2,
> . . . . . . . . . . . . > income_band ib1,
> . . . . . . . . . . . . > income_band ib2,
> . . . . . . . . . . . . > item
> . . . . . . . . . . . . >  WHERE  ss_store_sk = s_store_sk
> . . . . . . . . . . . . > AND ss_sold_date_sk = d1.d_date_sk
> . . . . . . . . . . . . > AND ss_customer_sk = c_customer_sk
> . . . . . . . . . . . . > AND ss_cdemo_sk = cd1.cd_demo_sk
> . . . . . . . . . . . . > AND ss_hdemo_sk = hd1.hd_demo_sk
> . . . . . . . . . . . . > AND ss_addr_sk = 

[jira] [Resolved] (DRILL-4647) C++ client is not propagating a connection failed error when a drillbit goes down

2017-03-24 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong resolved DRILL-4647.
-
   Resolution: Fixed
Fix Version/s: 1.8.0

> C++ client is not propagating a connection failed error when a drillbit goes 
> down
> -
>
> Key: DRILL-4647
> URL: https://issues.apache.org/jira/browse/DRILL-4647
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Parth Chandra
> Fix For: 1.8.0
>
>
> When a drillbit goes down, there are two conditions under which the client is 
> not propagating the error back to the application -
> 1) The application is in a submitQuery call: the ODBC driver expects the 
> error to be reported through the query results listener, which hasn't been 
> registered at the point the error is encountered.
> 2) A submitQuery call succeeded but never reached the drillbit because it was 
> shut down. In this case the application has a handle to a query and is 
> listening for results which will never arrive. The heartbeat mechanism 
> detects the failure, but is not propagating the error to the query results 
> listener.
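The shape of the fix for case 2 is that a heartbeat-detected connection loss must fail every registered results listener rather than leave the application waiting forever. A hypothetical sketch (in Java for consistency with the other examples here, though the client in question is C++; all names are invented):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch: propagate a heartbeat-detected failure to every
// registered query-results listener instead of silently dropping it.
public class ConnectionSketch {
    private final Map<String, Consumer<String>> listeners = new ConcurrentHashMap<>();

    public void registerListener(String queryId, Consumer<String> onError) {
        listeners.put(queryId, onError);
    }

    // Called by the heartbeat when the drillbit stops responding.
    public int onConnectionLost(String reason) {
        int notified = 0;
        for (Consumer<String> l : listeners.values()) {
            l.accept(reason);   // each pending query learns of the failure
            notified++;
        }
        listeners.clear();
        return notified;
    }

    public static void main(String[] args) {
        ConnectionSketch conn = new ConnectionSketch();
        conn.registerListener("q1", err -> System.out.println("q1 failed: " + err));
        System.out.println(conn.onConnectionLost("CONNECTION FAILED"));  // 1
    }
}
```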



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5377) Drill returns weird characters when parquet date auto-correction is turned off

2017-03-23 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939352#comment-15939352
 ] 

Zelaine Fong commented on DRILL-5377:
-

[~rkins] - are the dates correct if auto-correction is enabled? 

> Drill returns weird characters when parquet date auto-correction is turned off
> --
>
> Key: DRILL-5377
> URL: https://issues.apache.org/jira/browse/DRILL-5377
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>
> git.commit.id.abbrev=38ef562
> Below is the output, I get from test framework when I disable auto correction 
> for date fields
> {code}
> select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
> autoCorrectCorruptDates => false)) order by l_shipdate limit 10;
> ^@356-03-19
> ^@356-03-21
> ^@356-03-21
> ^@356-03-23
> ^@356-03-24
> ^@356-03-24
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> {code}
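A rough sketch of the suspected arithmetic behind those garbled dates, under the assumption (drawn from the earlier date-corruption work, DRILL-4203) that affected writers stored DATE values shifted by twice the Julian day number of the Unix epoch, 2 × 2440588 = 4881176 days; auto-correction subtracts that shift, and with it disabled the raw value lands thousands of years in the future:

```java
import java.time.LocalDate;

// Assumption, not confirmed from this report: the corrupt shift is
// 2 * 2440588 = 4881176 days (see DRILL-4203).
public class CorruptDateSketch {
    static final int CORRUPT_DATE_SHIFT = 2 * 2440588;

    static LocalDate correct(int storedEpochDays) {
        return LocalDate.ofEpochDay((long) storedEpochDays - CORRUPT_DATE_SHIFT);
    }

    public static void main(String[] args) {
        // A lineitem shipdate as an affected writer would have stored it:
        int stored = (int) LocalDate.of(1996, 3, 19).toEpochDay() + CORRUPT_DATE_SHIFT;
        // Uncorrected, the year is far in the future (the garbage above):
        System.out.println(LocalDate.ofEpochDay(stored).getYear());
        // Corrected, the original date round-trips:
        System.out.println(correct(stored));   // 1996-03-19
    }
}
```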



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4301) OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.

2017-03-23 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939222#comment-15939222
 ] 

Zelaine Fong commented on DRILL-4301:
-

[~Paul.Rogers] - as I noted in my comment, I believe the partition pruning 
error is DRILL-4139.  There is a pull request for that Jira but it needs 
further testing.  The issue has been assigned.

> OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to 
> spill.
> ---
>
> Key: DRILL-4301
> URL: https://issues.apache.org/jira/browse/DRILL-4301
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Affects Versions: 1.5.0
> Environment: 4 node cluster
>Reporter: Khurram Faraaz
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> Query below in Functional tests, fails due to OOM 
> {code}
> select * from dfs.`/drill/testdata/metadata_caching/fewtypes_boolpartition` 
> where bool_col = true;
> {code}
> Drill version : drill-1.5.0
> JAVA_VERSION=1.8.0
> {noformat}
> version   commit_id   commit_message  commit_time build_email 
> build_time
> 1.5.0-SNAPSHOT2f0e3f27e630d5ac15cdaef808564e01708c3c55
> DRILL-4190 Don't hold on to batches from left side of merge join.   
> 20.01.2016 @ 22:30:26 UTC   Unknown 20.01.2016 @ 23:48:33 UTC
> framework/framework/resources/Functional/metadata_caching/data/bool_partition1.q
>  (connection: 808078113)
> [#1378] Query failed: 
> oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: 
> One or more nodes ran out of memory while executing the query.
> Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.
> batchGroups.size 0
> spilledBatchGroups.size 0
> allocated memory 48326272
> allocator limit 46684427
> Fragment 0:0
> [Error Id: 97d58ea3-8aff-48cf-a25e-32363b8e0ecd on drill-demod2:31010]
>   at 
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
>   at 
> oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
>   at 
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
>   at 
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
>   at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374)
>   at 
> oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252)
>   at 
> oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:285)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:257)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)

[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-23 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938592#comment-15938592
 ] 

Zelaine Fong commented on DRILL-5375:
-

[~arina] - thanks for your explanation on right/full joins.  Makes sense.

> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> ++-+-+---+
> | dt | fyq | who | event |
> ++-+-+---+
> | 2016-01-01 | NULL| aperson | went wild |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing  |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> ++-+-+---+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-+--+--++
> | dt  |   fyq|   who|   event|
> +-+--+--++
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas  |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing   |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-+--+--++
> 3 rows selected (2.523 seconds)
> {code}
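The missing 2016-01-01 row illustrates the core LEFT join invariant: every left row must appear in the output, null-padded when no right row satisfies the join condition. A minimal Java sketch of a nested-loop left join with the BETWEEN predicate above (illustrative only; Drill's NestedLoopJoinBatch operates on record batches, not lists):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal nested-loop LEFT join sketch: unmatched left rows survive,
// padded with null on the right side.
public class NljLeftJoinSketch {
    // left rows: {dt}; right rows: {fyq, dts, dte}
    static List<String[]> leftJoin(List<String[]> left, List<String[]> right) {
        List<String[]> out = new ArrayList<>();
        for (String[] l : left) {
            boolean matched = false;
            for (String[] r : right) {
                // join condition: l.dt BETWEEN r.dts AND r.dte
                if (l[0].compareTo(r[1]) >= 0 && l[0].compareTo(r[2]) <= 0) {
                    out.add(new String[]{l[0], r[0]});
                    matched = true;
                }
            }
            if (!matched) {
                out.add(new String[]{l[0], null});   // null-padded left row
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> t2 = List.of(new String[]{"2016-01-01"},
                                    new String[]{"2016-12-26"});
        List<String[]> t1 = List.of(new String[]{"2016-Q2", "2016-09-01", "2016-12-31"});
        for (String[] row : leftJoin(t2, t1)) {
            System.out.println(row[0] + " -> " + row[1]);
        }
        // 2016-01-01 -> null      (unmatched left row is kept)
        // 2016-12-26 -> 2016-Q2
    }
}
```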



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-23 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5375:

Reviewer: Aman Sinha

Assigned Reviewer to [~amansinha100]

> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> ++-+-+---+
> | dt | fyq | who | event |
> ++-+-+---+
> | 2016-01-01 | NULL| aperson | went wild |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing  |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> ++-+-+---+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-+--+--++
> | dt  |   fyq|   who|   event|
> +-+--+--++
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas  |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing   |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-+--+--++
> 3 rows selected (2.523 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Reopened] (DRILL-5164) Equi-join query results in CompileException when inputs have large number of columns

2017-03-23 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reopened DRILL-5164:
-
  Assignee: Volodymyr Vysotskyi  (was: Serhii Harnyk)

Reopened, based on [~khfaraaz]'s findings that the repro query fails, albeit 
with a different error.

> Equi-join query results in CompileException when inputs have large number of 
> columns
> 
>
> Key: DRILL-5164
> URL: https://issues.apache.org/jira/browse/DRILL-5164
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>Assignee: Volodymyr Vysotskyi
>Priority: Critical
> Fix For: 1.10.0
>
> Attachments: manyColsInJson.json
>
>
> Drill 1.9.0 
> git commit ID : 4c1b420b
> 4 node CentOS cluster
> JSON file has 4095 keys (columns)
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select * from `manyColsInJson.json` t1, 
> `manyColsInJson.json` t2 where t1.key2000 = t2.key2000;
> Error: SYSTEM ERROR: CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen294.java]',
>  Line 16397, Column 17: HashJoinProbeGen294.java:16397: error: code too large
> public void doSetup(FragmentContext context, VectorContainer buildBatch, 
> RecordBatch probeBatch, RecordBatch outgoing)
> ^ (compiler.err.limit.code)
> Fragment 0:0
> [Error Id: 7d0efa7e-e183-4c40-939a-4908699f94bf on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-12-26 09:52:11,321 [279f17fd-c8f0-5d18-1124-76099f0a5cc8:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen294.java]',
>  Line 16397, Column 17: HashJoinProbeGen294.java:16397: error: code too large
> public void doSetup(FragmentContext context, VectorContainer buildBatch, 
> RecordBatch probeBatch, RecordBatch outgoing)
> ^ (compiler.err.limit.code)
> Fragment 0:0
> [Error Id: 7d0efa7e-e183-4c40-939a-4908699f94bf on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen294.java]',
>  Line 16397, Column 17: HashJoinProbeGen294.java:16397: error: code too large
> public void doSetup(FragmentContext context, VectorContainer buildBatch, 
> RecordBatch probeBatch, RecordBatch outgoing)
> ^ (compiler.err.limit.code)
> Fragment 0:0
> [Error Id: 7d0efa7e-e183-4c40-939a-4908699f94bf on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:293)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.9.0.jar:1.9.0]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_91]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_91]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: 
> org.apache.drill.exec.exception.SchemaChangeException: 
> org.apache.drill.exec.exception.ClassTransformationException: 
> java.util.concurrent.ExecutionException: 
> org.apache.drill.exec.exception.ClassTransformationException: Failure 
> generating transformation classes for value:
> package org.apache.drill.exec.test.generated;
> ...
> public class HashJoinProbeGen294 {
> NullableVarCharVector[] vv0;
> NullableVarCharVector vv3;
> NullableVarCharVector[] vv6;
> ...
> vv49137 .copyFromSafe((probeIndex), (outIndex), vv49134);
> vv49143 .copyFromSafe((probeIndex), (outIndex), vv49140);
> vv49149 .copyFromSafe((probeIndex), (outIndex), vv49146);
> }
> }
> 
> public void __DRILL_INIT__()
> throws SchemaChangeException
> {
> }
> }
> at 
> org.apache.drill.exec.compile.ClassTransformer.getImplementationClass(ClassTransformer.java:302)
>  ~[drill-java-exec-1.9.0.jar:1.9.0]

[jira] [Commented] (DRILL-4938) Report UserException when constant expression reduction fails

2017-03-23 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938565#comment-15938565
 ] 

Zelaine Fong commented on DRILL-4938:
-

[~khfaraaz] - the gist of this fix is to report a better error, not to 
eliminate the error.  I see that the query now returns a "PLAN ERROR" instead 
of a "SYSTEM ERROR".  

> Report UserException when constant expression reduction fails
> -
>
> Key: DRILL-4938
> URL: https://issues.apache.org/jira/browse/DRILL-4938
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>Assignee: Serhii Harnyk
>Priority: Minor
> Fix For: 1.10.0
>
>
> We need a better error message instead of DrillRuntimeException
> Drill 1.9.0 git commit ID : 4edabe7a
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select (res1 = 2016/09/22) res2
> . . . . . . . . . . . . . . > from
> . . . . . . . . . . . . . . > (
> . . . . . . . . . . . . . . > select (case when (false) then null else 
> cast('2016/09/22' as date) end) res1
> . . . . . . . . . . . . . . > from (values(1)) foo
> . . . . . . . . . . . . . . > ) foobar;
> Error: SYSTEM ERROR: DrillRuntimeException: Failure while materializing 
> expression in constant expression evaluator [CASE(false, =(null, /(/(2016, 
> 9), 22)), =(CAST('2016/09/22'):DATE NOT NULL, /(/(2016, 9), 22)))].  Errors:
> Error in expression at index -1.  Error: Missing function implementation: 
> [castTIMESTAMP(INT-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--.
> Error in expression at index -1.  Error: Missing function implementation: 
> [castTIMESTAMP(INT-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--.
> {noformat}
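The error text's `/(/(2016, 9), 22)` shows what went wrong: the unquoted `2016/09/22` parsed as integer division, not a date, so the planner tried to compare a DATE against an INT (hence the missing `castTIMESTAMP(INT-REQUIRED)`). The arithmetic the optimizer actually saw:

```java
// Quick check of the misparse: `2016/09/22` as integer division.
public class DateLiteralSketch {
    public static void main(String[] args) {
        int misparsed = (2016 / 9) / 22;   // 224 / 22
        System.out.println(misparsed);     // 10 -- an INT, not a date
    }
}
```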



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (DRILL-4139) Exception while trying to prune partition. java.lang.UnsupportedOperationException: Unsupported type: BIT

2017-03-23 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-4139:
---

Assignee: Volodymyr Vysotskyi  (was: Aman Sinha)

Volodymyr -- this might be a good issue for you to start with.  There is 
already a pull request, but it looks like it's missing a unit test.  Can you 
add the unit test and then do the necessary testing?  Thanks.

> Exception while trying to prune partition. 
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> -
>
> Key: DRILL-4139
> URL: https://issues.apache.org/jira/browse/DRILL-4139
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.3.0
> Environment: 4 node cluster on CentOS
>Reporter: Khurram Faraaz
>Assignee: Volodymyr Vysotskyi
>
> Exception while trying to prune partition.
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> is seen in drillbit.log after Functional run on 4 node cluster.
> Drill 1.3.0 sys.version => d61bb83a8
> {code}
> 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2
> 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 0 ms
> 2015-11-27 03:12:19,810 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] WARN  
> o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune 
> partition.
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.populatePruningVector(ParquetGroupScan.java:479)
>  ~[drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.ParquetPartitionDescriptor.populatePartitionVectors(ParquetPartitionDescriptor.java:96)
>  ~[drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:235)
>  ~[drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
>  [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808)
>  [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) 
> [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:303) 
> [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.logicalPlanningVolcanoAndLopt(DefaultSqlHandler.java:545)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:213)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:248)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:184)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:905) 
> [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:244) 
> [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> {code}
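The stack trace amounts to a type dispatcher with no case for BIT: `populatePruningVector` raises, which aborts pruning for the whole rule. A hedged Python sketch of that failure mode (the supported-type set and function shape here are invented for illustration, not Drill's actual code):

```python
# Illustrative only: a pruning-vector populator that dispatches on column
# type and has no case for BIT, so pruning aborts with the same message.
SUPPORTED = {"INT": int, "BIGINT": int, "VARCHAR": str}  # assumed subset

def populate_pruning_vector(col_type, values):
    if col_type not in SUPPORTED:
        raise NotImplementedError("Unsupported type: %s" % col_type)
    return [SUPPORTED[col_type](v) for v in values]

print(populate_pruning_vector("INT", ["1", "2"]))
try:
    populate_pruning_vector("BIT", [0, 1])   # a boolean partition column
except NotImplementedError as e:
    print(e)   # Unsupported type: BIT
```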





[jira] [Commented] (DRILL-4301) OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.

2017-03-23 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938552#comment-15938552
 ] 

Zelaine Fong commented on DRILL-4301:
-

[~Paul.Rogers] - did you not see the partition pruning error when you tested 
this?

[~khfaraaz] - the partition pruning error you're seeing looks like DRILL-4139, 
which you previously reported.  It looks like there is a pull request for that 
issue, but it didn't get completed.  I will reassign that issue.

> OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to 
> spill.
> ---
>
> Key: DRILL-4301
> URL: https://issues.apache.org/jira/browse/DRILL-4301
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Affects Versions: 1.5.0
> Environment: 4 node cluster
>Reporter: Khurram Faraaz
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> Query below in Functional tests, fails due to OOM 
> {code}
> select * from dfs.`/drill/testdata/metadata_caching/fewtypes_boolpartition` 
> where bool_col = true;
> {code}
> Drill version : drill-1.5.0
> JAVA_VERSION=1.8.0
> {noformat}
> version   commit_id   commit_message  commit_time build_email 
> build_time
> 1.5.0-SNAPSHOT2f0e3f27e630d5ac15cdaef808564e01708c3c55
> DRILL-4190 Don't hold on to batches from left side of merge join.   
> 20.01.2016 @ 22:30:26 UTC   Unknown 20.01.2016 @ 23:48:33 UTC
> framework/framework/resources/Functional/metadata_caching/data/bool_partition1.q
>  (connection: 808078113)
> [#1378] Query failed: 
> oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: 
> One or more nodes ran out of memory while executing the query.
> Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.
> batchGroups.size 0
> spilledBatchGroups.size 0
> allocated memory 48326272
> allocator limit 46684427
> Fragment 0:0
> [Error Id: 97d58ea3-8aff-48cf-a25e-32363b8e0ecd on drill-demod2:31010]
>   at 
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
>   at 
> oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
>   at 
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
>   at 
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
>   at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374)
>   at 
> oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252)
>   at 
> oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:285)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:257)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> 

[jira] [Updated] (DRILL-4678) Tune metadata by generating a dispatcher at runtime

2017-03-22 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-4678:

Reviewer: Jinfeng Ni

Assigned Reviewer to [~jni]

> Tune metadata by generating a dispatcher at runtime
> ---
>
> Key: DRILL-4678
> URL: https://issues.apache.org/jira/browse/DRILL-4678
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.7.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Serhii Harnyk
>Priority: Critical
> Attachments: hung_Date_Query.log
>
>
> Below query hangs
> {noformat}
> 2016-05-16 10:33:57,506 [28c65de9-9f67-dadb-5e4e-e1a12f8dda49:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 28c65de9-9f67-dadb-5e4e-e1a12f8dda49: SELECT DISTINCT dt FROM (
> VALUES(CAST('1964-03-07' AS DATE)),
>   (CAST('2002-03-04' AS DATE)),
>   (CAST('1966-09-04' AS DATE)),
>   (CAST('1993-08-18' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1959-10-23' AS DATE)),
>   (CAST('1992-01-14' AS DATE)),
>   (CAST('1994-07-24' AS DATE)),
>   (CAST('1979-11-25' AS DATE)),
>   (CAST('1945-01-14' AS DATE)),
>   (CAST('1982-07-25' AS DATE)),
>   (CAST('1966-09-06' AS DATE)),
>   (CAST('1989-05-01' AS DATE)),
>   (CAST('1996-03-08' AS DATE)),
>   (CAST('1998-08-19' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
> (CAST('1999-07-20' AS DATE)),
> (CAST('1962-07-03' AS DATE)),
>   (CAST('2011-08-17' AS DATE)),
>   (CAST('2011-05-16' AS DATE)),
>   (CAST('1946-05-08' AS DATE)),
>   (CAST('1994-02-13' AS DATE)),
>   (CAST('1978-08-09' AS DATE)),
>   (CAST('1978-08-09' AS DATE)),
>   (CAST('1978-08-09' AS DATE)),
>   (CAST('1978-08-09' AS DATE)),
>   (CAST('1958-02-06' AS DATE)),
>   (CAST('2012-06-11' AS DATE)),
>   (CAST('2012-06-11' AS DATE)),
>   (CAST('2012-06-11' AS DATE)),
>   (CAST('2012-06-11' AS DATE)),
>   (CAST('1998-03-26' AS DATE)),
>   (CAST('1996-11-04' AS DATE)),
>   (CAST('1953-09-25' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('1980-07-05' AS DATE)),
>   (CAST('1982-06-15' AS DATE)),
>   (CAST('1951-05-16' AS DATE)))
> tbl(dt)
> {noformat}
> Details from the Web UI Profile tab; note that the query is still in the 
> STARTING state
> {noformat}
> Running Queries
> Time  UserQuery   State   Foreman
> 05/16/2016 10:33:57   
> mapr
>  SELECT DISTINCT dt FROM ( VALUES(CAST('1964-03-07' AS DATE)), 
> (CAST('2002-03-04' AS DATE)), (CAST('1966-09-04' AS DATE)), (CAST('199
> STARTING
> centos-01.qa.lab
> {noformat}
> There is no other useful information in drillbit.log. jstack output is 
> attached here for your reference.
> The same query works fine on Postgres 9.3





[jira] [Updated] (DRILL-4971) query encounters system error: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"

2017-03-22 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-4971:

Reviewer: Jinfeng Ni

Assigned Reviewer to [~jni]

> query encounters system error: Statement "break AndOP3" is not enclosed by a 
> breakable statement with label "AndOP3"
> 
>
> Key: DRILL-4971
> URL: https://issues.apache.org/jira/browse/DRILL-4971
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
> Attachments: low_table, medium_table
>
>
> This query returns an error.  The stack trace suggests it might be a schema 
> change issue, but there is no schema change in this table.  Many other 
> queries are succeeding.
> select count(\*) from test where ((int_id > 3060 and int_id < 6002) or 
> (int_id > 9025 and int_id < 11976)) and ((int_id > 9025 and int_id < 11976) 
> or (int_id > 3060 and int_id < 6002)) and (int_id > 3060 and int_id < 6002);
> Error: SYSTEM ERROR: CompileException: Line 232, Column 30: Statement "break 
> AndOP3" is not enclosed by a breakable statement with label "AndOP3"
> [Error Id: 254d093b-79a1-4425-802c-ade08db293e4 on qa-node211:31010]^M
> ^M
>   (org.apache.drill.exec.exception.SchemaChangeException) Failure while 
> attempting to load generated class^M
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():198^M
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():107^M
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78^M
> There are two partitions to the test table.  One covers the range 3061 - 6001 
> and the other covers the range 9026 - 11975.
> This second query returns a different, but possibly related, error.  
> select count(\*) from orders_parts where (((int_id > -3025 and int_id < -4) 
> or (int_id > -5 and int_id < 3061) or (int_id > 3060 and int_id < 6002)) and 
> (int_id > -5 and int_id < 3061)) and (((int_id > -5 and int_id < 3061) or 
> (int_id > 9025 and int_id < 11976)) and (int_id > -5 and int_id < 3061))^M
> Failed with exception^M
> java.sql.SQLException: SYSTEM ERROR: CompileException: Line 447, Column 30: 
> Statement "break AndOP6" is not enclosed by a breakable statement with label 
> "AndOP6"^M
> ^M
> Fragment 0:0^M
> ^M
> [Error Id: ac09187e-d3a2-41a7-a659-b287aca6039c on qa-node209:31010]^M
> ^M
>   (org.apache.drill.exec.exception.SchemaChangeException) Failure while 
> attempting to load generated class^M
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():198^M
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():107^M
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78^M





[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-22 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15936675#comment-15936675
 ] 

Zelaine Fong commented on DRILL-5375:
-

[~arina] - does this mean FULL OUTER joins won't be supported as part of your 
changes?  If so, any reason why not?

Also, I don't quite understand why you have to enable a new option to allow 
right outer joins to be flipped to left outer joins.   Shouldn't we always be 
doing join optimization?  What is the default for that new option?  Enabled?

> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> ++-+-+---+
> | dt | fyq | who | event |
> ++-+-+---+
> | 2016-01-01 | NULL| aperson | went wild |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing  |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> ++-+-+---+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-+--+--++
> | dt  |   fyq|   who|   event|
> +-+--+--++
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas  |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing   |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-+--+--++
> 3 rows selected (2.523 seconds)
> {code}
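The fix concerns LEFT join semantics in the nested-loop join: every left row must appear in the output, NULL-extended when no right row satisfies the (non-equi, BETWEEN-style) condition. A minimal plain-Python sketch of that requirement, using abbreviated data from the ticket (this is not Drill's vectorized implementation):

```python
# Sketch of nested-loop LEFT join semantics: unmatched left rows are
# NULL-extended instead of being dropped (the bug shown above).
def nested_loop_left_join(left, right, pred, right_width):
    out = []
    for l in left:
        matched = False
        for r in right:
            if pred(l, r):
                out.append(l + r)
                matched = True
        if not matched:
            out.append(l + (None,) * right_width)  # NULL-extend
    return out

t2 = [("2016-01-01",), ("2016-12-26",)]               # (dt,)
t1 = [("2016-Q2", "2016-09-01", "2016-12-31")]        # (fyq, dts, dte)
rows = nested_loop_left_join(
    t2, t1, lambda l, r: r[1] <= l[0] <= r[2], right_width=3)
# 2016-01-01 falls in no quarter, so it must still appear, NULL-extended.
print(rows)
```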





[jira] [Commented] (DRILL-5367) Join query returns wrong results

2017-03-20 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933292#comment-15933292
 ] 

Zelaine Fong commented on DRILL-5367:
-

Looks like a problem with the USING clause in the join.  If I replace it with 
an explicit equality condition, the query returns 239 rows.
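The expected equivalence can be demonstrated in SQLite (not Drill; table and column names mirror the report's `using_f1`/`using_f2`, with invented sample rows): `JOIN ... USING(col)` should behave like the explicit equality condition, which is exactly what Drill 1.10.0 violated.

```python
# Hedged SQLite sketch: USING(col_prime) and an explicit ON condition
# should return the same matching rows.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE using_f1 (col_prime INTEGER, col_name TEXT);
    CREATE TABLE using_f2 (col_prime INTEGER, col_name TEXT);
    INSERT INTO using_f1 VALUES (103, 'Julie Lennox'), (211, 'Kenneth Hayes');
    INSERT INTO using_f2 VALUES (103, 'Derek Wilson');
""")

with_using = con.execute(
    "SELECT * FROM using_f1 JOIN (SELECT * FROM using_f2) foo USING (col_prime)"
).fetchall()
with_on = con.execute(
    "SELECT * FROM using_f1 t1 JOIN (SELECT * FROM using_f2) foo "
    "ON t1.col_prime = foo.col_prime"
).fetchall()

# Both forms return the single matching row; Drill 1.10.0 returned no
# rows for the USING form, which is the bug.
print(len(with_using), len(with_on))
```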

> Join query returns wrong results
> 
>
> Key: DRILL-5367
> URL: https://issues.apache.org/jira/browse/DRILL-5367
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.10.0
> Environment: 3 node cluster
>Reporter: Khurram Faraaz
> Attachments: using_f1.parquet, using_f2.parquet
>
>
> Join query returns wrong results
> Drill 1.10.0 does not return any results.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM using_f1 JOIN (SELECT * FROM 
> using_f2) foo USING(col_prime);
> +-+++-+-+---+--+-+-+--+--++
> | col_dt  | col_state  | col_prime  | col_varstr  | col_id  | col_name  | 
> col_dt0  | col_state0  | col_prime0  | col_varstr0  | col_id0  | col_name0  |
> +-+++-+-+---+--+-+-+--+--++
> +-+++-+-+---+--+-+-+--+--++
> No rows selected (0.314 seconds)
> {noformat}
> {noformat}
> Explain plan for above failing query
> 0: jdbc:drill:schema=dfs.tmp> explain plan for SELECT * FROM using_f1 JOIN 
> (SELECT * FROM using_f2) foo USING(col_prime);
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  ProjectAllowDup(*=[$0], *0=[$1])
> 00-02Project(T49¦¦*=[$0], T48¦¦*=[$2])
> 00-03  Project(T49¦¦*=[$1], col_prime=[$2], T48¦¦*=[$0])
> 00-04HashJoin(condition=[=($2, $0)], joinType=[inner])
> 00-06  Project(T48¦¦*=[$0])
> 00-08Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/using_f2]], 
> selectionRoot=maprfs:/tmp/using_f2, numFiles=1, usedMetadataFile=false, 
> columns=[`*`]]])
> 00-05  Project(T49¦¦*=[$0], col_prime=[$1])
> 00-07Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/using_f1]], 
> selectionRoot=maprfs:/tmp/using_f1, numFiles=1, usedMetadataFile=false, 
> columns=[`*`]]])
> {noformat}
> Whereas Postgres 9.3 returns expected results for the same data.
> {noformat}
> postgres=# SELECT * FROM using_f1 JOIN (SELECT * FROM using_f2) foo 
> USING(col_prime);
>  col_prime |   col_dt   | col_state |  col_varstr
> | col_id | col_name  |   col_dt   | col_state |
>   col_varstr| col_id |   col_name
> ---++---+---
> ++---++---+-
> ++--
>103 | 2014-12-24 | TX| 
> LUW2QzWGdJfnxHrqm3vwyndzRBFwH8l5xVDaM3hTiZAanp
> j   |  19462 | Julie Lennox  | 1990-01-11 | WV| 
> KKzEOgle6E5h
> NANduNAAIp9DQnGLGxO |  54217 | Derek Wilson
>103 | 1985-07-18 | CA| 
> aYQ2uLpPxebGGRvcX0fahrAOO4yhkDRvMPES6PuYsIfwkU
> Mrcq6NSdt0j |  48987 | Lillian Lupo  | 1990-01-11 | WV| 
> KKzEOgle6E5h
> NANduNAAIp9DQnGLGxO |  54217 | Derek Wilson
>103 | 1988-02-27 | SC| 
> OcVKheHMyeKLgcvamrJHUxKyCGGJGci3Y9ht2LI9T5Ek1n
> wckB|  52840 | Martha Rose   | 1990-01-11 | WV| 
> KKzEOgle6E5h
> NANduNAAIp9DQnGLGxO |  54217 | Derek Wilson
>211 | 1989-12-06 | SD| HHlmvV4
> |   1131 | Kenneth Hayes | 1989-05-31 | MT| 
> yhHfCGaCqnAr
> XUCD4jRoZQ4fj6IQIKZHUGLlIsSr1L7voCE3lEmj3DOSFqJ0Kq  |  49191 | Joan Stein
> 43 | 2006-01-24 | NV| 
> EJAN2JjRqoQWgp7rHLT1yPMBR50g1Kil3klu1vPritFKB2
> 5EjmL1tLXleagAP |  30179 | William Strassel  | 2006-03-02 | MI| 
> W9G0nWo8QNtH
> r9YxOscigPbtXEtNPZ  |  44849 | Catherine 
> Turner
>193 | 1990-01-14 | NV| 9nd3po1bnyasqINVA
> |  47775 | James Walters
> ...
> 1990-01-14 | NV| 9nd3po1bnyasqINVA
> |  47775 | James Walters | 1980-04-22 | ID| 
> jR8jr1lqDprU
> FPhAX4xZnulndYNd3   |   5876 | Rosie Johnson
>  5 | 2004-01-27 | KS| 

[jira] [Commented] (DRILL-5367) Join query returns wrong results

2017-03-20 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932868#comment-15932868
 ] 

Zelaine Fong commented on DRILL-5367:
-

[~khfaraaz] - can you attach the data files you used?

> Join query returns wrong results
> 
>
> Key: DRILL-5367
> URL: https://issues.apache.org/jira/browse/DRILL-5367
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.10.0
> Environment: 3 node cluster
>Reporter: Khurram Faraaz
>
> Join query returns wrong results
> Drill 1.10.0 does not return any results.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM using_f1 JOIN (SELECT * FROM 
> using_f2) foo USING(col_prime);
> +-+++-+-+---+--+-+-+--+--++
> | col_dt  | col_state  | col_prime  | col_varstr  | col_id  | col_name  | 
> col_dt0  | col_state0  | col_prime0  | col_varstr0  | col_id0  | col_name0  |
> +-+++-+-+---+--+-+-+--+--++
> +-+++-+-+---+--+-+-+--+--++
> No rows selected (0.314 seconds)
> {noformat}
> {noformat}
> Explain plan for above failing query
> 0: jdbc:drill:schema=dfs.tmp> explain plan for SELECT * FROM using_f1 JOIN 
> (SELECT * FROM using_f2) foo USING(col_prime);
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  ProjectAllowDup(*=[$0], *0=[$1])
> 00-02Project(T49¦¦*=[$0], T48¦¦*=[$2])
> 00-03  Project(T49¦¦*=[$1], col_prime=[$2], T48¦¦*=[$0])
> 00-04HashJoin(condition=[=($2, $0)], joinType=[inner])
> 00-06  Project(T48¦¦*=[$0])
> 00-08Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/using_f2]], 
> selectionRoot=maprfs:/tmp/using_f2, numFiles=1, usedMetadataFile=false, 
> columns=[`*`]]])
> 00-05  Project(T49¦¦*=[$0], col_prime=[$1])
> 00-07Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/using_f1]], 
> selectionRoot=maprfs:/tmp/using_f1, numFiles=1, usedMetadataFile=false, 
> columns=[`*`]]])
> {noformat}
> Whereas Postgres 9.3 returns expected results for the same data.
> {noformat}
> postgres=# SELECT * FROM using_f1 JOIN (SELECT * FROM using_f2) foo 
> USING(col_prime);
>  col_prime |   col_dt   | col_state |  col_varstr
> | col_id | col_name  |   col_dt   | col_state |
>   col_varstr| col_id |   col_name
> ---++---+---
> ++---++---+-
> ++--
>103 | 2014-12-24 | TX| 
> LUW2QzWGdJfnxHrqm3vwyndzRBFwH8l5xVDaM3hTiZAanp
> j   |  19462 | Julie Lennox  | 1990-01-11 | WV| 
> KKzEOgle6E5h
> NANduNAAIp9DQnGLGxO |  54217 | Derek Wilson
>103 | 1985-07-18 | CA| 
> aYQ2uLpPxebGGRvcX0fahrAOO4yhkDRvMPES6PuYsIfwkU
> Mrcq6NSdt0j |  48987 | Lillian Lupo  | 1990-01-11 | WV| 
> KKzEOgle6E5h
> NANduNAAIp9DQnGLGxO |  54217 | Derek Wilson
>103 | 1988-02-27 | SC| 
> OcVKheHMyeKLgcvamrJHUxKyCGGJGci3Y9ht2LI9T5Ek1n
> wckB|  52840 | Martha Rose   | 1990-01-11 | WV| 
> KKzEOgle6E5h
> NANduNAAIp9DQnGLGxO |  54217 | Derek Wilson
>211 | 1989-12-06 | SD| HHlmvV4
> |   1131 | Kenneth Hayes | 1989-05-31 | MT| 
> yhHfCGaCqnAr
> XUCD4jRoZQ4fj6IQIKZHUGLlIsSr1L7voCE3lEmj3DOSFqJ0Kq  |  49191 | Joan Stein
> 43 | 2006-01-24 | NV| 
> EJAN2JjRqoQWgp7rHLT1yPMBR50g1Kil3klu1vPritFKB2
> 5EjmL1tLXleagAP |  30179 | William Strassel  | 2006-03-02 | MI| 
> W9G0nWo8QNtH
> r9YxOscigPbtXEtNPZ  |  44849 | Catherine 
> Turner
>193 | 1990-01-14 | NV| 9nd3po1bnyasqINVA
> |  47775 | James Walters
> ...
> 1990-01-14 | NV| 9nd3po1bnyasqINVA
> |  47775 | James Walters | 1980-04-22 | ID| 
> jR8jr1lqDprU
> FPhAX4xZnulndYNd3   |   5876 | Rosie Johnson
>  5 | 2004-01-27 | KS| 0A8Gwqm66k6wQ1KzcUdSQKZU3AZtPImxb8
> |  57787 | Dean Salazar  | 1997-09-13 | SC| 
> uq35Sqf1GfPt
> IV1mE2CzwxKaX  

[jira] [Updated] (DRILL-5356) Refactor Parquet Record Reader

2017-03-20 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5356:

Reviewer: Padma Penumarthy

Assigned Reviewer to [~ppenumarthy]

> Refactor Parquet Record Reader
> --
>
> Key: DRILL-5356
> URL: https://issues.apache.org/jira/browse/DRILL-5356
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> The Parquet record reader class is a key part of Drill that has evolved over 
> time to become somewhat hard to follow.
> A number of us are working on Parquet-related tasks and find we have to spend 
> an uncomfortable amount of time trying to understand the code. In particular, 
> this writer needs to figure out how to convince the reader to provide 
> higher-density record batches.
> Rather than continue to decipher the complex code multiple times, this ticket 
> requests to refactor the code to make it functionally identical, but 
> structurally cleaner. The result will be faster time to value when working 
> with this code.
> This is a lower-priority change and will be coordinated with others working 
> on this code base. This ticket is only for the record reader class itself; it 
> does not include the various readers and writers that Parquet uses since 
> another project is actively modifying those classes.





[jira] [Updated] (DRILL-5331) NPE in FunctionImplementationRegistry.findDrillFunction() if dynamic UDFs disabled

2017-03-20 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5331:

Reviewer: Arina Ielchiieva

Assigned Reviewer to [~arina]

> NPE in FunctionImplementationRegistry.findDrillFunction() if dynamic UDFs 
> disabled
> --
>
> Key: DRILL-5331
> URL: https://issues.apache.org/jira/browse/DRILL-5331
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> Drill provides the Dynamic UDF (DUDF) functionality. DUDFs can be disabled 
> using the following option in {{ExecConstants}}:
> {code}
>   String USE_DYNAMIC_UDFS_KEY = "exec.udf.use_dynamic";
>   BooleanValidator USE_DYNAMIC_UDFS = new 
> BooleanValidator(USE_DYNAMIC_UDFS_KEY, true);
> {code}
> In a unit test, we created a setup in which we wish to use only the local 
> function registry; no DUDF support is needed. Run the code. The following 
> code is invoked when asking for a non-existent function:
> {code}
>   public DrillFuncHolder findDrillFunction(FunctionResolver functionResolver, 
> FunctionCall functionCall) {
> ...
> if (holder == null) {
>   syncWithRemoteRegistry(version.get());
>   List updatedFunctions = 
> localFunctionRegistry.getMethods(newFunctionName, version);
>   holder = functionResolver.getBestMatch(updatedFunctions, functionCall);
> }
> {code}
> The result is an NPE:
> {code}
> ERROR o.a.d.e.e.f.r.RemoteFunctionRegistry - Problem during trying to access 
> remote function registry [registry]
> java.lang.NullPointerException: null
>   at 
> org.apache.drill.exec.expr.fn.registry.RemoteFunctionRegistry.getRegistryVersion(RemoteFunctionRegistry.java:119)
>  ~[classes/:na]
> {code}
> The fix is simply to add a DUDF-enabled check:
> {code}
> if (holder == null) {
>   boolean useDynamicUdfs = optionManager != null && 
> optionManager.getOption(ExecConstants.USE_DYNAMIC_UDFS);
>   if (useDynamicUdfs) {
> syncWithRemoteRegistry(version.get());
> ...
> {code}
> Then, disable dynamic UDFs for the test case by setting 
> {{ExecConstants.USE_DYNAMIC_UDFS}} to false.





[jira] [Updated] (DRILL-5323) Provide test tools to create, populate and compare row sets

2017-03-20 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5323:

Reviewer: Sorabh Hamirwasia

Assigned Reviewer to [~shamirwasia]

> Provide test tools to create, populate and compare row sets
> ---
>
> Key: DRILL-5323
> URL: https://issues.apache.org/jira/browse/DRILL-5323
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> Operators work with individual row sets. A row set is a collection of records 
> stored as column vectors. (Drill uses various terms for this concept. A 
> record batch is a row set with an operator implementation wrapped around it. 
> A vector container is a row set, but with much functionality left as an 
> exercise for the developer. And so on.)
> To simplify tests, we need a {{TestRowSet}} concept that wraps a 
> {{VectorContainer}} and provides easy ways to:
> * Define a schema for the row set.
> * Create a set of vectors that implement the schema.
> * Populate the row set with test data via code.
> * Add an SV2 to the row set.
> * Pass the row set to operator components (such as generated code blocks.)
> * Compare the results of the operation with an expected result set.
> * Dispose of the underlying direct memory when work is done.
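The build-then-verify pattern the list above describes can be sketched language-agnostically (this Python sketch is illustrative only; Drill's actual row-set tools are Java and operate on value vectors and direct memory):

```python
# Sketch of the row-set test-helper idea: define a schema, populate rows
# in code, then compare against an expected row set.
class RowSet:
    def __init__(self, schema):
        self.schema = schema          # e.g. [("a", int), ("b", str)]
        self.rows = []

    def add(self, *values):
        assert len(values) == len(self.schema)
        for v, (_, col_type) in zip(values, self.schema):
            assert isinstance(v, col_type)   # enforce declared column type
        self.rows.append(values)
        return self

    def verify(self, expected):
        assert self.rows == expected.rows, "row sets differ"

schema = [("a", int), ("b", str)]
actual = RowSet(schema).add(1, "x").add(2, "y")
expected = RowSet(schema).add(1, "x").add(2, "y")
actual.verify(expected)               # passes silently when equal
```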





[jira] [Updated] (DRILL-5318) Create a sub-operator test framework

2017-03-20 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5318:

Reviewer: Sorabh Hamirwasia

Assigned Reviewer to [~shamirwasia]

> Create a sub-operator test framework
> 
>
> Key: DRILL-5318
> URL: https://issues.apache.org/jira/browse/DRILL-5318
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
> Attachments: Sub-OperatorTestFramework.pdf
>
>
> Drill provides two unit test frameworks for whole-server, SQL-based testing: 
> the original {{BaseTestQuery}} and the newer {{ClusterFixture}}. Both use the 
> {{TestBuilder}} mechanism to build system-level functional tests that run 
> queries and check results.
> Jason provided an operator-level test framework based, in part, on mocks: 
> As Drill operators become more complex, we have a crying need for true 
> unit-level tests at a level below the whole system and below operators. That 
> is, we need to test the individual pieces that, together, form the operator.
> This umbrella ticket includes a number of tasks needed to create the 
> sub-operator framework. Our intention is that, over time, as we find the need 
> to revisit existing operators, or create new ones, we can employ the 
> sub-operator test framework to exercise code at a finer granularity than is 
> possible prior to this framework.





[jira] [Updated] (DRILL-5319) Refactor FragmentContext and OptionManager for unit testing

2017-03-20 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5319:

Reviewer: Gautam Kumar Parai

Assigned Reviewer to [~gparai]

> Refactor FragmentContext and OptionManager for unit testing
> ---
>
> Key: DRILL-5319
> URL: https://issues.apache.org/jira/browse/DRILL-5319
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> Roll-up task for two refactorings, see the sub-tasks for details. This ticket 
> allows a single PR for the two different refactorings since the work heavily 
> overlaps. See DRILL-5320 and DRILL-5321 for details.





[jira] [Commented] (DRILL-5361) CURRENT_DATE() documented, but not actually available in Drill

2017-03-17 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930260#comment-15930260
 ] 

Zelaine Fong commented on DRILL-5361:
-

In that case, would it make sense to change this issue to a Doc bug to make 
things more clear?

BTW, if you click on the NOW link on the doc page you referenced, it takes you 
to 
http://drill.apache.org/docs/date-time-functions-and-arithmetic/#other-date-and-time-functions.
  On this page, the examples do show the cases where you need parens and the 
cases where you don't.

> CURRENT_DATE() documented, but not actually available in Drill
> --
>
> Key: DRILL-5361
> URL: https://issues.apache.org/jira/browse/DRILL-5361
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>
> The [Drill 
> documentation|http://drill.apache.org/docs/date-time-functions-and-arithmetic/]
>  describes a CURRENT_DATE() function. Tried the following query:
> {code}
> SELECT CURRENT_DATE() FROM (VALUES(1))
> {code}
> Got the following errors:
> {code}
> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match found 
> for function signature CURRENT_DATE()
> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, 
> column 8 to line 1, column 21: No match found for function signature 
> CURRENT_DATE()
> {code}
> Please:
> * Implement the function, or
> * Remove the function from the documentation, or
> * Leave the function in the docs, but add a footnote saying that the function 
> is not yet available.





[jira] [Updated] (DRILL-3510) Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL identifiers

2017-03-17 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-3510:

Reviewer: Sudheesh Katkam

Assigned Reviewer to [~sudheeshkatkam]

> Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL 
> identifiers 
> --
>
> Key: DRILL-3510
> URL: https://issues.apache.org/jira/browse/DRILL-3510
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: SQL Parser
>Reporter: Jinfeng Ni
>Assignee: Vitalii Diravka
>  Labels: doc-impacting
> Fix For: 1.10.0
>
> Attachments: DRILL-3510.patch, DRILL-3510.patch
>
>
> Currently Drill's SQL parser uses the backtick as its identifier quote, the 
> same as MySQL. However, this differs from the ANSI SQL specification, where 
> double quotes are used as identifier quotes.  
> MySQL has an option "ANSI_QUOTES", which could be switched on/off by user. 
> Drill should follow the same way, so that Drill users do not have to rewrite 
> their existing queries, if their queries use double quotes. 
> {code}
> SET sql_mode='ANSI_QUOTES';
> {code}
>
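The two quoting styles can be seen side by side in SQLite, which (used here purely for illustration, not as Drill behavior) accepts both ANSI double-quoted identifiers and MySQL-style backticks:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# ANSI SQL quotes identifiers with double quotes, so even the reserved
# word "select" can be a column name. SQLite also accepts MySQL-style
# backticks, so both spellings refer to the same column.
con.execute('CREATE TABLE t ("select" INTEGER)')
con.execute('INSERT INTO t ("select") VALUES (1)')
row = con.execute('SELECT `select` FROM t').fetchone()
print(row[0])  # 1
```

An ANSI_QUOTES-style option in Drill would let queries written with double quotes, like the `CREATE TABLE` above, run unchanged.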





[jira] [Updated] (DRILL-5359) ClassCastException when push down filter on the output of flatten into parquet scan

2017-03-17 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5359:

Reviewer: Aman Sinha  (was: Gautam Kumar Parai)

Oops, didn't notice that [~amansinha100] has already reviewed.

> ClassCastException when push down filter on the output of flatten into 
> parquet scan
> ---
>
> Key: DRILL-5359
> URL: https://issues.apache.org/jira/browse/DRILL-5359
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> The following simplified query would hit ClassCastException.
> {code}
> select n_regionkey
> from (select n_regionkey, 
> flatten(nation.cities) as cities 
>   from cp.`tpch/nation.parquet` nation) as flattenedCities 
> where flattenedCities.cities.`zip` = '12345';
> {code}
> Here is the stacktrace for the Exception : 
> {code}
> caused by: java.lang.ClassCastException: 
> org.apache.drill.common.expression.FunctionCall cannot be cast to 
> org.apache.drill.common.expression.SchemaPath
>   at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:170)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:80)
>  ~[classes/:na]
>   at org.apache.calcite.rex.RexCall.accept(RexCall.java:107) 
> ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>   at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.doFunction(DrillOptiq.java:205)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:105)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:80)
>  ~[classes/:na]
>   at org.apache.calcite.rex.RexCall.accept(RexCall.java:107) 
> ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>   at 
> org.apache.drill.exec.planner.logical.DrillOptiq.toDrill(DrillOptiq.java:77) 
> ~[classes/:na]
>   at 
> org.apache.drill.exec.store.parquet.ParquetPushDownFilter.doOnMatch(ParquetPushDownFilter.java:141)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.store.parquet.ParquetPushDownFilter$1.onMatch(ParquetPushDownFilter.java:68)
>  ~[classes/:na]
> {code}
> The cause of this problem: the Parquet filter pushdown rule tries to push a 
> filter expression containing item/flatten operators into the parquet scan. 
> However, the method DrillOptiq.toDrill() does not allow such an expression 
> (since "flatten" is not a scalar function).  
> The solution is to disable pushing down such filter expressions. Even if the 
> rule allowed it, the underlying parquet metadata would not have the 
> corresponding statistics, so there is no point in considering pushing down 
> such a filter expression. 
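Why the filter belongs above the flatten can be shown with plain data. This is an illustrative Python sketch (not Drill code) using the query's column names; the filter on `zip` refers to individual array elements, which only exist after the flatten step:

```python
# Nested input rows, mirroring the nation/cities shape from the query.
rows = [{"n_regionkey": 1, "cities": [{"zip": "12345"}, {"zip": "99999"}]},
        {"n_regionkey": 2, "cities": [{"zip": "55555"}]}]

# Step 1: flatten produces one output row per array element.
flattened = [{"n_regionkey": r["n_regionkey"], "city": c}
             for r in rows for c in r["cities"]]

# Step 2: the filter applies to the flattened rows, not to the scan.
result = [f["n_regionkey"] for f in flattened if f["city"]["zip"] == "12345"]
print(result)  # [1]
```

A scan-level pushdown would need per-element statistics for `zip`, which per-file parquet metadata does not carry, so the pushdown cannot help even when it parses.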





[jira] [Updated] (DRILL-5359) ClassCastException when push down filter on the output of flatten into parquet scan

2017-03-17 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5359:

Labels: ready-to-commit  (was: )

> ClassCastException when push down filter on the output of flatten into 
> parquet scan
> ---
>
> Key: DRILL-5359
> URL: https://issues.apache.org/jira/browse/DRILL-5359
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> The following simplified query would hit ClassCastException.
> {code}
> select n_regionkey
> from (select n_regionkey, 
> flatten(nation.cities) as cities 
>   from cp.`tpch/nation.parquet` nation) as flattenedCities 
> where flattenedCities.cities.`zip` = '12345';
> {code}
> Here is the stacktrace for the Exception : 
> {code}
> caused by: java.lang.ClassCastException: 
> org.apache.drill.common.expression.FunctionCall cannot be cast to 
> org.apache.drill.common.expression.SchemaPath
>   at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:170)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:80)
>  ~[classes/:na]
>   at org.apache.calcite.rex.RexCall.accept(RexCall.java:107) 
> ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>   at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.doFunction(DrillOptiq.java:205)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:105)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:80)
>  ~[classes/:na]
>   at org.apache.calcite.rex.RexCall.accept(RexCall.java:107) 
> ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>   at 
> org.apache.drill.exec.planner.logical.DrillOptiq.toDrill(DrillOptiq.java:77) 
> ~[classes/:na]
>   at 
> org.apache.drill.exec.store.parquet.ParquetPushDownFilter.doOnMatch(ParquetPushDownFilter.java:141)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.store.parquet.ParquetPushDownFilter$1.onMatch(ParquetPushDownFilter.java:68)
>  ~[classes/:na]
> {code}
> The cause of this problem: the Parquet filter pushdown rule tries to push a 
> filter expression containing item/flatten operators into the parquet scan. 
> However, the method DrillOptiq.toDrill() does not allow such an expression 
> (since "flatten" is not a scalar function).  
> The solution is to disable pushing down such filter expressions. Even if the 
> rule allowed it, the underlying parquet metadata would not have the 
> corresponding statistics, so there is no point in considering pushing down 
> such a filter expression. 





[jira] [Commented] (DRILL-5363) CURRENT_TIMESTAMP() documented, but not actually available in Drill

2017-03-17 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930130#comment-15930130
 ] 

Zelaine Fong commented on DRILL-5363:
-

I'm using Drill in embedded mode, and current_timestamp works for me.  I'm 
using 1.10.

{code}
0: jdbc:drill:zk=local> select current_timestamp from (values(1));
+--+
|current_timestamp |
+--+
| 2017-03-17 08:22:23.018  |
+--+
1 row selected (1.707 seconds)
{code}

> CURRENT_TIMESTAMP() documented, but not actually available in Drill
> ---
>
> Key: DRILL-5363
> URL: https://issues.apache.org/jira/browse/DRILL-5363
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Paul Rogers
>
> The [Drill 
> documentation|http://drill.apache.org/docs/date-time-functions-and-arithmetic/]
>  describes a CURRENT_TIMESTAMP() function. Tried the following query:
> {code}
> SELECT CURRENT_TIMESTAMP() FROM (VALUES(1))
> {code}
> Got the following errors:
> {code}
> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match found 
> for function signature CURRENT_TIMESTAMP()
> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, 
> column 8 to line 1, column 26: No match found for function signature 
> CURRENT_TIMESTAMP()
> {code}
> Please:
> * Implement the function, or
> * Remove the function from the documentation, or
> * Leave the function in the docs, but add a footnote saying that the function 
> is not yet available.
> Note that Drill provides a {{NOW()}} function which does work and would seem 
> to do exactly the same as the non-existent {{CURRENT_TIMESTAMP()}} function.





[jira] [Commented] (DRILL-5362) CURRENT_TIME() documented, but not actually available in Drill

2017-03-17 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930119#comment-15930119
 ] 

Zelaine Fong commented on DRILL-5362:
-

Same comments from DRILL-5361 apply.  Syntax specified in the Jira is wrong.

> CURRENT_TIME() documented, but not actually available in Drill
> --
>
> Key: DRILL-5362
> URL: https://issues.apache.org/jira/browse/DRILL-5362
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>
> The [Drill 
> documentation|http://drill.apache.org/docs/date-time-functions-and-arithmetic/]
>  describes a CURRENT_TIME() function. Tried the following query:
> {code}
> SELECT CURRENT_TIME() FROM (VALUES(1))
> {code}
> Got the following errors:
> {code}
> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match found 
> for function signature CURRENT_TIME()
> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, 
> column 8 to line 1, column 21: No match found for function signature 
> CURRENT_TIME()
> {code}
> Please:
> * Implement the function, or
> * Remove the function from the documentation, or
> * Leave the function in the docs, but add a footnote saying that the function 
> is not yet available.





[jira] [Commented] (DRILL-5361) CURRENT_DATE() documented, but not actually available in Drill

2017-03-17 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930114#comment-15930114
 ] 

Zelaine Fong commented on DRILL-5361:
-

[~khfaraaz] - that's the intended behavior -- no parens.  This is consistent 
with Postgres and other DBMS's.  Not sure why [~Paul.Rogers] expected this to 
work with the parens.
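The niladic (no-parentheses) rule can be demonstrated with SQLite, used here purely as a convenient standards-following parser, not as Drill itself:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# ANSI niladic datetime functions take no parentheses:
today = con.execute("SELECT CURRENT_DATE").fetchone()[0]
print(today)  # e.g. '2017-03-17'

# With parentheses, the parser rejects the query, mirroring the
# "No match found for function signature" error reported in the Jira.
rejected = False
try:
    con.execute("SELECT CURRENT_DATE()")
except sqlite3.OperationalError:
    rejected = True
print(rejected)  # True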

> CURRENT_DATE() documented, but not actually available in Drill
> --
>
> Key: DRILL-5361
> URL: https://issues.apache.org/jira/browse/DRILL-5361
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>
> The [Drill 
> documentation|http://drill.apache.org/docs/date-time-functions-and-arithmetic/]
>  describes a CURRENT_DATE() function. Tried the following query:
> {code}
> SELECT CURRENT_DATE() FROM (VALUES(1))
> {code}
> Got the following errors:
> {code}
> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match found 
> for function signature CURRENT_DATE()
> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, 
> column 8 to line 1, column 21: No match found for function signature 
> CURRENT_DATE()
> {code}
> Please:
> * Implement the function, or
> * Remove the function from the documentation, or
> * Leave the function in the docs, but add a footnote saying that the function 
> is not yet available.





[jira] [Updated] (DRILL-5359) ClassCastException when push down filter on the output of flatten into parquet scan

2017-03-17 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5359:

Reviewer: Gautam Kumar Parai

Assigned Reviewer to [~gparai]

> ClassCastException when push down filter on the output of flatten into 
> parquet scan
> ---
>
> Key: DRILL-5359
> URL: https://issues.apache.org/jira/browse/DRILL-5359
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.11.0
>
>
> The following simplified query would hit ClassCastException.
> {code}
> select n_regionkey
> from (select n_regionkey, 
> flatten(nation.cities) as cities 
>   from cp.`tpch/nation.parquet` nation) as flattenedCities 
> where flattenedCities.cities.`zip` = '12345';
> {code}
> Here is the stacktrace for the Exception : 
> {code}
> caused by: java.lang.ClassCastException: 
> org.apache.drill.common.expression.FunctionCall cannot be cast to 
> org.apache.drill.common.expression.SchemaPath
>   at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:170)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:80)
>  ~[classes/:na]
>   at org.apache.calcite.rex.RexCall.accept(RexCall.java:107) 
> ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>   at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.doFunction(DrillOptiq.java:205)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:105)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:80)
>  ~[classes/:na]
>   at org.apache.calcite.rex.RexCall.accept(RexCall.java:107) 
> ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>   at 
> org.apache.drill.exec.planner.logical.DrillOptiq.toDrill(DrillOptiq.java:77) 
> ~[classes/:na]
>   at 
> org.apache.drill.exec.store.parquet.ParquetPushDownFilter.doOnMatch(ParquetPushDownFilter.java:141)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.store.parquet.ParquetPushDownFilter$1.onMatch(ParquetPushDownFilter.java:68)
>  ~[classes/:na]
> {code}
> The cause of this problem: the Parquet filter pushdown rule tries to push a 
> filter expression containing item/flatten operators into the parquet scan. 
> However, the method DrillOptiq.toDrill() does not allow such an expression 
> (since "flatten" is not a scalar function).  
> The solution is to disable pushing down such filter expressions. Even if the 
> rule allowed it, the underlying parquet metadata would not have the 
> corresponding statistics, so there is no point in considering pushing down 
> such a filter expression. 





[jira] [Assigned] (DRILL-4971) query encounters system error: Statement "break AndOP3" is not enclosed by a breakable statement with label "AndOP3"

2017-03-16 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-4971:
---

Assignee: Vitalii Diravka

> query encounters system error: Statement "break AndOP3" is not enclosed by a 
> breakable statement with label "AndOP3"
> 
>
> Key: DRILL-4971
> URL: https://issues.apache.org/jira/browse/DRILL-4971
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
> Attachments: low_table, medium_table
>
>
> This query returns an error.  The stack trace suggests it might be a schema 
> change issue, but there is no schema change in this table.  Many other 
> queries are succeeding.
> select count(\*) from test where ((int_id > 3060 and int_id < 6002) or 
> (int_id > 9025 and int_id < 11976)) and ((int_id > 9025 and int_id < 11976) 
> or (int_id > 3060 and int_id < 6002)) and (int_id > 3060 and int_id < 6002);
> Error: SYSTEM ERROR: CompileException: Line 232, Column 30: Statement "break 
> AndOP3" is not enclosed by a breakable statement with label "AndOP3"
> [Error Id: 254d093b-79a1-4425-802c-ade08db293e4 on qa-node211:31010]
> 
>   (org.apache.drill.exec.exception.SchemaChangeException) Failure while 
> attempting to load generated class
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():198
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():107
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
> There are two partitions to the test table.  One covers the range 3061 - 6001 
> and the other covers the range 9026 - 11975.
> This second query returns a different, but possibly related, error.  
> select count(\*) from orders_parts where (((int_id > -3025 and int_id < -4) 
> or (int_id > -5 and int_id < 3061) or (int_id > 3060 and int_id < 6002)) and 
> (int_id > -5 and int_id < 3061)) and (((int_id > -5 and int_id < 3061) or 
> (int_id > 9025 and int_id < 11976)) and (int_id > -5 and int_id < 3061))
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: CompileException: Line 447, Column 30: 
> Statement "break AndOP6" is not enclosed by a breakable statement with label 
> "AndOP6"
> 
> Fragment 0:0
> 
> [Error Id: ac09187e-d3a2-41a7-a659-b287aca6039c on qa-node209:31010]
> 
>   (org.apache.drill.exec.exception.SchemaChangeException) Failure while 
> attempting to load generated class
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():198
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():107
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78





[jira] [Assigned] (DRILL-5351) Excessive bounds checking in the Parquet reader

2017-03-16 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-5351:
---

Assignee: Parth Chandra

> Excessive bounds checking in the Parquet reader 
> 
>
> Key: DRILL-5351
> URL: https://issues.apache.org/jira/browse/DRILL-5351
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>
> In profiling the Parquet reader, the variable-length decoding appears to be a 
> major bottleneck, making the reader CPU-bound rather than disk-bound.
> A YourKit profile indicates that the following methods are severe bottlenecks:
> VarLenBinaryReader.determineSizeSerial(long)
>   NullableVarBinaryVector$Mutator.setSafe(int, int, int, int, DrillBuf)
>   DrillBuf.chk(int, int)
>   NullableVarBinaryVector$Mutator.fillEmpties()
> The problem is that each of these methods does some form of bounds checking, 
> and eventually, of course, the actual write to the ByteBuf is also bounds 
> checked.
> DrillBuf.chk can be disabled by a configuration setting. Disabling this does 
> improve performance of TPCH queries. In addition, all regression, unit, and 
> TPCH-SF100 tests pass. 
> I would recommend we allow users to turn this check off if there are 
> performance critical queries.
> Removing the bounds checking at every level is going to be a fair amount of 
> work. In the meantime, it appears that a few simple changes to variable-
> length vectors improve query performance by about 10% across the board. 
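The hoisting idea behind such changes can be sketched in a few lines. This is an illustrative Python sketch (not Drill code): a per-value check, like a `DrillBuf.chk`-style guard, runs on every element of the hot loop, while the hoisted form performs one range check per batch.

```python
def write_checked(buf, offset, values):
    """Per-value bounds check: the repeated guard in the hot loop."""
    for i, v in enumerate(values):
        if not 0 <= offset + i < len(buf):  # checked on every single value
            raise IndexError(offset + i)
        buf[offset + i] = v

def write_hoisted(buf, offset, values):
    """One range check for the whole batch, then an unchecked inner loop."""
    if offset < 0 or offset + len(values) > len(buf):
        raise IndexError(offset)
    for i, v in enumerate(values):
        buf[offset + i] = v

buf = [0] * 8
write_hoisted(buf, 2, [7, 8, 9])
print(buf)  # [0, 0, 7, 8, 9, 0, 0, 0]
```

Both functions write the same bytes; only the number of checks per value differs, which is the kind of saving the 10% figure refers to.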





[jira] [Updated] (DRILL-5355) Misc. code cleanup

2017-03-14 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5355:

Reviewer: Chunhui Shi

Assigned Reviewer to [~cshi]

> Misc. code cleanup 
> ---
>
> Key: DRILL-5355
> URL: https://issues.apache.org/jira/browse/DRILL-5355
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> Recent work in Drill has identified a number of cases where code can be 
> cleaned up: adding missing annotations, etc. These changes don't fit as part 
> of a separate ticket and so are rolled up into this "general hygiene" 
> ticket.





[jira] [Updated] (DRILL-5324) Provide simplified column reader/writer for use in tests

2017-03-14 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5324:

Reviewer: Gautam Kumar Parai

Assigned Reviewer to [~gparai]

> Provide simplified column reader/writer for use in tests
> 
>
> Key: DRILL-5324
> URL: https://issues.apache.org/jira/browse/DRILL-5324
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> In support of DRILL-5323, we wish to provide a very easy way to work with row 
> sets. See the comment section for examples of the target API.
> Drill provides over 100 different value vectors, any of which may be required 
> to perform a specific unit test. Creating these vectors, populating them, and 
> retrieving values, is very tedious. The work is so complex that it acts to 
> discourage developers from writing such tests.
> To simplify the task, we wish to provide a simplified row set reader and 
> writer. To do that, we need to generate the corresponding column reader and 
> writer for each value vector. This ticket focuses on the column-level readers 
> and writers, and the required code generation.
> Drill already provides vector readers and writers derived from 
> {{FieldReader}}. However, these readers do not provide a uniform get/set 
> interface that is type independent on the application side. Instead, 
> application code must be aware of the type of the vector, something we seek 
> to avoid for test code.
> The reader and writer classes are designed to be used in many contexts, not 
> just for testing. As a result, their implementation makes no assumptions 
> about the broader row reader and writer, other than that a row index and the 
> required value vector are both available. 
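The "uniform get/set interface" goal can be sketched as follows. This is a hypothetical Python sketch, not the actual Drill API: the point is that test code calls the same `set()` and `get()` regardless of which typed vector sits underneath.

```python
class ColumnWriter:
    """Type-independent facade: one set() call for every column type."""
    def __init__(self):
        self.values = []

    def set(self, value):
        self.values.append(value)  # a real impl dispatches to a typed vector

class ColumnReader:
    """Type-independent facade: one get() call for every column type."""
    def __init__(self, values):
        self.values = values

    def get(self, row_index):
        return self.values[row_index]

# Int, varchar, and float columns are all written through the same call.
w = ColumnWriter()
for v in (10, "abc", 3.5):
    w.set(v)
r = ColumnReader(w.values)
print(r.get(1))  # abc
```

With a facade like this, a test never needs to know which of the 100-plus vector classes backs a column, which is the tedium the ticket aims to remove.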





[jira] [Updated] (DRILL-5337) OpenTSDB storage plugin

2017-03-14 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5337:

Reviewer: Arina Ielchiieva

Assigned Reviewer to [~arina]

> OpenTSDB storage plugin
> ---
>
> Key: DRILL-5337
> URL: https://issues.apache.org/jira/browse/DRILL-5337
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Reporter: Dmitriy Gavrilovych
>  Labels: features
>
> Storage plugin for OpenTSDB
> The plugin uses REST API to work with TSDB. 
> Expected queries are listed below:
> SELECT * FROM openTSDB.`warp.speed.test`;
> Return all elements from warp.speed.test table with default aggregator SUM
> SELECT * FROM openTSDB.`(metric=warp.speed.test)`;
> Return all elements from the (metric=warp.speed.test) table as in the previous query, 
> but with alternative FROM syntax
> SELECT * FROM openTSDB.`(metric=warp.speed.test, aggregator=avg)`;
> Return all elements from warp.speed.test table, but with the custom aggregator
> SELECT `timestamp`, sum(`aggregated value`) FROM 
> openTSDB.`(metric=warp.speed.test, aggregator=avg)` GROUP BY `timestamp`;
> Return aggregated and grouped value by standard drill functions from 
> warp.speed.test table, but with the custom aggregator
> SELECT * FROM openTSDB.`(metric=warp.speed.test, downsample=5m-avg)`
> Return data limited by downsample





[jira] [Updated] (DRILL-5351) Excessive bounds checking in the Parquet reader

2017-03-13 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5351:

Reviewer: Paul Rogers

Assigned Reviewer to [~Paul.Rogers]

> Excessive bounds checking in the Parquet reader 
> 
>
> Key: DRILL-5351
> URL: https://issues.apache.org/jira/browse/DRILL-5351
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Parth Chandra
>
> In profiling the Parquet reader, the variable-length decoding appears to be a 
> major bottleneck, making the reader CPU-bound rather than disk-bound.
> A YourKit profile indicates that the following methods are severe bottlenecks:
> VarLenBinaryReader.determineSizeSerial(long)
>   NullableVarBinaryVector$Mutator.setSafe(int, int, int, int, DrillBuf)
>   DrillBuf.chk(int, int)
>   NullableVarBinaryVector$Mutator.fillEmpties()
> The problem is that each of these methods does some form of bounds checking, 
> and eventually, of course, the actual write to the ByteBuf is also bounds 
> checked.
> DrillBuf.chk can be disabled by a configuration setting. Disabling this does 
> improve performance of TPCH queries. In addition, all regression, unit, and 
> TPCH-SF100 tests pass. 
> I would recommend we allow users to turn this check off if there are 
> performance critical queries.
> Removing the bounds checking at every level is going to be a fair amount of 
> work. In the meantime, it appears that a few simple changes to variable-
> length vectors improve query performance by about 10% across the board. 





[jira] [Updated] (DRILL-5344) External sort priority queue copier fails with an empty batch

2017-03-10 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5344:

Reviewer: Boaz Ben-Zvi

Assigned Reviewer to [~ben-zvi]

> External sort priority queue copier fails with an empty batch
> -
>
> Key: DRILL-5344
> URL: https://issues.apache.org/jira/browse/DRILL-5344
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> The external sort uses a "priority queue copier" to merge batches when 
> spilling or when merging spilled batches.
> The code will fail with an {{IndexOutOfBoundsException}} if any record batch 
> is empty. The reason is a faulty assumption in generated code:
> {code}
>   public void setup(...) {
> ...
>   vector4.set(i, i, batchGroups.get(i).getNextIndex());
> ...
>   }
>   public int getNextIndex() {
> if (pointer == getRecordCount()) {
>   return -1;
> }
> ...
>   }
> {code}
> The code to get the next index returns -1 when the record batch is empty 
> (its record count is zero). The -1 position translates (when truncated to an 
> unsigned 16-bit value) into 65535, which produces the index exception.
> The workaround has been to special case empty batches elsewhere in the code, 
> apparently to avoid hitting this error.
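The truncation step is simple two's-complement arithmetic, sketched here in Python (illustrative only, not Drill code):

```python
# getNextIndex() signals "no more rows" with -1. Stored into a 2-byte
# unsigned selection-vector slot, the low 16 bits of -1 are kept:
next_index = -1
truncated = next_index & 0xFFFF  # what an unsigned 16-bit store does
print(truncated)  # 65535
```

Index 65535 then points far past the end of any real batch, which is where the `IndexOutOfBoundsException` comes from.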





[jira] [Commented] (DRILL-5330) NPE in FunctionImplementationRegistry.functionReplacement()

2017-03-10 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905969#comment-15905969
 ] 

Zelaine Fong commented on DRILL-5330:
-

[~paul-rogers] - I'm fine with checking it in separately.  I just wasn't sure 
if it was existing tests that are failing or new ones that are in development.  
As long as we will have tests to validate these changes.

> NPE in FunctionImplementationRegistry.functionReplacement()
> ---
>
> Key: DRILL-5330
> URL: https://issues.apache.org/jira/browse/DRILL-5330
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> The code in {{FunctionImplementationRegistry.functionReplacement()}} will 
> produce an NPE if ever it is called:
> {code}
>   if (optionManager != null
>   && optionManager.getOption(
>ExecConstants.CAST_TO_NULLABLE_NUMERIC).bool_val
>   ...
> {code}
> If an option manager is provided, then get the specified option. The option 
> manager will contain a value for that option only if the user has explicitly 
> set that option. Suppose the user had not set the option. Then the return 
> from {{getOption()}} will be null.
> The next thing we do is *assume* that the option exists and is a boolean by 
> dereferencing the option. This will trigger an NPE. This NPE was seen in 
> detail-level unit tests.
> The proper way to handle such options is to use an option validator. 
> Strangely, one actually exists in {{ExecConstants}}:
> {code}
>   String CAST_TO_NULLABLE_NUMERIC = 
> "drill.exec.functions.cast_empty_string_to_null";
>   OptionValidator CAST_TO_NULLABLE_NUMERIC_OPTION = new 
> BooleanValidator(CAST_TO_NULLABLE_NUMERIC, false);
> {code}
> Then do:
> {code}
> optionManager.getOption(
>  ExecConstants.CAST_TO_NULLABLE_NUMERIC_OPTION)
> {code}
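The null-safe pattern the report recommends can be sketched outside Drill (class and method names here are simplified assumptions, not Drill's actual API):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the reported NPE: a raw lookup returns null for
// options the user never set, while a validator-style lookup falls back
// to a declared default. All names here are illustrative only.
public class OptionLookupSketch {
    static final Map<String, Boolean> options = new HashMap<>();

    // Mirrors the buggy pattern: dereferencing the result of a raw
    // lookup throws an NPE when the option was never explicitly set.
    public static Boolean rawLookup(String name) {
        return options.get(name); // null for an unset option
    }

    // Mirrors the validator pattern: an unset option yields the
    // default declared up front, so there is nothing to dereference.
    public static boolean validatedLookup(String name, boolean defaultValue) {
        Boolean v = options.get(name);
        return v != null ? v : defaultValue;
    }

    public static void main(String[] args) {
        String opt = "drill.exec.functions.cast_empty_string_to_null";
        System.out.println(rawLookup(opt) == null);      // true: would NPE if dereferenced
        System.out.println(validatedLookup(opt, false)); // false: the declared default
    }
}
```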



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5330) NPE in FunctionImplementationRegistry.functionReplacement()

2017-03-10 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905669#comment-15905669
 ] 

Zelaine Fong commented on DRILL-5330:
-

[~Paul.Rogers] - I noticed your pull request does not have a unit test.  Was 
this problem uncovered by existing unit tests, or by new unit tests you wrote 
that haven't been committed yet?

> NPE in FunctionImplementationRegistry.functionReplacement()
> ---
>
> Key: DRILL-5330
> URL: https://issues.apache.org/jira/browse/DRILL-5330
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> The code in {{FunctionImplementationRegistry.functionReplacement()}} will 
> produce an NPE if ever it is called:
> {code}
>   if (optionManager != null
>   && optionManager.getOption(
>ExecConstants.CAST_TO_NULLABLE_NUMERIC).bool_val
>   ...
> {code}
> If an option manager is provided, then get the specified option. The option 
> manager will contain a value for that option only if the user has explicitly 
> set that option. Suppose the user had not set the option. Then the return 
> from {{getOption()}} will be null.
> The next thing we do is *assume* that the option exists and is a boolean by 
> dereferencing the option. This will trigger an NPE. This NPE was seen in 
> detail-level unit tests.
> The proper way to handle such options is to use an option validator. 
> Strangely, one actually exists in {{ExecConstants}}:
> {code}
>   String CAST_TO_NULLABLE_NUMERIC = 
> "drill.exec.functions.cast_empty_string_to_null";
>   OptionValidator CAST_TO_NULLABLE_NUMERIC_OPTION = new 
> BooleanValidator(CAST_TO_NULLABLE_NUMERIC, false);
> {code}
> Then do:
> {code}
> optionManager.getOption(
>  ExecConstants.CAST_TO_NULLABLE_NUMERIC_OPTION)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4896) After a failed CTAS, the table both exists and does not exist

2017-03-10 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905433#comment-15905433
 ] 

Zelaine Fong commented on DRILL-4896:
-

[~arina] - can you check whether this is fixed by the changes you made for 
CTTAS?  Thanks.

> After a failed CTAS, the table both exists and does not exist
> -
>
> Key: DRILL-4896
> URL: https://issues.apache.org/jira/browse/DRILL-4896
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server
>Affects Versions: 1.8.0
>Reporter: Boaz Ben-Zvi
>
>   After a CTAS failed (due to no space on the storage device), incomplete 
> Parquet files were left behind.  A subsequent CTAS with the same table name 
> fails with "table exists", and a subsequent DROP on the same table name fails 
> with "table does not exist".
>   A possible enhancement: extend DROP to be able to clean up such a corrupted table.
> 0: jdbc:drill:zk=local> create table `/drill/spill/tt1` as
> . . . . . . . . . . . >  select
> . . . . . . . . . . . >case when columns[2] = '' then cast(null as 
> varchar(100)) else cast(columns[2] as varchar(100)) end,
> . . . . . . . . . . . >case when columns[3] = '' then cast(null as 
> varchar(100)) else cast(columns[3] as varchar(100)) end,
> . . . . . . . . . . . >case when columns[4] = '' then cast(null as 
> varchar(100)) else cast(columns[4] as varchar(100)) end, 
> . . . . . . . . . . . >case when columns[5] = '' then cast(null as 
> varchar(100)) else cast(columns[5] as varchar(100)) end, 
> . . . . . . . . . . . >case when columns[0] = '' then cast(null as 
> varchar(100)) else cast(columns[0] as varchar(100)) end, 
> . . . . . . . . . . . >case when columns[8] = '' then cast(null as 
> varchar(100)) else cast(columns[8] as varchar(100)) end
> . . . . . . . . . . . > FROM 
> dfs.`/Users/boazben-zvi/data/store_sales/store_sales.dat`;
> Exception in thread "drill-executor-4" org.apache.hadoop.fs.FSError: 
> java.io.IOException: No space left on device
>   . 39 more
> Error: SYSTEM ERROR: IOException: The file being written is in an invalid 
> state. Probably caused by an error thrown previously. Current state: COLUMN
> Fragment 0:0
> [Error Id: de84c212-2400-4a08-a15c-8e3adb5ec774 on 10.250.57.63:31010] 
> (state=,code=0)
> 0: jdbc:drill:zk=local> create table `/drill/spill/tt1` as select * from 
> dfs.`/Users/boazben-zvi/data/store_sales/store_sales.dat`;
> Error: VALIDATION ERROR: A table or view with given name [/drill/spill/tt1] 
> already exists in schema [dfs.tmp]
> [Error Id: 0ef99a15-9d67-49ad-87fb-023105dece3c on 10.250.57.63:31010] 
> (state=,code=0)
> 0: jdbc:drill:zk=local> drop table `/drill/spill/tt1` ;
> Error: DATA_WRITE ERROR: Failed to drop table: File /drill/spill/tt1 does not 
> exist
> [Error Id: c22da79f-ecbd-423c-b5b2-4eae7d1263d7 on 10.250.57.63:31010] 
> (state=,code=0)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5165) wrong results - LIMIT ALL and OFFSET clause in same query

2017-03-08 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5165:

Reviewer: Jinfeng Ni

Assigned Reviewer to [~jni]

> wrong results - LIMIT ALL and OFFSET clause in same query
> -
>
> Key: DRILL-5165
> URL: https://issues.apache.org/jira/browse/DRILL-5165
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.10.0
>Reporter: Khurram Faraaz
>Assignee: Chunhui Shi
>Priority: Critical
>
> This issue was reported by a user on Drill's user list.
> Drill 1.10.0 commit ID : bbcf4b76
> I tried a similar query on Apache Drill 1.10.0, and Drill returns wrong 
> results compared to Postgres for a query that uses LIMIT ALL and an OFFSET 
> clause in the same statement. We need to file a JIRA to track this issue.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select col_int from typeall_l order by 1 limit 
> all offset 10;
> +--+
> | col_int  |
> +--+
> +--+
> No rows selected (0.211 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select col_int from typeall_l order by col_int 
> limit all offset 10;
> +--+
> | col_int  |
> +--+
> +--+
> No rows selected (0.24 seconds)
> {noformat}
> Query => select col_int from typeall_l limit all offset 10;
> Drill 1.10.0 returns 85 rows
> whereas for same query,
> postgres=# select col_int from typeall_l limit all offset 10;
> Postgres 9.3 returns 95 rows, which is the correct expected result.
> Query plan for above query that returns wrong results
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select col_int from typeall_l 
> limit all offset 10;
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(col_int=[$0])
> 00-02SelectionVectorRemover
> 00-03  Limit(offset=[10])
> 00-04Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:///tmp/typeall_l]], selectionRoot=maprfs:/tmp/typeall_l, 
> numFiles=1, usedMetadataFile=false, columns=[`col_int`]]])
> {noformat} 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (DRILL-5327) Hash aggregate can return empty batch which can cause schema change exception

2017-03-07 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-5327:
---

Assignee: Boaz Ben-Zvi

[~ben-zvi] - can you confirm whether this is due to your changes for DRILL-5293?

> Hash aggregate can return empty batch which can cause schema change exception
> -
>
> Key: DRILL-5327
> URL: https://issues.apache.org/jira/browse/DRILL-5327
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.10.0
>Reporter: Chun Chang
>Assignee: Boaz Ben-Zvi
>Priority: Blocker
>
> Hash aggregate can return empty batches, which causes Drill to throw a schema 
> change exception (Drill does not handle this type of schema change). This is 
> not a new bug, but a recent hash function change (a theoretically correct 
> change) may have increased the chance of hitting this issue. I don't have 
> scientific data to support my claim (in fact I don't believe it's the case), 
> but a regular regression run that used to pass now fails due to this bug. My 
> concern is that existing Drill users out there may have queries that used to 
> work but now fail. It will be difficult to explain why the new release is 
> better for them. I am marking this bug as a blocker so we can discuss it 
> before releasing 1.10.
> {noformat}
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf1/original/text/query66.sql
> Query: 
> -- start query 66 in stream 0 using template query66.tpl 
> SELECT w_warehouse_name, 
>w_warehouse_sq_ft, 
>w_city, 
>w_county, 
>w_state, 
>w_country, 
>ship_carriers, 
>year1,
>Sum(jan_sales) AS jan_sales, 
>Sum(feb_sales) AS feb_sales, 
>Sum(mar_sales) AS mar_sales, 
>Sum(apr_sales) AS apr_sales, 
>Sum(may_sales) AS may_sales, 
>Sum(jun_sales) AS jun_sales, 
>Sum(jul_sales) AS jul_sales, 
>Sum(aug_sales) AS aug_sales, 
>Sum(sep_sales) AS sep_sales, 
>Sum(oct_sales) AS oct_sales, 
>Sum(nov_sales) AS nov_sales, 
>Sum(dec_sales) AS dec_sales, 
>Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot, 
>Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot, 
>Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot, 
>Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot, 
>Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot, 
>Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot, 
>Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot, 
>Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot, 
>Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot, 
>Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot, 
>Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot, 
>Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot, 
>Sum(jan_net)   AS jan_net, 
>Sum(feb_net)   AS feb_net, 
>Sum(mar_net)   AS mar_net, 
>Sum(apr_net)   AS apr_net, 
>Sum(may_net)   AS may_net, 
>Sum(jun_net)   AS jun_net, 
>Sum(jul_net)   AS jul_net, 
>Sum(aug_net)   AS aug_net, 
>Sum(sep_net)   AS sep_net, 
>Sum(oct_net)   AS oct_net, 
>Sum(nov_net)   AS nov_net, 
>Sum(dec_net)   AS dec_net 
> FROM   (SELECT w_warehouse_name, 
>w_warehouse_sq_ft, 
>w_city, 
>w_county, 
>w_state, 
>w_country, 
>'ZOUROS' 
>|| ',' 
>|| 'ZHOU' AS ship_carriers, 
>d_yearAS year1, 
>Sum(CASE 
>  WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity 
>  ELSE 0 
>END)  AS jan_sales, 
>Sum(CASE 
> 

[jira] [Updated] (DRILL-5326) Unit test failures related to the SERVER_METADATA

2017-03-07 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5326:

Reviewer: Jinfeng Ni

Assigned Reviewer to [~jni]

> Unit test failures related to the SERVER_METADATA
> -
>
> Key: DRILL-5326
> URL: https://issues.apache.org/jira/browse/DRILL-5326
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.10.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Blocker
> Fix For: 1.10.0
>
>
> 1. In DRILL-5301 a new SERVER_META RPC call was introduced. The server 
> supports this method only from Drill version 1.10.0 onward; for 
> 1.10.0-SNAPSHOT it is disabled. 
> When I enabled this method (by upgrading the Drill version to 1.10.0 or 
> 1.11.0-SNAPSHOT) I found the following exception:
> {code}
> java.lang.AssertionError: Unexpected/unhandled MinorType value GENERIC_OBJECT
> {code}
> It appears in several tests (for example in 
> DatabaseMetadataTest#testNullsAreSortedMethodsSaySortedHigh).
> The reason is that the "GENERIC_OBJECT" RPC-/protobuf-level type appears in 
> the ServerMetadata#ConvertSupportList. (Support for GENERIC_OBJECT was 
> added in DRILL-1126.)
> The proposed solution is to add the appropriate "JAVA_OBJECT" SQL type name 
> for this "GENERIC_OBJECT" RPC-/protobuf-level data type.
> 2. After fixing the first issue, the test mentioned above still fails because 
> of an incorrect "NullCollation" value in the "ServerMetaProvider". According 
> to the [doc|https://drill.apache.org/docs/order-by-clause/#usage-notes] the 
> default value should be NC_HIGH (NULL is the highest value).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-4335) Apache Drill should support network encryption

2017-03-07 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-4335:

Reviewer: Sudheesh Katkam

Assigned Reviewer to [~sudheeshkatkam]

> Apache Drill should support network encryption
> --
>
> Key: DRILL-4335
> URL: https://issues.apache.org/jira/browse/DRILL-4335
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Keys Botzum
>Assignee: Sorabh Hamirwasia
>  Labels: security
>
> This is clearly related to DRILL-291, but I wanted to make explicit that this 
> needs to include network-level encryption and not just authentication. This 
> is particularly important for the client connection to Drill, which will 
> often be sending passwords in the clear until there is encryption.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK

2017-03-06 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5316:

Reviewer: Sorabh Hamirwasia

Assigned Reviewer to [~shamirwasia]

> C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children 
> completed with ZOK
> 
>
> Key: DRILL-5316
> URL: https://issues.apache.org/jira/browse/DRILL-5316
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - C++
>Reporter: Rob Wu
>Priority: Critical
>
> When connecting to a drillbit via Zookeeper, the C++ client would occasionally 
> crash for no apparent reason.
> A closer look at the code revealed that during the call 
> rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, ); 
> zoo_get_children returns ZOK (0) but drillbitsVector.count is 0.
> This leaves drillbits empty, and thus 
> causes err = zook.getEndPoint(drillbits[drillbits.size() - 1], endpoint); to 
> crash.
> A size check should be done to prevent this from happening.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (DRILL-5290) Provide an option to build operator table once for built-in static functions and reuse it across queries.

2017-03-02 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong resolved DRILL-5290.
-
Resolution: Fixed

> Provide an option to build operator table once for built-in static functions 
> and reuse it across queries.
> -
>
> Key: DRILL-5290
> URL: https://issues.apache.org/jira/browse/DRILL-5290
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
>
> Currently, DrillOperatorTable which contains standard SQL operators and 
> functions and Drill User Defined Functions (UDFs) (built-in and dynamic) gets 
> built for each query as part of creating QueryContext. This is an expensive 
> operation ( ~30 msec to build) and allocates  ~2M on heap for each query. For 
> high throughput, low latency operational queries, we quickly run out of heap 
> memory, causing JVM hangs. Build operator table once during startup for 
> static built-in functions and save in DrillbitContext, so we can reuse it 
> across queries.
> Provide a system/session option to not use dynamic UDFs so we can use the 
> operator table saved in DrillbitContext and avoid building each time.
> *Please note, changes are adding new option exec.udf.use_dynamic which needs 
> to be documented.*
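The build-once-and-reuse idea described above can be sketched as follows (class names and structure are hypothetical simplifications, not Drill's actual implementation):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: build an expensive operator table once at startup and share it
// across queries, falling back to a per-query build only when dynamic
// UDFs are enabled. All names here are illustrative, not Drill's classes.
public class OperatorTableCacheSketch {
    static final class OperatorTable {
        final Map<String, String> functions = new ConcurrentHashMap<>();
        OperatorTable() {
            // Stand-in for the expensive registration of built-in functions.
            functions.put("sum", "builtin");
            functions.put("initcap", "builtin");
        }
    }

    // Built once at startup and shared (analogous to DrillbitContext).
    private static final OperatorTable SHARED = new OperatorTable();

    public static OperatorTable tableForQuery(boolean useDynamicUdfs) {
        // With dynamic UDFs enabled, each query may see new functions, so
        // a fresh table is built; otherwise the shared table is reused.
        return useDynamicUdfs ? new OperatorTable() : SHARED;
    }

    public static void main(String[] args) {
        boolean reused = tableForQuery(false) == tableForQuery(false);
        boolean rebuilt = tableForQuery(true) != tableForQuery(true);
        System.out.println(reused && rebuilt); // prints true
    }
}
```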



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (DRILL-5287) Provide option to skip updates of ephemeral state changes in Zookeeper

2017-03-02 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong resolved DRILL-5287.
-
Resolution: Fixed

> Provide option to skip updates of ephemeral state changes in Zookeeper
> --
>
> Key: DRILL-5287
> URL: https://issues.apache.org/jira/browse/DRILL-5287
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
>
> We put transient profiles in Zookeeper and update state as the query 
> progresses and changes state. It is observed that this adds ~45 msec of 
> latency for each update in the query execution path. This gets even worse 
> when a high number of concurrent queries is in progress. For concurrency=100, 
> the average query response time even for short queries is 8 sec vs 0.2 sec 
> with these updates disabled. For short-lived queries in a high-throughput 
> scenario, there is no value in updating state changes in Zookeeper. We need 
> an option to disable these updates for short-running operational queries.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (DRILL-5252) A condition returns always true

2017-03-02 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong resolved DRILL-5252.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

> A condition returns always true
> ---
>
> Key: DRILL-5252
> URL: https://issues.apache.org/jira/browse/DRILL-5252
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: JC
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> I've found the following code smell in a recent GitHub snapshot.
> Path: 
> exec/java-exec/src/main/java/org/apache/drill/exec/expr/EqualityVisitor.java
> {code:java}
> 287 
> 288   @Override
> 289   public Boolean visitNullConstant(TypedNullConstant e, LogicalExpression 
> value) throws RuntimeException {
> 290 if (!(value instanceof TypedNullConstant)) {
> 291   return false;
> 292 }
> 293 return e.getMajorType().equals(e.getMajorType());
> 294   }
> 295
> {code}
> Should it be like this?
> {code:java}
> 292 }
> 293 return value.getMajorType().equals(e.getMajorType());
> 294   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5226) External Sort encountered an error while spilling to disk

2017-03-02 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892781#comment-15892781
 ] 

Zelaine Fong commented on DRILL-5226:
-

[~rkins] - Is this an issue with the new external sort, or only with the 
existing sort?

> External Sort encountered an error while spilling to disk
> -
>
> Key: DRILL-5226
> URL: https://issues.apache.org/jira/browse/DRILL-5226
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Attachments: 277578d5-8bea-27db-0da1-cec0f53a13df.sys.drill, 
> profile_scenario3.sys.drill, scenario3.log
>
>
> Environment : 
> {code}
> git.commit.id.abbrev=2af709f
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> Nodes in Mapr Cluster : 1
> Data Size : ~ 0.35 GB
> No of Columns : 1
> Width of column : 256 chars
> {code}
> The below query fails before spilling to disk due to wrong estimates of the 
> record batch size.
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.width.max_per_node` = 1;
> +---+--+
> |  ok   |   summary|
> +---+--+
> | true  | planner.width.max_per_node updated.  |
> +---+--+
> 1 row selected (1.11 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.memory.max_query_memory_per_node` = 62914560;
> +---++
> |  ok   |  summary   |
> +---++
> | true  | planner.memory.max_query_memory_per_node updated.  |
> +---++
> 1 row selected (0.362 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.disable_exchanges` = true;
> +---+-+
> |  ok   |   summary   |
> +---+-+
> | true  | planner.disable_exchanges updated.  |
> +---+-+
> 1 row selected (0.277 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select * from (select * from 
> dfs.`/drill/testdata/resource-manager/250wide-small.tbl` order by 
> columns[0])d where d.columns[0] = 'ljdfhwuehnoiueyf';
> Error: RESOURCE ERROR: External Sort encountered an error while spilling to 
> disk
> Unable to allocate buffer of size 1048576 (rounded from 618889) due to memory 
> limit. Current allocation: 62736000
> Fragment 0:0
> [Error Id: 1bb933c8-7dc6-4cbd-8c8e-0e095baac719 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> Exception from the logs
> {code}
> 2017-01-26 15:33:09,307 [277578d5-8bea-27db-0da1-cec0f53a13df:frag:0:0] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - User Error Occurred: External Sort 
> encountered an error while spilling to disk (Unable to allocate buffer of 
> size 1048576 (rounded from 618889) due to memory limit. Current allocation: 
> 62736000)
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: External 
> Sort encountered an error while spilling to disk
> Unable to allocate buffer of size 1048576 (rounded from 618889) due to memory 
> limit. Current allocation: 62736000
> [Error Id: 1bb933c8-7dc6-4cbd-8c8e-0e095baac719 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
>  ~[drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:603)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:411)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 

[jira] [Updated] (DRILL-5304) Queries fail intermittently when there is skew in data distribution

2017-03-02 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5304:

Labels: ready-to-commit  (was: )

> Queries fail intermittently when there is skew in data distribution
> ---
>
> Key: DRILL-5304
> URL: https://issues.apache.org/jira/browse/DRILL-5304
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.10.0
>Reporter: Abhishek Girish
>Assignee: Padma Penumarthy
>  Labels: ready-to-commit
> Attachments: query1_drillbit.log.txt, query2_drillbit.log.txt
>
>
> In a distributed environment, we've observed certain queries intermittently 
> fail execution with an assignment-logic issue when the underlying 
> data is skewed with respect to its distribution. 
> For example the TPC-H [query 
> 7|https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Advanced/tpch/tpch_sf100/parquet/07.q]
>  failed with the below error:
> {code}
> java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException: 
> MinorFragmentId 105 has no read entries assigned
> ...
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: MinorFragmentId 105 has no read entries 
> assigned
> org.apache.drill.exec.work.foreman.Foreman.run():281
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():744
>   Caused By (java.lang.IllegalArgumentException) MinorFragmentId 105 has no 
> read entries assigned
> {code}
> Log containing full stack trace is attached.
> And for this query, the underlying TPC-H SF100 Parquet dataset was observed 
> to be located mostly only on 2-3 nodes on an 8 node DFS environment. The data 
> distribution skew on this cluster is most likely the triggering factor for 
> this case, as the same query, on the same dataset does not show this failure 
> on a different test cluster (with possibly different data distribution). 
> Also, another 
> [query|https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/limit0/window_functions/bugs/data/drill-3700.sql]
>  failed with a similar error when slice target was set to 1. 
> {code}
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException: 
> MinorFragmentId 66 has no read entries assigned
> ...
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: MinorFragmentId 66 has no read entries 
> assigned
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5304) Queries fail intermittently when there is skew in data distribution

2017-02-28 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5304:

Reviewer: Jinfeng Ni

Assigned Reviewer to [~jni]

> Queries fail intermittently when there is skew in data distribution
> ---
>
> Key: DRILL-5304
> URL: https://issues.apache.org/jira/browse/DRILL-5304
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.10.0
>Reporter: Abhishek Girish
>Assignee: Padma Penumarthy
> Attachments: query1_drillbit.log.txt, query2_drillbit.log.txt
>
>
> In a distributed environment, we've observed certain queries intermittently 
> fail execution with an assignment-logic issue when the underlying 
> data is skewed with respect to its distribution. 
> For example the TPC-H [query 
> 7|https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Advanced/tpch/tpch_sf100/parquet/07.q]
>  failed with the below error:
> {code}
> java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException: 
> MinorFragmentId 105 has no read entries assigned
> ...
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: MinorFragmentId 105 has no read entries 
> assigned
> org.apache.drill.exec.work.foreman.Foreman.run():281
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():744
>   Caused By (java.lang.IllegalArgumentException) MinorFragmentId 105 has no 
> read entries assigned
> {code}
> Log containing full stack trace is attached.
> And for this query, the underlying TPC-H SF100 Parquet dataset was observed 
> to be located mostly only on 2-3 nodes on an 8 node DFS environment. The data 
> distribution skew on this cluster is most likely the triggering factor for 
> this case, as the same query, on the same dataset does not show this failure 
> on a different test cluster (with possibly different data distribution). 
> Also, another 
> [query|https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/limit0/window_functions/bugs/data/drill-3700.sql]
>  failed with a similar error when slice target was set to 1. 
> {code}
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException: 
> MinorFragmentId 66 has no read entries assigned
> ...
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: MinorFragmentId 66 has no read entries 
> assigned
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5293) Poor performance of Hash Table due to same hash value as distribution below

2017-02-27 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5293:

Reviewer: Chunhui Shi

Assigned Reviewer to [~cshi]

> Poor performance of Hash Table due to same hash value as distribution below
> ---
>
> Key: DRILL-5293
> URL: https://issues.apache.org/jira/browse/DRILL-5293
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.8.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
>
> The computation of the hash value is basically the same whether for the Hash 
> Table (used by Hash Agg, and Hash Join), or for distribution of rows at the 
> exchange. As a result, a specific Hash Table (in a parallel minor fragment) 
> gets only rows "filtered out" by the partition below ("upstream"), so the 
> pattern of this filtering leads to non-uniform usage of the hash buckets in 
> the table.
>   Here is a simplified example: An exchange partitions into TWO (minor 
> fragments), each running a Hash Agg. So the partition sends rows of EVEN hash 
> values to the first, and rows of ODD hash values to the second. Now the first 
> recomputes the _same_ hash value for its Hash table -- and only the even 
> buckets get used !!  (Or with a partition into EIGHT -- possibly only one 
> eighth of the buckets would be used !! ) 
>    This would lead to longer hash chains and thus _poor performance_!
> A possible solution -- add a distribution function distFunc (only for 
> partitioning) that takes the hash value and "scrambles" it so that the 
> entropy in all the bits effects the low bits of the output. This function 
> should be applied (in HashPrelUtil) over the generated code that produces the 
> hash value, like:
>distFunc( hash32(field1, hash32(field2, hash32(field3, 0))) );
> Tested with a huge hash aggregate (64 M rows) and a parallelism of 8 ( 
> planner.width.max_per_node = 8 ); minor fragments 0 and 4 used only 1/8 of 
> their buckets, the others used 1/4 of their buckets.  Maybe the reason for 
> this variance is that distribution is using "hash32AsDouble" and hash agg is 
> using "hash32".  
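A bit-mixing finalizer of the kind proposed (the distFunc above) can be sketched with the MurmurHash3 32-bit finalizer; this is an illustrative choice, not necessarily the function Drill would adopt:

```java
public class DistFuncSketch {
    // Spreads entropy from the high bits into the low bits so that a
    // fragment receiving only, say, even hash values from the exchange
    // still populates all of its hash-table buckets. The constants are
    // those of the MurmurHash3 32-bit finalizer.
    public static int distFunc(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        // Feed in only even values (the pathological pattern from the
        // report) and count how many mixed outputs are odd: roughly
        // half, so both even and odd buckets get used downstream.
        int odd = 0;
        for (int i = 0; i < 1000; i += 2) {
            if ((distFunc(i) & 1) == 1) {
                odd++;
            }
        }
        System.out.println(odd > 100 && odd < 400);
    }
}
```

Each step (xorshift, odd-constant multiply) is a bijection on 32-bit ints, so distinct inputs still map to distinct hash values; only the bit distribution changes.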



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (DRILL-5300) SYSTEM ERROR: IllegalStateException: Memory was leaked by query while querying parquet files

2017-02-27 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886026#comment-15886026
 ] 

Zelaine Fong edited comment on DRILL-5300 at 2/27/17 4:06 PM:
--

Based on these lines in your stack trace:

{code}
... 5 common frames omitted
2017-02-27 04:32:57,867 [drill-executor-453] ERROR 
o.a.d.exec.server.BootStrapContext - 
org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
java.lang.NoClassDefFoundError: Could not initialize class 
org.xerial.snappy.Snappy
at 
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$DecompressionHelper.decompress(AsyncPageReader.java:402)
 ~[drill-java-exec-1.9.0.jar:1.9.0]
{code}

The memory leak appears to be DRILL-5160.  

The missing snappy dependency is DRILL-5157.  If you pick up the fix for 
DRILL-5157, that will avoid the dependency problem you're hitting.


was (Author: zfong):
Based on these lines in your stack trace:

... 5 common frames omitted
2017-02-27 04:32:57,867 [drill-executor-453] ERROR 
o.a.d.exec.server.BootStrapContext - 
org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
java.lang.NoClassDefFoundError: Could not initialize class 
org.xerial.snappy.Snappy
at 
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$DecompressionHelper.decompress(AsyncPageReader.java:402)
 ~[drill-java-exec-1.9.0.jar:1.9.0]

The memory leak appears to be DRILL-5160.  

The missing snappy dependency is DRILL-5157.  If you pick up the fix for 
DRILL-5157, that will avoid the dependency problem you're hitting.

> SYSTEM ERROR: IllegalStateException: Memory was leaked by query while 
> querying parquet files
> 
>
> Key: DRILL-5300
> URL: https://issues.apache.org/jira/browse/DRILL-5300
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
> Environment: OS: Linux
>Reporter: Muhammad Gelbana
> Attachments: both_queries_logs.zip
>
>
> Running the following query against parquet files (I modified some values for 
> privacy reasons)
> {code:title=Query causing the long logs|borderStyle=solid}
> SELECT AL4.NAME, AL5.SEGMENT2, SUM(AL1.AMOUNT), AL2.ATTRIBUTE4, 
> AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, 
> AL11.NAME FROM 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA__TRX_LINE_GL_DIST_ALL`
>  AL1, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA_OMER_TRX_ALL`
>  AL2, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX`
>  AL3, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_HR_COMMON/HR_ALL_ORGANIZATION_UNITS`
>  AL4, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_CODE_COMBINATIONS`
>  AL5, 
> dfs.`/disk2/XXX/XXX//data/../parquet//XXAT_AR_MU_TAB` 
> AL8, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX`
>  AL11, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S`
>  AL12, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS`
>  AL13, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL`
>  AL14, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL`
>  AL15, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL`
>  AL16, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL`
>  AL17, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS`
>  AL18, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S`
>  AL19 WHERE (AL2.SHIP_TO__USE_ID = AL15._USE_ID AND 
> AL15.___ID = AL14.___ID AND AL14.X__ID = 
> AL12.X__ID AND AL12.LOCATION_ID = AL13.LOCATION_ID AND 
> AL17.___ID = AL16.___ID AND AL16.X__ID = 
> AL19.X__ID AND AL19.LOCATION_ID = AL18.LOCATION_ID AND 
> AL2.BILL_TO__USE_ID = AL17._USE_ID AND AL2.SET_OF_X_ID = 
> AL3.SET_OF_X_ID AND AL1.CODE_COMBINATION_ID = AL5.CODE_COMBINATION_ID AND 
> AL5.SEGMENT4 = AL8.MU AND AL1.SET_OF_X_ID = AL11.SET_OF_X_ID AND 
> AL2.ORG_ID = AL4.ORGANIZATION_ID AND AL2.OMER_TRX_ID = 
> AL1.OMER_TRX_ID) AND ((AL5.SEGMENT2 = '41' AND AL1.AMOUNT <> 0 AND 
> AL4.NAME IN ('XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 
> 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-') 
> AND AL3.NAME like '%-PR-%')) GROUP BY AL4.NAME, AL5.SEGMENT2, AL2.ATTRIBUTE4, 
> AL2.XXX__CODE, 

[jira] [Commented] (DRILL-5300) SYSTEM ERROR: IllegalStateException: Memory was leaked by query while querying parquet files

2017-02-27 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886026#comment-15886026
 ] 

Zelaine Fong commented on DRILL-5300:
-

Based on these lines in your stack trace:

... 5 common frames omitted
2017-02-27 04:32:57,867 [drill-executor-453] ERROR 
o.a.d.exec.server.BootStrapContext - 
org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
java.lang.NoClassDefFoundError: Could not initialize class 
org.xerial.snappy.Snappy
at 
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$DecompressionHelper.decompress(AsyncPageReader.java:402)
 ~[drill-java-exec-1.9.0.jar:1.9.0]

The memory leak appears to be DRILL-5160.  

The missing snappy dependency is DRILL-5157.  If you pick up the fix for 
DRILL-5157, that will avoid the dependency problem you're hitting.

> SYSTEM ERROR: IllegalStateException: Memory was leaked by query while 
> querying parquet files
> 
>
> Key: DRILL-5300
> URL: https://issues.apache.org/jira/browse/DRILL-5300
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
> Environment: OS: Linux
>Reporter: Muhammad Gelbana
> Attachments: both_queries_logs.zip
>
>
> Running the following query against parquet files (I modified some values for 
> privacy reasons)
> {code:title=Query causing the long logs|borderStyle=solid}
> SELECT AL4.NAME, AL5.SEGMENT2, SUM(AL1.AMOUNT), AL2.ATTRIBUTE4, 
> AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, 
> AL11.NAME FROM 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA__TRX_LINE_GL_DIST_ALL`
>  AL1, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA_OMER_TRX_ALL`
>  AL2, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX`
>  AL3, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_HR_COMMON/HR_ALL_ORGANIZATION_UNITS`
>  AL4, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_CODE_COMBINATIONS`
>  AL5, 
> dfs.`/disk2/XXX/XXX//data/../parquet//XXAT_AR_MU_TAB` 
> AL8, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX`
>  AL11, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S`
>  AL12, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS`
>  AL13, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL`
>  AL14, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL`
>  AL15, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL`
>  AL16, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL`
>  AL17, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS`
>  AL18, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S`
>  AL19 WHERE (AL2.SHIP_TO__USE_ID = AL15._USE_ID AND 
> AL15.___ID = AL14.___ID AND AL14.X__ID = 
> AL12.X__ID AND AL12.LOCATION_ID = AL13.LOCATION_ID AND 
> AL17.___ID = AL16.___ID AND AL16.X__ID = 
> AL19.X__ID AND AL19.LOCATION_ID = AL18.LOCATION_ID AND 
> AL2.BILL_TO__USE_ID = AL17._USE_ID AND AL2.SET_OF_X_ID = 
> AL3.SET_OF_X_ID AND AL1.CODE_COMBINATION_ID = AL5.CODE_COMBINATION_ID AND 
> AL5.SEGMENT4 = AL8.MU AND AL1.SET_OF_X_ID = AL11.SET_OF_X_ID AND 
> AL2.ORG_ID = AL4.ORGANIZATION_ID AND AL2.OMER_TRX_ID = 
> AL1.OMER_TRX_ID) AND ((AL5.SEGMENT2 = '41' AND AL1.AMOUNT <> 0 AND 
> AL4.NAME IN ('XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 
> 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-') 
> AND AL3.NAME like '%-PR-%')) GROUP BY AL4.NAME, AL5.SEGMENT2, AL2.ATTRIBUTE4, 
> AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, 
> AL11.NAME
> {code}
> {code:title=Query causing the short logs|borderStyle=solid}
> SELECT AL11.NAME
> FROM
> dfs.`/XXX/XXX/XXX/data/../parquet/XXX_XXX_COMMON/GL_XXX` 
> LIMIT 10
> {code}
> This issue may be a duplicate for [this 
> one|https://issues.apache.org/jira/browse/DRILL-4398] but I created a new one 
> based on [this 
> suggestion|https://issues.apache.org/jira/browse/DRILL-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884846#comment-15884846].





[jira] [Commented] (DRILL-4398) SYSTEM ERROR: IllegalStateException: Memory was leaked by query

2017-02-26 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884846#comment-15884846
 ] 

Zelaine Fong commented on DRILL-4398:
-

[~mgelbana] - can you share the stack trace from the Drillbit log when you 
encounter this error?  Also, it might be better to log a new issue, as this 
original issue was reported when selecting from a JDBC data source, which is 
very likely a different problem.

> SYSTEM ERROR: IllegalStateException: Memory was leaked by query
> ---
>
> Key: DRILL-4398
> URL: https://issues.apache.org/jira/browse/DRILL-4398
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
>Reporter: N Campbell
>Assignee: Taras Supyk
>
> Several queries fail with memory leaked errors
> select tjoin2.rnum, tjoin1.c1, tjoin2.c1 as c1j2, tjoin2.c2 as c2j2 from 
> postgres.public.tjoin1 full outer join postgres.public.tjoin2 on tjoin1.c1 = 
> tjoin2.c1
> select tjoin1.rnum, tjoin1.c1, tjoin2.c1 as c1j2, tjoin2.c2 from 
> postgres.public.tjoin1, lateral ( select tjoin2.c1, tjoin2.c2 from 
> postgres.public.tjoin2 where tjoin1.c1=tjoin2.c1) tjoin2
> SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory 
> leaked: (40960)
> Allocator(op:0:0:3:JdbcSubScan) 100/40960/135168/100 
> (res/actual/peak/limit)
> create table TJOIN1 (RNUM integer   not null , C1 integer, C2 integer);
> insert into TJOIN1 (RNUM, C1, C2) values ( 0, 10, 15);
> insert into TJOIN1 (RNUM, C1, C2) values ( 1, 20, 25);
> insert into TJOIN1 (RNUM, C1, C2) values ( 2, NULL, 50);
> create table TJOIN2 (RNUM integer   not null , C1 integer, C2 char(2));
> insert into TJOIN2 (RNUM, C1, C2) values ( 0, 10, 'BB');
> insert into TJOIN2 (RNUM, C1, C2) values ( 1, 15, 'DD');
> insert into TJOIN2 (RNUM, C1, C2) values ( 2, NULL, 'EE');
> insert into TJOIN2 (RNUM, C1, C2) values ( 3, 10, 'FF');





[jira] [Updated] (DRILL-5208) Finding path to java executable should be deterministic

2017-02-24 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5208:

Reviewer: Arina Ielchiieva

Assigned Reviewer to [~arina]

> Finding path to java executable should be deterministic
> ---
>
> Key: DRILL-5208
> URL: https://issues.apache.org/jira/browse/DRILL-5208
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.10.0
>Reporter: Krystal
>Assignee: Paul Rogers
>Priority: Minor
>
> Command to find JAVA in drill-config.sh is not deterministic.  
> drill-config.sh uses the following command to find JAVA:
> JAVA=`find -L "$JAVA_HOME" -name $JAVA_BIN -type f | head -n 1`
> On one of my nodes the following command returned 2 entries:
> find -L $JAVA_HOME -name java -type f
> /usr/local/java/jdk1.7.0_67/jre/bin/java
> /usr/local/java/jdk1.7.0_67/bin/java
> On another node, the same command returned entries in different order:
> find -L $JAVA_HOME -name java -type f
> /usr/local/java/jdk1.7.0_67/bin/java
> /usr/local/java/jdk1.7.0_67/jre/bin/java
> The complete command picks the first entry returned, which may not be the same 
> on each node:
> find -L $JAVA_HOME -name java -type f | head -n 1
> /usr/local/java/jdk1.7.0_67/jre/bin/java
> If JAVA_HOME is found, we should just append "bin/java" to the path:
> JAVA=$JAVA_HOME/bin/java
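The deterministic resolution proposed above can be sketched as follows. This is an illustrative Java port of the shell logic (the class and method names are hypothetical; the actual fix belongs in drill-config.sh):

```java
import java.io.File;

public class FindJava {
    // Illustrative sketch, not Drill's code: resolve $JAVA_HOME/bin/java
    // directly instead of relying on the ordering of `find` output, so every
    // node resolves the same launcher for the same JAVA_HOME.
    public static String resolveJava(String javaHome) {
        if (javaHome == null || javaHome.isEmpty()) {
            return null;  // mirrors the script's JAVA_HOME-not-set branch
        }
        File java = new File(new File(javaHome, "bin"), "java");
        // Deterministic: the same JAVA_HOME always yields the same path.
        return java.isFile() ? java.getPath() : null;
    }

    public static void main(String[] args) {
        System.out.println(resolveJava(System.getenv("JAVA_HOME")));
    }
}
```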





[jira] [Updated] (DRILL-5274) Exception thrown in Drillbit shutdown in UDF cleanup code

2017-02-24 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5274:

Reviewer: Paul Rogers

Assigned Reviewer to [~paul-rogers]

> Exception thrown in Drillbit shutdown in UDF cleanup code
> -
>
> Key: DRILL-5274
> URL: https://issues.apache.org/jira/browse/DRILL-5274
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Arina Ielchiieva
>Priority: Minor
>
> I ran a very simple query against a single-line text file in an embedded Drillbit. 
> The UDF directory was placed in /tmp. During the run, the directory was 
> deleted. On Drillbit shutdown, the following occurred:
> {code}
> 25328 DEBUG [main] [org.apache.drill.exec.server.Drillbit] - Shutdown begun.
> 26344 INFO [pool-1-thread-2] [org.apache.drill.exec.rpc.data.DataServer] - 
> closed eventLoopGroup io.netty.channel.nio.NioEventLoopGroup@7d1c0d85 in 1007 
> ms
> 26345 INFO [pool-1-thread-1] [org.apache.drill.exec.rpc.user.UserServer] - 
> closed eventLoopGroup io.netty.channel.nio.NioEventLoopGroup@7cdb3b56 in 1008 
> ms
> 26345 INFO [pool-1-thread-1] [org.apache.drill.exec.service.ServiceEngine] - 
> closed userServer in 1009 ms
> 26345 INFO [pool-1-thread-2] [org.apache.drill.exec.service.ServiceEngine] - 
> closed dataPool in 1009 ms
> 26356 WARN [main] [org.apache.drill.exec.server.Drillbit] - Failure on close()
> java.lang.IllegalArgumentException: /tmp/drill/udf/udf/local does not exist
>   at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1637) 
> ~[commons-io-2.4.jar:2.4]
>   at 
> org.apache.drill.exec.expr.fn.FunctionImplementationRegistry.close(FunctionImplementationRegistry.java:469)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.server.DrillbitContext.close(DrillbitContext.java:209) 
> ~[classes/:na]
>   at org.apache.drill.exec.work.WorkManager.close(WorkManager.java:152) 
> ~[classes/:na]
>   at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) 
> ~[classes/:na]
>   at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) 
> ~[classes/:na]
>   at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:171) 
> ~[classes/:na]
> ...
> {code}
> The following patch makes the problem go away, but I'm not sure if the above 
> is an indication of deeper problems.
> {code}
> public class FunctionImplementationRegistry implements FunctionLookupContext, 
> AutoCloseable {
>   ...
>   public void close() {
> if (deleteTmpDir) {
>   ...
> } else {
>   try {
> File dir = new File(localUdfDir.toUri().getPath());
> if (dir.exists()) {
>   FileUtils.cleanDirectory(dir);
> }
>   ...
> }
>   }
> {code}





[jira] [Updated] (DRILL-5255) Unit tests fail due to CTTAS temporary name space checks

2017-02-24 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5255:

Reviewer: Paul Rogers

Assigned Reviewer to [~paul-rogers]

> Unit tests fail due to CTTAS temporary name space checks
> 
>
> Key: DRILL-5255
> URL: https://issues.apache.org/jira/browse/DRILL-5255
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Arina Ielchiieva
> Fix For: 1.10.0
>
>
> Drill can operate in embedded mode. In this mode, no storage plugin 
> definitions other than the defaults may be present. In particular, when using 
> the Drill test framework, only those storage plugins defined in the Drill 
> code are available.
> Yet, Drill checks for the existence of the dfs.tmp plugin definition (as 
> named by the {{drill.exec.default_temporary_workspace}} parameter). Because 
> this plugin is not defined, an exception occurs:
> {code}
> org.apache.drill.common.exceptions.UserException: PARSE ERROR: Unable to 
> create or drop tables/views. Schema [dfs.tmp] is immutable.
> [Error Id: 792d4e5d-3f31-4f38-8bb4-d108f1a808f6 ]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
>   at 
> org.apache.drill.exec.planner.sql.SchemaUtilites.resolveToMutableDrillSchema(SchemaUtilites.java:184)
>   at 
> org.apache.drill.exec.planner.sql.SchemaUtilites.getTemporaryWorkspace(SchemaUtilites.java:201)
>   at 
> org.apache.drill.exec.server.Drillbit.validateTemporaryWorkspace(Drillbit.java:264)
>   at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:135)
>   at 
> org.apache.drill.test.ClusterFixture.startDrillbits(ClusterFixture.java:207)
>   ...
> {code}
> Expected that either a configuration would exist that would use the default 
> /tmp/drill location, or that the check for {{dfs.tmp}} would be deferred 
> until it is actually required (such as when executing a CTTAS statement).
> It seemed that the test framework must be altered to work around this problem 
> by defining the necessary workspace. Unfortunately, the Drillbit must start 
> before we can define the workspace needed for the Drillbit to start. So, this 
> workaround is not possible.
> Further, users of the embedded Drillbit may not know to do this configuration.




