[jira] [Commented] (DRILL-6231) Fix memory allocation for repeated list vector

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403220#comment-16403220
 ] 

ASF GitHub Bot commented on DRILL-6231:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1171#discussion_r175244575
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java 
---
@@ -395,11 +395,24 @@ private void allocateMap(AbstractMapVector map, int recordCount) {
   }
 }
 
+private void allocateRepeatedList(RepeatedListVector vector, int recordCount) {
+  vector.allocateOffsetsNew(recordCount);
+  recordCount *= getCardinality();
+  ColumnSize child = children.get(vector.getField().getName());
+  child.allocateVector(vector.getDataVector(), recordCount);
--- End diff --

One interesting feature of this vector is that the child can remain null for 
some time during reading. That is, in JSON, we may see that the field is `foo: 
[[]]` but not yet know the inner type. So, for safety, allocate the inner 
vector only if `vector.getDataVector()` is non-null.

Also note that a repeated list can be of any dimension. So, the inner 
vector can be another repeated list of lesser dimension. The code here handles 
that case. But, does the sizer itself handle nested repeated lists? Do we have 
a unit test for a 2D and 3D list?

We never had to handle these cases before, because only JSON can produce such 
structures and we don't seem to exercise most operators with complex JSON 
structures. We probably should.


> Fix memory allocation for repeated list vector
> --
>
> Key: DRILL-6231
> URL: https://issues.apache.org/jira/browse/DRILL-6231
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Critical
> Fix For: 1.14.0
>
>
> Vector allocation in the record batch sizer can be enhanced to allocate 
> memory for the repeated list vector more accurately, rather than using the 
> default functions. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6262) IndexOutOfBoundException in RecordBatchSize for empty variableWidthVector

2018-03-16 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-6262:


 Summary: IndexOutOfBoundException in RecordBatchSize for empty 
variableWidthVector
 Key: DRILL-6262
 URL: https://issues.apache.org/jira/browse/DRILL-6262
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Reporter: Sorabh Hamirwasia
Assignee: Sorabh Hamirwasia
 Fix For: 1.14.0


ColumnSize inside RecordBatchSizer throws an IndexOutOfBoundsException while 
computing totalDataSize for a VariableWidthVector when the underlying vector 
is empty, with no allocated memory.

This happens because totalDataSize is computed from the offset vector's value 
at index n, where n is the total number of records in the vector. When the 
vector is empty, n = 0 and the offset vector's DrillBuf is empty as well, so 
retrieving the value at index 0 from the offset vector throws the exception. 
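The guard this report implies can be illustrated with a tiny, self-contained sketch. The offsets array models a variable-width vector's offset buffer (offsets[n] = total data bytes for n records); the method name is invented, not Drill's:

```java
// Sketch of the empty-vector guard: never read the offset buffer blindly,
// since an unallocated vector has no entries at all.
public class OffsetSizeSketch {
  static int totalDataSize(int[] offsets, int recordCount) {
    if (recordCount == 0 || offsets.length == 0) {
      return 0;                    // empty vector: nothing allocated, size is 0
    }
    return offsets[recordCount];   // last offset = total bytes used
  }
}
```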



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6008) Unable to shutdown Drillbit using short domain name

2018-03-16 Thread Venkata Jyothsna Donapati (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403110#comment-16403110
 ] 

Venkata Jyothsna Donapati commented on DRILL-6008:
--

[~arina] With the DRILL-6044 changes, the POST (shutdown) request is made to 
the host taken from the URL, so this issue will no longer occur. Moreover, I 
have tried shutting down using the short host name from Postman (a REST API 
client) and it works fine (this works only with SSL and auth disabled). All I 
did was add XX.XX.XX.XXX cv1 cv1.lab to /etc/hosts on the machine from which 
I use Postman, then POST to http://cv1:8047/gracefulShutdown.

 

> Unable to shutdown Drillbit using short domain name
> ---
>
> Key: DRILL-6008
> URL: https://issues.apache.org/jira/browse/DRILL-6008
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Attachments: fqdn.JPG, method_is_not_allowed.JPG, 
> response_of_undefined.JPG
>
>
> Could not shut down a drillbit on a cluster where the host name was used as 
> the drillbit's address (fqdn.JPG). Pressing shutdown resulted in 
> (response_of_undefined.JPG). I also tried using the IP address, with no luck 
> (method_is_not_allowed.JPG).
> I could shut down the drillbit in embedded mode, but then I saw the following 
> errors (local_shutdown.JPG): it looks like the Web UI was trying to get the 
> drillbit status even though it was down.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6215) Use prepared statement instead of Statement in JdbcRecordReader class

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403090#comment-16403090
 ] 

ASF GitHub Bot commented on DRILL-6215:
---

Github user kfaraaz commented on the issue:

https://github.com/apache/drill/pull/1159
  
I don't know about the other file; I didn't add it. Let me check.

Thanks,
Khurram


> Use prepared statement instead of Statement in JdbcRecordReader class
> -
>
> Key: DRILL-6215
> URL: https://issues.apache.org/jira/browse/DRILL-6215
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.12.0
>Reporter: Khurram Faraaz
>Priority: Major
>
> Use prepared statement instead of Statement in JdbcRecordReader class, which 
> is more efficient and less vulnerable to SQL injection attacks.
> Apache Drill 1.13.0-SNAPSHOT, commit : 
> 9073aed67d89e8b2188870d6c812706085c9c41b
> Findbugs reports the below bug and suggests that we use prepared statement 
> instead of Statement.
> {noformat}
> In class org.apache.drill.exec.store.jdbc.JdbcRecordReader
> In method 
> org.apache.drill.exec.store.jdbc.JdbcRecordReader.setup(OperatorContext, 
> OutputMutator)
> At JdbcRecordReader.java:[line 170]
> org.apache.drill.exec.store.jdbc.JdbcRecordReader.setup(OperatorContext, 
> OutputMutator) passes a nonconstant String to an execute method on an SQL 
> statement
> The method invokes the execute method on an SQL statement with a String that 
> seems to be dynamically generated. 
> Consider using a prepared statement instead. 
> It is more efficient and less vulnerable to SQL injection attacks.
> {noformat}
> LOC - 
> https://github.com/apache/drill/blob/a9ea4ec1c5645ddab4b7aef9ac060ff5f109b696/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcRecordReader.java#L170
> {noformat}
> To run with findbugs:
> mvn clean install -Pfindbugs -DskipTests
> Findbugs will write the output to findbugsXml.html in the target directory of 
> each module. 
> For example the java-exec module report is located at: 
> ./exec/java-exec/target/findbugs/findbugsXml.html
> Use 
> find . -name "findbugsXml.html"
> to locate the files.
> {noformat}
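A hedged sketch of what the FindBugs recommendation looks like in practice. The table name, query, and helper below are invented for illustration (they are not Drill's actual JdbcRecordReader code); only the java.sql API usage is standard:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class JdbcReadSketch {
  // Parameterized template: the value arrives through a bind parameter,
  // never through string concatenation, so it cannot alter the SQL structure.
  static final String SELECT_SQL = "SELECT * FROM some_table WHERE id > ?";

  static ResultSet read(Connection conn, long minId) throws SQLException {
    PreparedStatement ps = conn.prepareStatement(SELECT_SQL);
    ps.setLong(1, minId);          // bind the only parameter
    return ps.executeQuery();      // driver may also reuse the compiled plan
  }
}
```

The flagged pattern, by contrast, passes a dynamically built string to `Statement.execute()`, which is both slower on repeated execution and open to injection.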



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-3640) Drill JDBC driver support Statement.setQueryTimeout(int)

2018-03-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-3640:
-
Labels: doc-impacting ready-to-commit  (was: ready-to-commit)

> Drill JDBC driver support Statement.setQueryTimeout(int)
> 
>
> Key: DRILL-3640
> URL: https://issues.apache.org/jira/browse/DRILL-3640
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - JDBC
>Affects Versions: 1.2.0
>Reporter: Chun Chang
>Assignee: Kunal Khatua
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.13.0
>
>
> It would be nice if we have this implemented. Runaway queries could then be 
> automatically canceled by setting the timeout. 
> java.sql.SQLFeatureNotSupportedException: Setting network timeout is not 
> supported.
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.setQueryTimeout(DrillStatementImpl.java:152)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6260) Query fails with "ERROR: Non-scalar sub-query used in an expression" when it contains a cast expression around a scalar sub-query

2018-03-16 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402565#comment-16402565
 ] 

Gautam Kumar Parai commented on DRILL-6260:
---

[~hanu.ncr] I did not realize you had reassigned it to yourself. I analyzed it 
a bit: Drill throws the error when it `visits` the Calcite logical operator 
tree and finds a LogicalAggregate that has a SqlSingleValueAggFunction. 
However, we may need to change the visitor to keep going down the tree to find 
one that is not. The code is in PreProcessLogicalRel.java:visit(LogicalAggregate 
aggregate). Hope this helps!
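The "keep going down the tree" idea can be sketched with stand-in node classes (these are not Calcite's RelNode/LogicalAggregate; the real fix would live in PreProcessLogicalRel's visitor):

```java
import java.util.List;

// Instead of failing on the first aggregate carrying SqlSingleValueAggFunction,
// descend further to see whether a plain scalar aggregate (e.g. MAX) sits below.
public class AggVisitorSketch {
  static class Node {
    final boolean isAggregate;
    final boolean singleValueAgg;   // models SqlSingleValueAggFunction
    final List<Node> inputs;
    Node(boolean agg, boolean single, List<Node> inputs) {
      this.isAggregate = agg; this.singleValueAgg = single; this.inputs = inputs;
    }
  }

  // True if some aggregate at or below this node is a real scalar aggregate
  // rather than a single-value wrapper.
  static boolean hasScalarAggBelow(Node n) {
    if (n.isAggregate && !n.singleValueAgg) {
      return true;
    }
    for (Node in : n.inputs) {
      if (hasScalarAggBelow(in)) {
        return true;
      }
    }
    return false;
  }
}
```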

> Query fails with "ERROR: Non-scalar sub-query used in an expression" when it 
> contains a cast expression around a scalar sub-query 
> --
>
> Key: DRILL-6260
> URL: https://issues.apache.org/jira/browse/DRILL-6260
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.13.0, 1.14.0
> Environment: git Commit ID: dd4a46a6c57425284a2b8c68676357f947e01988 
> git Commit Message: Update version to 1.14.0-SNAPSHOT
>Reporter: Abhishek Girish
>Assignee: Hanumath Rao Maduri
>Priority: Major
>
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > cast(max(T2.a) as varchar) FROM `t2.json` T2);
> Error: UNSUPPORTED_OPERATION ERROR: Non-scalar sub-query used in an expression
> See Apache Drill JIRA: DRILL-1937
> {code}
> Slightly different variants of the query work fine. 
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > max(cast(T2.a as varchar)) FROM `t2.json` T2);
> 00-00    Screen
> 00-01      Project(b=[$0])
> 00-02        Project(b=[$1])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($0, $2)])
> 00-05              NestedLoopJoin(condition=[true], joinType=[left])
> 00-07                Scan(table=[[si, tmp, t1.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t1.json, numFiles=1, 
> columns=[`a`, `b`], files=[maprfs:///tmp/t1.json]]])
> 00-06                StreamAgg(group=[{}], EXPR$0=[MAX($0)])
> 00-08                  Project($f0=[CAST($0):VARCHAR(65535) CHARACTER SET 
> "UTF-16LE" COLLATE "UTF-16LE$en_US$primary"])
> 00-09                    Scan(table=[[si, tmp, t2.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
> columns=[`a`], files=[maprfs:///tmp/t2.json]]]){code}
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > max(T2.a) FROM `t2.json` T2);
> 00-00    Screen
> 00-01      Project(b=[$0])
> 00-02        Project(b=[$1])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($0, $2)])
> 00-05              NestedLoopJoin(condition=[true], joinType=[left])
> 00-07                Scan(table=[[si, tmp, t1.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t1.json, numFiles=1, 
> columns=[`a`, `b`], files=[maprfs:///tmp/t1.json]]])
> 00-06                StreamAgg(group=[{}], EXPR$0=[MAX($0)])
> 00-08                  Scan(table=[[si, tmp, t2.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
> columns=[`a`], files=[maprfs:///tmp/t2.json]]])
> {code}
> File contents:
> {code}
> # cat t1.json 
> {"a":1, "b":"V"}
> {"a":2, "b":"W"}
> {"a":3, "b":"X"}
> {"a":4, "b":"Y"}
> {"a":5, "b":"Z"}
> # cat t2.json 
> {"a":1, "b":"A"}
> {"a":2, "b":"B"}
> {"a":3, "b":"C"}
> {"a":4, "b":"D"}
> {"a":5, "b":"E"}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6261) logging "Waiting for X queries to complete before shutting down" even before shutdown request is triggered

2018-03-16 Thread Venkata Jyothsna Donapati (JIRA)
Venkata Jyothsna Donapati created DRILL-6261:


 Summary: logging "Waiting for X queries to complete before 
shutting down" even before shutdown request is triggered
 Key: DRILL-6261
 URL: https://issues.apache.org/jira/browse/DRILL-6261
 Project: Apache Drill
  Issue Type: Bug
Reporter: Venkata Jyothsna Donapati


After the https://issues.apache.org/jira/browse/DRILL-5922 changes, "Waiting 
for X queries to complete before shutting down" is logged every time a query 
runs, instead of only after a shutdown request is triggered.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6245) Clicking on anything redirects to main login page

2018-03-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-6245:


Assignee: Venkata Jyothsna Donapati

> Clicking on anything redirects to main login page
> -
>
> Key: DRILL-6245
> URL: https://issues.apache.org/jira/browse/DRILL-6245
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
> Fix For: 1.14.0
>
>
> When the Drill Web UI is accessed first using https and then by http, the 
> Web UI always redirects to the main login page when anything on the index 
> page is clicked. However, it works fine once the cookies are cleared.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6245) Clicking on anything redirects to main login page

2018-03-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6245:
-
Fix Version/s: 1.14.0

> Clicking on anything redirects to main login page
> -
>
> Key: DRILL-6245
> URL: https://issues.apache.org/jira/browse/DRILL-6245
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Priority: Minor
> Fix For: 1.14.0
>
>
> When the Drill Web UI is accessed first using https and then by http, the 
> Web UI always redirects to the main login page when anything on the index 
> page is clicked. However, it works fine once the cookies are cleared.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-4897) NumberFormatException in Drill SQL while casting to BIGINT when its actually a number

2018-03-16 Thread Karthikeyan Manivannan (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402463#comment-16402463
 ] 

Karthikeyan Manivannan commented on DRILL-4897:
---

This seems to be happening because of the "WHEN 0 THEN 0" in the query. I think 
the "THEN 0" causes the PROJECT to assume that the result column is INT instead 
of BIGINT, and the query throws the exception when a number larger than INT can 
hold is processed. The query runs fine if it is changed to "...WHEN 0 THEN 
2147483648..." but fails when it is changed to "...WHEN 0 THEN 2147483647...". 
0: jdbc:drill:zk=local> select CAST(case isnumeric(columns[0]) WHEN 0 THEN 
2147483647  ELSE columns[0] END AS BIGINT) from 
dfs.`/Users/karthik/work/bugs/DRILL-4897/pw2.csv`;

Error: SYSTEM ERROR: NumberFormatException: 2147483648 

Fragment 0:0

[Error Id: d29ec48e-e659-41b4-a722-9c546ef8c9c9 on 172.30.8.179:31010] 
(state=,code=0)

0: jdbc:drill:zk=local> select CAST(case isnumeric(columns[0]) WHEN 0 THEN 
2147483648  ELSE columns[0] END AS BIGINT) from 
dfs.`/Users/karthik/work/bugs/DRILL-4897/pw2.csv`;

+-------------+
|   EXPR$0    |
+-------------+
| 1           |
| 2           |
...
...
| 2147483648  |
| 4294967296  |
+-------------+

The planner seems to be doing the same thing in both cases:

Failed Case

< Project(EXPR$0=[CAST(CASE(=(ISNUMERIC(ITEM($0, 0)), 0), 2147483647, ITEM($0, 
0))):BIGINT]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 2.0, cumulative 
cost = \{4.0 rows, 10.0 cpu, 0.0 io, 0.0 network, 0.0 memory} 

---

Successful case

> Project(EXPR$0=[CAST(CASE(=(ISNUMERIC(ITEM($0, 0)), 0), 2147483648, ITEM($0, 
> 0))):BIGINT]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 2.0, 
> cumulative cost = \{4.0 rows, 10.0 cpu, 0.0 io, 0.0 network, 0.0 memory}

So, I guess the problem is in the way the expression is handled in PROJECT. I 
will investigate this further.

 

> NumberFormatException in Drill SQL while casting to BIGINT when its actually 
> a number
> -
>
> Key: DRILL-4897
> URL: https://issues.apache.org/jira/browse/DRILL-4897
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Srihari Karanth
>Assignee: Karthikeyan Manivannan
>Priority: Blocker
>
> In the following SQL, Drill cribs when trying to convert a number that is 
> stored as varchar:
>select cast (case IsNumeric(Delta_Radio_Delay)  
> when 0 then 0 else Delta_Radio_Delay end as BIGINT) 
> from datasource.`./sometable` 
> where Delta_Radio_Delay='4294967294';
> BIGINT should be able to hold a very large number. I don't understand how it 
> throws the below error:
> 0: jdbc:drill:> select cast (case IsNumeric(Delta_Radio_Delay)  
> when 0 then 0 else Delta_Radio_Delay end as BIGINT) 
> from datasource.`./sometable` 
> where Delta_Radio_Delay='4294967294';
> Error: SYSTEM ERROR: NumberFormatException: 4294967294
> Fragment 1:29
> [Error Id: a63bb113-271f-4d8b-8194-2c9728543200 on cluster-3:31010] 
> (state=,code=0)
> How can i modify SQL to fix this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6227) Graceful shutdown should fail if unable to kill drillbit during some timeout rather then trying indefinitely

2018-03-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6227:
-
Issue Type: Sub-task  (was: Improvement)
Parent: DRILL-6023

> Graceful shutdown should fail if unable to kill drillbit during some timeout 
> rather then trying indefinitely
> 
>
> Key: DRILL-6227
> URL: https://issues.apache.org/jira/browse/DRILL-6227
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.12.0
>Reporter: Arina Ielchiieva
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.14.0
>
>
> In drillbit.sh, graceful shutdown calls drillbit stop with {{kill_drillbit}} 
> set to false.
> {code}
> kill_drillbit=false
> stop_bit $kill_drillbit
> {code}
> This means that {{waitForProcessEnd}} will be called with the same property. 
> When {{waitForProcessEnd}} is called with {{kill_drillbit}} set to false, the 
> method polls the drillbit with {{kill -0}} until the process ends. So if the 
> process never ends, the loop may run forever. We need a timeout after which 
> {{waitForProcessEnd}} stops waiting for the drillbit and reports an error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6024) Use grace period only in production servers

2018-03-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6024:
-
Issue Type: Sub-task  (was: Improvement)
Parent: DRILL-6023

> Use grace period only in production servers
> ---
>
> Key: DRILL-6024
> URL: https://issues.apache.org/jira/browse/DRILL-6024
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.12.0
>Reporter: Arina Ielchiieva
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>
> DRILL-4286 introduces graceful shutdown. Currently it is turned off by 
> default (the grace period is set to 0), since turning it on by default 
> affects non-production systems; for example, unit test run time increased 3x.
> [~Paul.Rogers] proposed the following solution: 
> {quote}
> In a production system, we do want the grace period; it is an essential part 
> of the graceful shutdown procedure.
> However, if we are doing a non-graceful shutdown, the grace is unneeded.
> Also, if the cluster contains only one node (as in most unit tests), there is 
> nothing to wait for, so the grace period is not needed. The same is true in 
> an embedded Drillbit for Sqlline.
> So, can we provide a solution that handles these cases rather than simply 
> turning off the grace period always?
> If using the local cluster coordinator, say, then no grace is needed. If 
> using ZK, but there is only one Drillbit, no grace is needed. (There is a 
> race condition, but may be OK.)
> Or, if we detect we are embedded, no grace period.
> Then, also, if we are doing a graceful shutdown, we need the grace. But, if 
> we are doing a "classic" shutdown, no grace is needed.
> The result should be that the grace period is used only in production 
> servers, only when doing a graceful shutdown.
> {quote}
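The decision table in the quote above can be sketched as a small pure function (names and the parameter set are invented for illustration; Drill's actual configuration plumbing differs):

```java
// Grace is only meaningful for a graceful shutdown of a real, multi-node
// production cluster; every other case gets a zero grace period.
public class GracePeriodSketch {
  static int gracePeriodSecs(boolean gracefulShutdown, boolean embedded,
                             boolean localCoordinator, int clusterSize,
                             int configuredGrace) {
    if (!gracefulShutdown       // "classic" shutdown: no grace needed
        || embedded             // embedded Drillbit (e.g. Sqlline)
        || localCoordinator     // local cluster coordinator, nothing to wait for
        || clusterSize <= 1) {  // single-node cluster (most unit tests)
      return 0;
    }
    return configuredGrace;
  }
}
```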



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6022) Improve js part for graceful shutdown

2018-03-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6022:
-
Issue Type: Sub-task  (was: Improvement)
Parent: DRILL-6023

> Improve js part for graceful shutdown
> -
>
> Key: DRILL-6022
> URL: https://issues.apache.org/jira/browse/DRILL-6022
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.12.0
>Reporter: Arina Ielchiieva
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.14.0
>
>
> DRILL-4286 introduces graceful shutdown, but its js part needs improvement:
> a. ajax calls do not handle errors, so when an error occurs it is just 
> swallowed.
> b. there are some unused and/or unnecessary variables.
> c. shutdown functionality is disabled when the user is not an admin, but some 
> other ajax calls are still executed, for example for the port number, number 
> of queries, and grace period. All of these can also be disabled when the user 
> is not an admin.
> d. there are many ajax calls which can be factored out into a dedicated js 
> file.
> Other fixes:
> a. all shutdown functionality resides in the DrillRoot class; it can be 
> factored out into a shutdown-specific class where all shutdown functionality 
> is allowed only for admins at the class level, rather than being marked 
> individually as it is now (see DRILL-6019).
> b. the issue described in DRILL-6021.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-2656) Add ability to specify options for clean shutdown of a Drillbit

2018-03-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker resolved DRILL-2656.
--
Resolution: Duplicate

Addressed by DRILL-4286

> Add ability to specify options for clean shutdown of a Drillbit
> ---
>
> Key: DRILL-2656
> URL: https://issues.apache.org/jira/browse/DRILL-2656
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Flow
>Affects Versions: 0.8.0
>Reporter: Chris Westin
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: Future
>
>
> When we shut down a Drillbit, we should provide some options similar to those 
> available from Oracle's shutdown command (see 
> https://docs.oracle.com/cd/B28359_01/server.111/b28310/start003.htm#ADMIN11156)
>  .
> At present, in order to avoid problems like DRILL-2654, we try to do a short 
> wait for executing queries, but that times out after 5 seconds, and doesn't 
> help with long-running queries.
> Someone that is running a long query might be unhappy about losing work for 
> something that was near completion, so we can do better.
> And, in order to avoid spurious cleanup problems and exceptions, we should 
> explicitly cancel any remaining queries before we complete the shutdown.
> As in the Oracle example, we might have shutdown immediate issue 
> cancellations to the running queries.  A clean shutdown might not have a 
> timeout, or might allow the specification of a longer timeout, and even when 
> the timeout goes off, we should still cleanly cancel any remaining queries, 
> and wait for the cancellations to complete.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-4829) Configure the address to bind to

2018-03-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-4829:
-
Fix Version/s: 1.14.0

> Configure the address to bind to
> 
>
> Key: DRILL-4829
> URL: https://issues.apache.org/jira/browse/DRILL-4829
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Daniel Stockton
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
> Fix For: 1.14.0
>
>
> 1.7 included the following patch to prevent Drillbits binding to the loopback 
> address: https://issues.apache.org/jira/browse/DRILL-4523
> "Drillbit is disallowed to bind to loopback address in distributed mode."
> It would be better if this was configurable rather than rely on /etc/hosts, 
> since it's common for the hostname to resolve to loopback.
> Would you accept a patch that adds this option to drill.override.conf?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-4829) Configure the address to bind to

2018-03-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-4829:
-
Fix Version/s: (was: 1.14.0)

> Configure the address to bind to
> 
>
> Key: DRILL-4829
> URL: https://issues.apache.org/jira/browse/DRILL-4829
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Daniel Stockton
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
> Fix For: 1.14.0
>
>
> 1.7 included the following patch to prevent Drillbits binding to the loopback 
> address: https://issues.apache.org/jira/browse/DRILL-4523
> "Drillbit is disallowed to bind to loopback address in distributed mode."
> It would be better if this was configurable rather than rely on /etc/hosts, 
> since it's common for the hostname to resolve to loopback.
> Would you accept a patch that adds this option to drill.override.conf?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-4829) Configure the address to bind to

2018-03-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker resolved DRILL-4829.
--
Resolution: Duplicate

Addressed by DRILL-6005

> Configure the address to bind to
> 
>
> Key: DRILL-4829
> URL: https://issues.apache.org/jira/browse/DRILL-4829
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Daniel Stockton
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
> Fix For: 1.14.0
>
>
> 1.7 included the following patch to prevent Drillbits binding to the loopback 
> address: https://issues.apache.org/jira/browse/DRILL-4523
> "Drillbit is disallowed to bind to loopback address in distributed mode."
> It would be better if this was configurable rather than rely on /etc/hosts, 
> since it's common for the hostname to resolve to loopback.
> Would you accept a patch that adds this option to drill.override.conf?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6243) Alert box to confirm shutdown of drillbit after clicking shutdown button

2018-03-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6243:
-
Reviewer: Sorabh Hamirwasia

> Alert box to confirm shutdown of drillbit after clicking shutdown button 
> -
>
> Key: DRILL-6243
> URL: https://issues.apache.org/jira/browse/DRILL-6243
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
> Fix For: 1.14.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6243) Alert box to confirm shutdown of drillbit after clicking shutdown button

2018-03-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6243:
-
Fix Version/s: 1.14.0

> Alert box to confirm shutdown of drillbit after clicking shutdown button 
> -
>
> Key: DRILL-6243
> URL: https://issues.apache.org/jira/browse/DRILL-6243
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Minor
> Fix For: 1.14.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6039) drillbit.sh graceful_stop does not wait for fragments to complete before stopping the drillbit

2018-03-16 Thread Pritesh Maker (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402447#comment-16402447
 ] 

Pritesh Maker commented on DRILL-6039:
--

This should be tested after DRILL-6252 is addressed.

> drillbit.sh graceful_stop does not wait for fragments to complete before 
> stopping the drillbit
> --
>
> Key: DRILL-6039
> URL: https://issues.apache.org/jira/browse/DRILL-6039
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.3.0
>Reporter: Krystal
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.14.0
>
>
> git.commit.id.abbrev=eb0c403
> I have a 3-node cluster with drillbits running on each node. I kicked off a 
> long-running query. In the middle of the query, I ran "./drillbit.sh 
> graceful_stop" on one of the non-foreman nodes. The node was stopped within a 
> few seconds and the query failed with error:
> Error: SYSTEM ERROR: IOException: Filesystem closed
> Fragment 4:15



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6008) Unable to shutdown Drillbit using short domain name

2018-03-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6008:
-
Summary: Unable to shutdown Drillbit using short domain name  (was: Unable 
to shutdown Drillbit)

> Unable to shutdown Drillbit using short domain name
> ---
>
> Key: DRILL-6008
> URL: https://issues.apache.org/jira/browse/DRILL-6008
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Attachments: fqdn.JPG, method_is_not_allowed.JPG, 
> response_of_undefined.JPG
>
>
> Could not shut down a drillbit on a cluster where the host name was used as 
> the drillbit's address (fqdn.JPG). Pressing shutdown resulted in 
> (response_of_undefined.JPG). I also tried using the IP address, with no luck 
> (method_is_not_allowed.JPG).
> I could shut down the drillbit in embedded mode, but then I saw the following 
> errors (local_shutdown.JPG): it looks like the Web UI was trying to get the 
> drillbit status even though it was down.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-143) Support CGROUPs resource management

2018-03-16 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua reassigned DRILL-143:
--

Assignee: Kunal Khatua

> Support CGROUPs resource management
> ---
>
> Key: DRILL-143
> URL: https://issues.apache.org/jira/browse/DRILL-143
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Jacques Nadeau
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.14.0
>
>
> For the purpose of playing nice on clusters that don't have YARN, we should 
> write up configuration and scripts to allow users to run Drill next to 
> existing workloads without sharing resources.





[jira] [Commented] (DRILL-143) Support CGROUPs resource management

2018-03-16 Thread Kunal Khatua (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402385#comment-16402385
 ] 

Kunal Khatua commented on DRILL-143:


Marking this as a feature for 1.14.0 since Drill-on-Yarn will be part of 1.13.0.

However, this would be a generic feature for Drill honoring CGroups 
irrespective of whether the node is managed by YARN or not.

> Support CGROUPs resource management
> ---
>
> Key: DRILL-143
> URL: https://issues.apache.org/jira/browse/DRILL-143
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Jacques Nadeau
>Priority: Major
> Fix For: 1.14.0
>
>
> For the purpose of playing nice on clusters that don't have YARN, we should 
> write up configuration and scripts to allow users to run Drill next to 
> existing workloads without sharing resources.





[jira] [Resolved] (DRILL-3928) OutOfMemoryException should not be derived from FragmentSetupException

2018-03-16 Thread Karthikeyan Manivannan (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthikeyan Manivannan resolved DRILL-3928.
---
Resolution: Not A Problem

OutOfMemoryException is not derived from FragmentSetupException

> OutOfMemoryException should not be derived from FragmentSetupException
> --
>
> Key: DRILL-3928
> URL: https://issues.apache.org/jira/browse/DRILL-3928
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
>Reporter: Chris Westin
>Assignee: Karthikeyan Manivannan
>Priority: Major
>
> Discovered while working on DRILL-3927.
> The client and server both use the same direct memory allocator code. But the 
> allocator's OutOfMemoryException is derived from FragmentSetupException 
> (which is derived from ForemanException).
> Firstly, OOM situations don't only happen during setup.
> Secondly, Fragment and Foreman classes shouldn't exist on the client side. 
> (This is causing unnecessary dependencies on the jdbc-all jar on server-only 
> code).
> There's nothing special in those base classes that OutOfMemoryException 
> depends on. This looks like it was just a cheap way to avoid extra catch 
> clauses in Foreman and FragmentExecutor by catching the base classes only.





[jira] [Updated] (DRILL-143) Support CGROUPs resource management

2018-03-16 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-143:
---
Fix Version/s: (was: Future)
   1.14.0

> Support CGROUPs resource management
> ---
>
> Key: DRILL-143
> URL: https://issues.apache.org/jira/browse/DRILL-143
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Jacques Nadeau
>Priority: Major
> Fix For: 1.14.0
>
>
> For the purpose of playing nice on clusters that don't have YARN, we should 
> write up configuration and scripts to allow users to run Drill next to 
> existing workloads without sharing resources.





[jira] [Commented] (DRILL-6259) Support parquet filter push down for complex types

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402375#comment-16402375
 ] 

ASF GitHub Bot commented on DRILL-6259:
---

Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/1173
  
@parthchandra can you please review this change?


> Support parquet filter push down for complex types
> --
>
> Key: DRILL-6259
> URL: https://issues.apache.org/jira/browse/DRILL-6259
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
>
> Currently parquet filter push down is not working for complex types 
> (including arrays).
> This Jira aims to implement filter push down for complex types whose 
> underlying type is among the simple types supported for filter push down. For 
> instance, Drill currently does not support filter push down for varchars, 
> decimals, etc. Once Drill adds that support, it will apply to complex types 
> automatically.
> Complex fields will be pushed down the same way regular fields are, except 
> for one case with arrays.
> A query with the predicate {{where users.hobbies_ids[2] is null}} cannot be 
> pushed down because we are not able to determine the exact number of nulls in 
> array fields. 
> Consider {{[1, 2, 3]}} vs {{[1, 2]}} when these arrays are in different files. 
> Statistics for the second file won't show any nulls, but when querying both 
> files, the third array value is, in terms of data, null.
>  
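The array-statistics pitfall described above can be shown with a small sketch (a hypothetical stats layout, not Drill's actual metadata code): per-file min/max/null-count statistics are kept over the values actually stored, so the fact that one file's array simply has no third element — making `hobbies_ids[2]` effectively null — is invisible to the planner.

```python
# Hypothetical per-file statistics for an array column (not Drill's real
# metadata structures): each file records only the values present and a
# null count over those values.
files = {
    "file1.parquet": [1, 2, 3],   # hobbies_ids has three elements
    "file2.parquet": [1, 2],      # hobbies_ids has only two elements
}

def column_stats(values):
    """Min/max/null-count over the values actually stored in the file."""
    non_null = [v for v in values if v is not None]
    return {"min": min(non_null), "max": max(non_null),
            "nulls": sum(v is None for v in values)}

stats = {name: column_stats(vals) for name, vals in files.items()}

# Neither file's statistics report any nulls...
assert all(s["nulls"] == 0 for s in stats.values())

# ...yet evaluating `hobbies_ids[2] is null` over the data finds a match:
def element_is_null(values, index):
    return index >= len(values) or values[index] is None

matches = [name for name, vals in files.items() if element_is_null(vals, 2)]
print(matches)  # the file whose "missing" third element acts as null
```

This is why the predicate on an array index cannot be pruned from statistics alone: a pruning decision based on `nulls == 0` would wrongly skip the second file.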





[jira] [Updated] (DRILL-6259) Support parquet filter push down for complex types

2018-03-16 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6259:
-
Reviewer: Parth Chandra

> Support parquet filter push down for complex types
> --
>
> Key: DRILL-6259
> URL: https://issues.apache.org/jira/browse/DRILL-6259
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
>
> Currently parquet filter push down is not working for complex types 
> (including arrays).
> This Jira aims to implement filter push down for complex types whose 
> underlying type is among the simple types supported for filter push down. For 
> instance, Drill currently does not support filter push down for varchars, 
> decimals, etc. Once Drill adds that support, it will apply to complex types 
> automatically.
> Complex fields will be pushed down the same way regular fields are, except 
> for one case with arrays.
> A query with the predicate {{where users.hobbies_ids[2] is null}} cannot be 
> pushed down because we are not able to determine the exact number of nulls in 
> array fields. 
> Consider {{[1, 2, 3]}} vs {{[1, 2]}} when these arrays are in different files. 
> Statistics for the second file won't show any nulls, but when querying both 
> files, the third array value is, in terms of data, null.
>  





[jira] [Updated] (DRILL-6252) Foreman node is going down when the non foreman node is stopped

2018-03-16 Thread Venkata Jyothsna Donapati (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Jyothsna Donapati updated DRILL-6252:
-
Attachment: foreman_drillbit.log

> Foreman node is going down when the non foreman node is stopped
> ---
>
> Key: DRILL-6252
> URL: https://issues.apache.org/jira/browse/DRILL-6252
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Vlad Rozov
>Priority: Major
> Fix For: 1.14.0
>
> Attachments: foreman_drillbit.log, nonforeman_drillbit.log
>
>
> Two drillbits are running. I'm running a join query over parquet and tried to 
> stop the non-foreman node using drillbit.sh stop. The query fails with 
> *"Error: DATA_READ ERROR: Exception occurred while reading from disk".* The 
> non-foreman node goes down. The foreman node also goes down. When I looked at 
> the drillbit.log of both the foreman and non-foreman, I found that there is a 
> memory leak: "Memory was leaked by query. Memory leaked: 
> (2097152)\nAllocator(op:2:0:0:HashPartitionSender) 
> 100/6291456/6832128/100 (res/actual/peak/limit)\n". Following are 
> the stack traces for memory leaks 
> {noformat} 
> [Error Id: 0d9a2799-7e97-46b3-953b-1f8d0dd87a04 on qa102-34.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Memory was leaked by query. Memory leaked: (3145728)
> Allocator(op:2:1:0:HashPartitionSender) 100/6291456/6291456/100 
> (res/actual/peak/limit)
>  
>  
> Fragment 2:1 
> [Error Id: 0d9a2799-7e97-46b3-953b-1f8d0dd87a04 on qa102-34.qa.lab:31010]
>         at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:297)
>  [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:266)
>  [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
>         at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>         at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Memory was leaked by query. 
> Memory leaked: (3145728)
> Allocator(op:2:1:0:HashPartitionSender) 100/6291456/6291456/100 
> (res/actual/peak/limit)
> {noformat} 
>  
> Ping me for the logs and more information.
>  
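The `res/actual/peak/limit` accounting in the error above can be mimicked with a toy allocator (a sketch only, not Drill's real `BufferAllocator`): closing while bytes are still outstanding is exactly the "Memory was leaked by query" condition.

```python
class ToyAllocator:
    """Toy accounting allocator tracking reservation, outstanding bytes,
    peak usage, and a hard limit, like the res/actual/peak/limit numbers
    in the error message (not Drill's real BufferAllocator)."""

    def __init__(self, reservation, limit):
        self.reservation = reservation
        self.limit = limit
        self.actual = 0   # bytes currently allocated
        self.peak = 0     # high-water mark

    def allocate(self, nbytes):
        if self.actual + nbytes > self.limit:
            raise MemoryError("limit exceeded")
        self.actual += nbytes
        self.peak = max(self.peak, self.actual)

    def release(self, nbytes):
        self.actual -= nbytes

    def close(self):
        if self.actual != 0:
            # Mirrors the IllegalStateException: "Memory was leaked by query"
            raise RuntimeError(
                f"Memory was leaked: ({self.actual}) "
                f"{self.reservation}/{self.actual}/{self.peak}/{self.limit} "
                "(res/actual/peak/limit)")

alloc = ToyAllocator(reservation=1_000_000, limit=10_000_000)
alloc.allocate(3_145_728)          # an operator grabs ~3 MB...
try:
    alloc.close()                  # ...and closes without releasing it
except RuntimeError as e:
    print(e)                       # reports the leaked byte count
```

In the stack trace above, the equivalent check fires during `FragmentExecutor.cleanup`, which is why the leak surfaces only as the fragment shuts down.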





[jira] [Updated] (DRILL-6252) Foreman node is going down when the non foreman node is stopped

2018-03-16 Thread Venkata Jyothsna Donapati (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Jyothsna Donapati updated DRILL-6252:
-
Attachment: nonforeman_drillbit.log

> Foreman node is going down when the non foreman node is stopped
> ---
>
> Key: DRILL-6252
> URL: https://issues.apache.org/jira/browse/DRILL-6252
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Vlad Rozov
>Priority: Major
> Fix For: 1.14.0
>
> Attachments: foreman_drillbit.log, nonforeman_drillbit.log
>
>
> Two drillbits are running. I'm running a join query over parquet and tried to 
> stop the non-foreman node using drillbit.sh stop. The query fails with 
> *"Error: DATA_READ ERROR: Exception occurred while reading from disk".* The 
> non-foreman node goes down. The foreman node also goes down. When I looked at 
> the drillbit.log of both the foreman and non-foreman, I found that there is a 
> memory leak: "Memory was leaked by query. Memory leaked: 
> (2097152)\nAllocator(op:2:0:0:HashPartitionSender) 
> 100/6291456/6832128/100 (res/actual/peak/limit)\n". Following are 
> the stack traces for memory leaks 
> {noformat} 
> [Error Id: 0d9a2799-7e97-46b3-953b-1f8d0dd87a04 on qa102-34.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Memory was leaked by query. Memory leaked: (3145728)
> Allocator(op:2:1:0:HashPartitionSender) 100/6291456/6291456/100 
> (res/actual/peak/limit)
>  
>  
> Fragment 2:1 
> [Error Id: 0d9a2799-7e97-46b3-953b-1f8d0dd87a04 on qa102-34.qa.lab:31010]
>         at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:297)
>  [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:266)
>  [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
>         at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>         at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Memory was leaked by query. 
> Memory leaked: (3145728)
> Allocator(op:2:1:0:HashPartitionSender) 100/6291456/6291456/100 
> (res/actual/peak/limit)
> {noformat} 
>  
> Ping me for the logs and more information.
>  





[jira] [Commented] (DRILL-6223) Drill fails on Schema changes

2018-03-16 Thread salim achouche (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402255#comment-16402255
 ] 

salim achouche commented on DRILL-6223:
---

Parth, this PR is not Parquet-specific, as it deals with downstream operators 
having issues handling schema changes. Most of the time, the end result would 
be downstream operators trying to access stale data.

 

> Drill fails on Schema changes 
> --
>
> Key: DRILL-6223
> URL: https://issues.apache.org/jira/browse/DRILL-6223
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0, 1.12.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> Drill queries fail when selecting all columns from a complex nested Parquet 
> data set. There are differences in schema among the files:
>  * The Parquet files exhibit differences both at the first level and within 
> nested data types
>  * A select * will not cause an exception but using a limit clause will
>  * Note also this issue seems to happen only when multiple Drillbit minor 
> fragments are involved (concurrency higher than one)





[jira] [Commented] (DRILL-6223) Drill fails on Schema changes

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402228#comment-16402228
 ] 

ASF GitHub Bot commented on DRILL-6223:
---

Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1170
  
I added a comment in the JIRA - 
[DRILL-6223](https://issues.apache.org/jira/browse/DRILL-6223?focusedCommentId=16402223=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16402223)



> Drill fails on Schema changes 
> --
>
> Key: DRILL-6223
> URL: https://issues.apache.org/jira/browse/DRILL-6223
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0, 1.12.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> Drill queries fail when selecting all columns from a complex nested Parquet 
> data set. There are differences in schema among the files:
>  * The Parquet files exhibit differences both at the first level and within 
> nested data types
>  * A select * will not cause an exception but using a limit clause will
>  * Note also this issue seems to happen only when multiple Drillbit minor 
> fragments are involved (concurrency higher than one)





[jira] [Commented] (DRILL-6223) Drill fails on Schema changes

2018-03-16 Thread Parth Chandra (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402223#comment-16402223
 ] 

Parth Chandra commented on DRILL-6223:
--

Schema change for Parquet files is not supported by the Parquet metadata cache. 
The Parquet metadata cache overwrites the schema if it changes (does not merge) 
and so the last schema encountered is the one selected. Newly added columns are 
OK, I think, but type changes are not.

See [1].

I haven't looked at the PR, but you might want to test this out with the 
metadata cache enabled.

[1] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L420
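The overwrite behavior described here can be illustrated with a sketch (hypothetical schema maps, not the actual `Metadata.java` logic): when per-file schemas are combined by overwriting rather than merging, the last file wins, so a type change in an earlier file is silently dropped while a newly added column happens to survive.

```python
# Per-file schemas, in the order the metadata cache encounters them
# (hypothetical example, not Drill's real metadata structures):
file_schemas = [
    {"a": "INT32", "b": "BINARY"},               # older file
    {"a": "INT64", "b": "BINARY", "c": "INT32"}, # newer file: `a` changed
]                                                # type, `c` was added

# Overwrite semantics: each schema replaces the previous one wholesale.
overwritten = {}
for schema in file_schemas:
    overwritten = dict(schema)

# A merge would instead have to detect the conflict on `a`:
merged, conflicts = {}, []
for schema in file_schemas:
    for col, typ in schema.items():
        if col in merged and merged[col] != typ:
            conflicts.append(col)
        merged[col] = typ

print(overwritten)  # only the newer file's view survives the overwrite
print(conflicts)    # the type change the cache cannot represent
```

Under overwrite semantics the `a: INT32` reading disappears entirely, which matches the observation that added columns are tolerated but type changes are not.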

> Drill fails on Schema changes 
> --
>
> Key: DRILL-6223
> URL: https://issues.apache.org/jira/browse/DRILL-6223
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0, 1.12.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> Drill queries fail when selecting all columns from a complex nested Parquet 
> data set. There are differences in schema among the files:
>  * The Parquet files exhibit differences both at the first level and within 
> nested data types
>  * A select * will not cause an exception but using a limit clause will
>  * Note also this issue seems to happen only when multiple Drillbit minor 
> fragments are involved (concurrency higher than one)





[jira] [Assigned] (DRILL-6260) Query fails with "ERROR: Non-scalar sub-query used in an expression" when it contains a cast expression around a scalar sub-query

2018-03-16 Thread Hanumath Rao Maduri (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri reassigned DRILL-6260:
--

Assignee: Hanumath Rao Maduri

> Query fails with "ERROR: Non-scalar sub-query used in an expression" when it 
> contains a cast expression around a scalar sub-query 
> --
>
> Key: DRILL-6260
> URL: https://issues.apache.org/jira/browse/DRILL-6260
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.13.0, 1.14.0
> Environment: git Commit ID: dd4a46a6c57425284a2b8c68676357f947e01988 
> git Commit Message: Update version to 1.14.0-SNAPSHOT
>Reporter: Abhishek Girish
>Assignee: Hanumath Rao Maduri
>Priority: Major
>
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > cast(max(T2.a) as varchar) FROM `t2.json` T2);
> Error: UNSUPPORTED_OPERATION ERROR: Non-scalar sub-query used in an expression
> See Apache Drill JIRA: DRILL-1937
> {code}
> Slightly different variants of the query work fine. 
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > max(cast(T2.a as varchar)) FROM `t2.json` T2);
> 00-00    Screen
> 00-01      Project(b=[$0])
> 00-02        Project(b=[$1])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($0, $2)])
> 00-05              NestedLoopJoin(condition=[true], joinType=[left])
> 00-07                Scan(table=[[si, tmp, t1.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t1.json, numFiles=1, 
> columns=[`a`, `b`], files=[maprfs:///tmp/t1.json]]])
> 00-06                StreamAgg(group=[{}], EXPR$0=[MAX($0)])
> 00-08                  Project($f0=[CAST($0):VARCHAR(65535) CHARACTER SET 
> "UTF-16LE" COLLATE "UTF-16LE$en_US$primary"])
> 00-09                    Scan(table=[[si, tmp, t2.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
> columns=[`a`], files=[maprfs:///tmp/t2.json]]]){code}
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > max(T2.a) FROM `t2.json` T2);
> 00-00Screen
> 00-01  Project(b=[$0])
> 00-02Project(b=[$1])
> 00-03  SelectionVectorRemover
> 00-04Filter(condition=[=($0, $2)])
> 00-05  NestedLoopJoin(condition=[true], joinType=[left])
> 00-07Scan(table=[[si, tmp, t1.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t1.json, numFiles=1, 
> columns=[`a`, `b`], files=[maprfs:///tmp/t1.json]]])
> 00-06StreamAgg(group=[{}], EXPR$0=[MAX($0)])
> 00-08  Scan(table=[[si, tmp, t2.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
> columns=[`a`], files=[maprfs:///tmp/t2.json]]])
> {code}
> File contents:
> {code}
> # cat t1.json 
> {"a":1, "b":"V"}
> {"a":2, "b":"W"}
> {"a":3, "b":"X"}
> {"a":4, "b":"Y"}
> {"a":5, "b":"Z"}
> # cat t2.json 
> {"a":1, "b":"A"}
> {"a":2, "b":"B"}
> {"a":3, "b":"C"}
> {"a":4, "b":"D"}
> {"a":5, "b":"E"}
> {code}





[jira] [Updated] (DRILL-6260) Query fails with "ERROR: Non-scalar sub-query used in an expression" when it contains a cast expression around a scalar sub-query

2018-03-16 Thread Abhishek Girish (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-6260:
---
Description: 
{code}
> explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> cast(max(T2.a) as varchar) FROM `t2.json` T2);

Error: UNSUPPORTED_OPERATION ERROR: Non-scalar sub-query used in an expression
See Apache Drill JIRA: DRILL-1937
{code}

Slightly different variants of the query work fine. 
{code}
> explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> max(cast(T2.a as varchar)) FROM `t2.json` T2);

00-00    Screen
00-01      Project(b=[$0])
00-02        Project(b=[$1])
00-03          SelectionVectorRemover
00-04            Filter(condition=[=($0, $2)])
00-05              NestedLoopJoin(condition=[true], joinType=[left])
00-07                Scan(table=[[si, tmp, t1.json]], groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/t1.json, numFiles=1, columns=[`a`, `b`], 
files=[maprfs:///tmp/t1.json]]])
00-06                StreamAgg(group=[{}], EXPR$0=[MAX($0)])
00-08                  Project($f0=[CAST($0):VARCHAR(65535) CHARACTER SET 
"UTF-16LE" COLLATE "UTF-16LE$en_US$primary"])
00-09                    Scan(table=[[si, tmp, t2.json]], 
groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
columns=[`a`], files=[maprfs:///tmp/t2.json]]]){code}
{code}
> explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> max(T2.a) FROM `t2.json` T2);

00-00Screen
00-01  Project(b=[$0])
00-02Project(b=[$1])
00-03  SelectionVectorRemover
00-04Filter(condition=[=($0, $2)])
00-05  NestedLoopJoin(condition=[true], joinType=[left])
00-07Scan(table=[[si, tmp, t1.json]], groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/t1.json, numFiles=1, columns=[`a`, `b`], 
files=[maprfs:///tmp/t1.json]]])
00-06StreamAgg(group=[{}], EXPR$0=[MAX($0)])
00-08  Scan(table=[[si, tmp, t2.json]], 
groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
columns=[`a`], files=[maprfs:///tmp/t2.json]]])
{code}

File contents:
{code}
# cat t1.json 
{"a":1, "b":"V"}
{"a":2, "b":"W"}
{"a":3, "b":"X"}
{"a":4, "b":"Y"}
{"a":5, "b":"Z"}

# cat t2.json 
{"a":1, "b":"A"}
{"a":2, "b":"B"}
{"a":3, "b":"C"}
{"a":4, "b":"D"}
{"a":5, "b":"E"}
{code}

  was:
{code}
> explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> cast(max(T2.a) as varchar) FROM `t2.json` T2);

Error: UNSUPPORTED_OPERATION ERROR: Non-scalar sub-query used in an expression
See Apache Drill JIRA: DRILL-1937
{code}

Slightly different variants of the query work fine. 
{code}
> explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> max(cast(T2.a as varchar)) FROM `t2.json` T2);

+--+--+
| text | json |
+--+--+
| 00-00    Screen
00-01      Project(b=[$0])
00-02        Project(b=[$1])
00-03          SelectionVectorRemover
00-04            Filter(condition=[=($0, $2)])
00-05              NestedLoopJoin(condition=[true], joinType=[left])
00-07                Scan(table=[[si, tmp, t1.json]], groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/t1.json, numFiles=1, columns=[`a`, `b`], 
files=[maprfs:///tmp/t1.json]]])
00-06                StreamAgg(group=[{}], EXPR$0=[MAX($0)])
00-08                  Project($f0=[CAST($0):VARCHAR(65535) CHARACTER SET 
"UTF-16LE" COLLATE "UTF-16LE$en_US$primary"])
00-09                    Scan(table=[[si, tmp, t2.json]], 
groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
columns=[`a`], files=[maprfs:///tmp/t2.json]]]){code}
{code}
> explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> max(T2.a) FROM `t2.json` T2);

+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(b=[$0])
00-02Project(b=[$1])
00-03  SelectionVectorRemover
00-04Filter(condition=[=($0, $2)])
00-05  NestedLoopJoin(condition=[true], joinType=[left])
00-07Scan(table=[[si, tmp, t1.json]], groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/t1.json, numFiles=1, columns=[`a`, `b`], 
files=[maprfs:///tmp/t1.json]]])
00-06StreamAgg(group=[{}], EXPR$0=[MAX($0)])
00-08  Scan(table=[[si, tmp, t2.json]], 
groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
columns=[`a`], files=[maprfs:///tmp/t2.json]]])
{code}

File contents:
{code}
# cat t1.json 
{"a":1, "b":"V"}
{"a":2, "b":"W"}
{"a":3, "b":"X"}
{"a":4, "b":"Y"}
{"a":5, "b":"Z"}

# cat t2.json 
{"a":1, "b":"A"}
{"a":2, "b":"B"}
{"a":3, "b":"C"}
{"a":4, "b":"D"}
{"a":5, "b":"E"}
{code}


> Query fails with "ERROR: Non-scalar sub-query used in an expression" when it 
> contains a cast expression around a scalar sub-query 
> 

[jira] [Updated] (DRILL-6260) Query fails with "ERROR: Non-scalar sub-query used in an expression" when it contains a cast expression around a scalar sub-query

2018-03-16 Thread Abhishek Girish (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-6260:
---
Description: 
{code}
> explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> cast(max(T2.a) as varchar) FROM `t2.json` T2);

Error: UNSUPPORTED_OPERATION ERROR: Non-scalar sub-query used in an expression
See Apache Drill JIRA: DRILL-1937
{code}

Slightly different variants of the query work fine. 
{code}
> explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> max(cast(T2.a as varchar)) FROM `t2.json` T2);

+--+--+
| text | json |
+--+--+
| 00-00    Screen
00-01      Project(b=[$0])
00-02        Project(b=[$1])
00-03          SelectionVectorRemover
00-04            Filter(condition=[=($0, $2)])
00-05              NestedLoopJoin(condition=[true], joinType=[left])
00-07                Scan(table=[[si, tmp, t1.json]], groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/t1.json, numFiles=1, columns=[`a`, `b`], 
files=[maprfs:///tmp/t1.json]]])
00-06                StreamAgg(group=[{}], EXPR$0=[MAX($0)])
00-08                  Project($f0=[CAST($0):VARCHAR(65535) CHARACTER SET 
"UTF-16LE" COLLATE "UTF-16LE$en_US$primary"])
00-09                    Scan(table=[[si, tmp, t2.json]], 
groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
columns=[`a`], files=[maprfs:///tmp/t2.json]]]){code}
{code}
> explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> max(T2.a) FROM `t2.json` T2);

+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(b=[$0])
00-02Project(b=[$1])
00-03  SelectionVectorRemover
00-04Filter(condition=[=($0, $2)])
00-05  NestedLoopJoin(condition=[true], joinType=[left])
00-07Scan(table=[[si, tmp, t1.json]], groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/t1.json, numFiles=1, columns=[`a`, `b`], 
files=[maprfs:///tmp/t1.json]]])
00-06StreamAgg(group=[{}], EXPR$0=[MAX($0)])
00-08  Scan(table=[[si, tmp, t2.json]], 
groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
columns=[`a`], files=[maprfs:///tmp/t2.json]]])
{code}

File contents:
{code}
# cat t1.json 
{"a":1, "b":"V"}
{"a":2, "b":"W"}
{"a":3, "b":"X"}
{"a":4, "b":"Y"}
{"a":5, "b":"Z"}

# cat t2.json 
{"a":1, "b":"A"}
{"a":2, "b":"B"}
{"a":3, "b":"C"}
{"a":4, "b":"D"}
{"a":5, "b":"E"}
{code}

  was:
{code}
> explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> cast(max(T2.a) as varchar) FROM `t2.json` T2);

Error: UNSUPPORTED_OPERATION ERROR: Non-scalar sub-query used in an expression
See Apache Drill JIRA: DRILL-1937
{code}

Slightly different variants of the query work fine. 
{code}
> explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> max(cast(T2.a as varchar)) FROM `t2.json` T2);

+--+--+
| text | json |
+--+--+
| 00-00    Screen
00-01      Project(b=[$0])
00-02        Project(b=[$1])
00-03          SelectionVectorRemover
00-04            Filter(condition=[=($0, $2)])
00-05              NestedLoopJoin(condition=[true], joinType=[left])
00-07                Scan(table=[[si, tmp, t1.json]], groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/t1.json, numFiles=1, columns=[`a`, `b`], 
files=[maprfs:///tmp/t1.json]]])
00-06                StreamAgg(group=[{}], EXPR$0=[MAX($0)])
00-08                  Project($f0=[CAST($0):VARCHAR(65535) CHARACTER SET 
"UTF-16LE" COLLATE "UTF-16LE$en_US$primary"])
00-09                    Scan(table=[[si, tmp, t2.json]], 
groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
columns=[`a`], files=[maprfs:///tmp/t2.json]]]){code}
{code}
> explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> max(T2.a) FROM `t2.json` T2);

+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(b=[$0])
00-02Project(b=[$1])
00-03  SelectionVectorRemover
00-04Filter(condition=[=($0, $2)])
00-05  NestedLoopJoin(condition=[true], joinType=[left])
00-07Scan(table=[[si, tmp, t1.json]], groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/t1.json, numFiles=1, columns=[`a`, `b`], 
files=[maprfs:///tmp/t1.json]]])
00-06StreamAgg(group=[{}], EXPR$0=[MAX($0)])
00-08  Scan(table=[[si, tmp, t2.json]], 
groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
columns=[`a`], files=[maprfs:///tmp/t2.json]]])
{code}

File contents:
{code}
# cat t1.json 
{"a":1, "b":"V"}
{"a":2, "b":"W"}
{"a":3, "b":"X"}
{"a":4, "b":"Y"}
{"a":5, "b":"Z"}

# # cat t2.json 
{"a":1, "b":"A"}
{"a":2, "b":"B"}
{"a":3, "b":"C"}
{"a":4, "b":"D"}
{"a":5, "b":"E"}
{code}


> Query fails with "ERROR: Non-scalar sub-query used in an expression" when it 
> contains a cast expression around a scalar sub-query 
> 

[jira] [Comment Edited] (DRILL-6259) Support parquet filter push down for complex types

2018-03-16 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402147#comment-16402147
 ] 

Arina Ielchiieva edited comment on DRILL-6259 at 3/16/18 4:32 PM:
--

Support for complex types but not for scalar complex types.
Example of supported types in parquet schema:
{noformat}
message complex_users {
  required group user {
required int32 id;
optional int32 age;
repeated int32 hobby_ids;
optional boolean active;
  }
}
{noformat}
This is a simple one; it can be nested as well.


was (Author: arina):
Support for complex types but not for scalar complex type.
Example of supported types in parquet schema:
{noformat}
message complex_users {
  required group user {
required int32 id;
optional int32 age;
repeated int32 hobby_ids;
optional boolean active;
  }
}
{noformat}
This is simple one, it can be nested as well.

> Support parquet filter push down for complex types
> --
>
> Key: DRILL-6259
> URL: https://issues.apache.org/jira/browse/DRILL-6259
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
>
> Currently parquet filter push down is not working for complex types 
> (including arrays).
> This Jira aims to implement filter push down for complex types whose 
> underlying type is among the simple types supported for filter push down. 
> For instance, Drill currently does not support filter push down for 
> varchars, decimals, etc. Once Drill adds that support, it will apply to 
> complex types automatically.
> Complex fields will be pushed down the same way regular fields are, except 
> for one case with arrays.
> A query with the predicate {{where users.hobbies_ids[2] is null}} cannot be 
> pushed down because we cannot determine the exact number of nulls in array 
> fields.
> Consider {{[1, 2, 3]}} vs {{[1, 2]}} when these arrays are in different 
> files. Statistics for the second file won't show any nulls, but when 
> querying both files the third array value is, in effect, null.
>  
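The statistics limitation described above can be illustrated with a small sketch (plain Java, not Drill code; all names here are hypothetical): per-file statistics count nulls only among values actually written, so both sample files report zero nulls, yet evaluating "arr[2] is null" across both files does match a row.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

public class ArrayIndexNulls {

  // What per-file column statistics can report: nulls among stored values.
  static long storedNullCount(List<Integer[]> rows) {
    return rows.stream()
        .flatMap(Stream::of)
        .filter(v -> v == null)
        .count();
  }

  // What the predicate `arr[idx] IS NULL` asks at query time: a row whose
  // array is shorter than idx + 1 evaluates to null at that index.
  static long nullsAtIndex(List<Integer[]> rows, int idx) {
    return rows.stream()
        .filter(r -> r.length <= idx || r[idx] == null)
        .count();
  }

  public static void main(String[] args) {
    List<Integer[]> fileA = Collections.singletonList(new Integer[]{1, 2, 3});
    List<Integer[]> fileB = Collections.singletonList(new Integer[]{1, 2});

    // Both files' statistics report zero nulls for the array column...
    System.out.println(storedNullCount(fileA) + storedNullCount(fileB)); // 0
    // ...yet `arr[2] is null` matches one row once both files are scanned.
    System.out.println(nullsAtIndex(fileA, 2) + nullsAtIndex(fileB, 2)); // 1
  }
}
```

Since no per-file statistic can distinguish "short array" from "no nulls", a planner relying on null counts alone must skip push down for predicates on array indexes.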



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6259) Support parquet filter push down for complex types

2018-03-16 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402147#comment-16402147
 ] 

Arina Ielchiieva commented on DRILL-6259:
-

Support covers complex types but not complex scalar types.
Example of supported types in parquet schema:
{noformat}
message complex_users {
  required group user {
required int32 id;
optional int32 age;
repeated int32 hobby_ids;
optional boolean active;
  }
}
{noformat}
This is a simple one; it can be nested as well.
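The eligibility rule described here can be sketched as follows (a minimal illustration in plain Java, not Drill's actual planner code; the leaf-type table and method names are hypothetical): a column inside a nested group qualifies for push down exactly when its leaf primitive type is among the supported simple types.

```java
import java.util.Map;
import java.util.Set;

public class PushDownEligibility {

  // Illustrative subset of leaf types eligible for parquet filter push down;
  // varchar and decimal leaves are explicitly not yet supported.
  static final Set<String> SUPPORTED_LEAF_TYPES =
      Set.of("int32", "int64", "boolean", "float", "double");

  // Hypothetical leaf-type lookup for the `complex_users` schema above.
  static final Map<String, String> LEAF_TYPES = Map.of(
      "user.id", "int32",
      "user.age", "int32",
      "user.active", "boolean",
      "user.name", "binary(utf8)"); // stand-in for a varchar leaf

  // A complex column qualifies when its leaf primitive type is supported.
  static boolean canPushDown(String column) {
    String leaf = LEAF_TYPES.get(column);
    return leaf != null && SUPPORTED_LEAF_TYPES.contains(leaf);
  }

  public static void main(String[] args) {
    System.out.println(canPushDown("user.age"));  // nested int32: eligible
    System.out.println(canPushDown("user.name")); // varchar leaf: not yet
  }
}
```

Under this model, adding a new supported leaf type (say, varchar) automatically extends push down to every complex column whose leaf has that type.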

> Support parquet filter push down for complex types
> --
>
> Key: DRILL-6259
> URL: https://issues.apache.org/jira/browse/DRILL-6259
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
>
> Currently parquet filter push down is not working for complex types 
> (including arrays).
> This Jira aims to implement filter push down for complex types whose 
> underlying type is among the simple types supported for filter push down. 
> For instance, Drill currently does not support filter push down for 
> varchars, decimals, etc. Once Drill adds that support, it will apply to 
> complex types automatically.
> Complex fields will be pushed down the same way regular fields are, except 
> for one case with arrays.
> A query with the predicate {{where users.hobbies_ids[2] is null}} cannot be 
> pushed down because we cannot determine the exact number of nulls in array 
> fields.
> Consider {{[1, 2, 3]}} vs {{[1, 2]}} when these arrays are in different 
> files. Statistics for the second file won't show any nulls, but when 
> querying both files the third array value is, in effect, null.
>  





[jira] [Updated] (DRILL-6260) Query fails with "ERROR: Non-scalar sub-query used in an expression" when it contains a cast expression around a scalar sub-query

2018-03-16 Thread Abhishek Girish (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-6260:
---
Affects Version/s: 1.14.0
  Environment: 
git Commit ID: dd4a46a6c57425284a2b8c68676357f947e01988 
git Commit message: Update version to 1.14.0-SNAPSHOT

> Query fails with "ERROR: Non-scalar sub-query used in an expression" when it 
> contains a cast expression around a scalar sub-query 
> --
>
> Key: DRILL-6260
> URL: https://issues.apache.org/jira/browse/DRILL-6260
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.13.0, 1.14.0
> Environment: git Commit ID: dd4a46a6c57425284a2b8c68676357f947e01988 
> git Commit message: Update version to 1.14.0-SNAPSHOT
>Reporter: Abhishek Girish
>Priority: Major
>
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > cast(max(T2.a) as varchar) FROM `t2.json` T2);
> Error: UNSUPPORTED_OPERATION ERROR: Non-scalar sub-query used in an expression
> See Apache Drill JIRA: DRILL-1937
> {code}
> Slightly different variants of the query work fine. 
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > max(cast(T2.a as varchar)) FROM `t2.json` T2);
> +--+--+
> | text | json |
> +--+--+
> | 00-00    Screen
> 00-01      Project(b=[$0])
> 00-02        Project(b=[$1])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($0, $2)])
> 00-05              NestedLoopJoin(condition=[true], joinType=[left])
> 00-07                Scan(table=[[si, tmp, t1.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t1.json, numFiles=1, 
> columns=[`a`, `b`], files=[maprfs:///tmp/t1.json]]])
> 00-06                StreamAgg(group=[{}], EXPR$0=[MAX($0)])
> 00-08                  Project($f0=[CAST($0):VARCHAR(65535) CHARACTER SET 
> "UTF-16LE" COLLATE "UTF-16LE$en_US$primary"])
> 00-09                    Scan(table=[[si, tmp, t2.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
> columns=[`a`], files=[maprfs:///tmp/t2.json]]]){code}
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > max(T2.a) FROM `t2.json` T2);
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(b=[$0])
> 00-02Project(b=[$1])
> 00-03  SelectionVectorRemover
> 00-04Filter(condition=[=($0, $2)])
> 00-05  NestedLoopJoin(condition=[true], joinType=[left])
> 00-07Scan(table=[[si, tmp, t1.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t1.json, numFiles=1, 
> columns=[`a`, `b`], files=[maprfs:///tmp/t1.json]]])
> 00-06StreamAgg(group=[{}], EXPR$0=[MAX($0)])
> 00-08  Scan(table=[[si, tmp, t2.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
> columns=[`a`], files=[maprfs:///tmp/t2.json]]])
> {code}
> File contents:
> {code}
> # cat t1.json 
> {"a":1, "b":"V"}
> {"a":2, "b":"W"}
> {"a":3, "b":"X"}
> {"a":4, "b":"Y"}
> {"a":5, "b":"Z"}
> # cat t2.json 
> {"a":1, "b":"A"}
> {"a":2, "b":"B"}
> {"a":3, "b":"C"}
> {"a":4, "b":"D"}
> {"a":5, "b":"E"}
> {code}





[jira] [Updated] (DRILL-6260) Query fails with "ERROR: Non-scalar sub-query used in an expression" when it contains a cast expression around a scalar sub-query

2018-03-16 Thread Abhishek Girish (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-6260:
---
Environment: 
git Commit ID: dd4a46a6c57425284a2b8c68676357f947e01988 
git Commit Message: Update version to 1.14.0-SNAPSHOT

  was:
git Commit ID: dd4a46a6c57425284a2b8c68676357f947e01988 
git Commit message: Update version to 1.14.0-SNAPSHOT


> Query fails with "ERROR: Non-scalar sub-query used in an expression" when it 
> contains a cast expression around a scalar sub-query 
> --
>
> Key: DRILL-6260
> URL: https://issues.apache.org/jira/browse/DRILL-6260
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.13.0, 1.14.0
> Environment: git Commit ID: dd4a46a6c57425284a2b8c68676357f947e01988 
> git Commit Message: Update version to 1.14.0-SNAPSHOT
>Reporter: Abhishek Girish
>Priority: Major
>
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > cast(max(T2.a) as varchar) FROM `t2.json` T2);
> Error: UNSUPPORTED_OPERATION ERROR: Non-scalar sub-query used in an expression
> See Apache Drill JIRA: DRILL-1937
> {code}
> Slightly different variants of the query work fine. 
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > max(cast(T2.a as varchar)) FROM `t2.json` T2);
> +--+--+
> | text | json |
> +--+--+
> | 00-00    Screen
> 00-01      Project(b=[$0])
> 00-02        Project(b=[$1])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($0, $2)])
> 00-05              NestedLoopJoin(condition=[true], joinType=[left])
> 00-07                Scan(table=[[si, tmp, t1.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t1.json, numFiles=1, 
> columns=[`a`, `b`], files=[maprfs:///tmp/t1.json]]])
> 00-06                StreamAgg(group=[{}], EXPR$0=[MAX($0)])
> 00-08                  Project($f0=[CAST($0):VARCHAR(65535) CHARACTER SET 
> "UTF-16LE" COLLATE "UTF-16LE$en_US$primary"])
> 00-09                    Scan(table=[[si, tmp, t2.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
> columns=[`a`], files=[maprfs:///tmp/t2.json]]]){code}
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > max(T2.a) FROM `t2.json` T2);
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(b=[$0])
> 00-02Project(b=[$1])
> 00-03  SelectionVectorRemover
> 00-04Filter(condition=[=($0, $2)])
> 00-05  NestedLoopJoin(condition=[true], joinType=[left])
> 00-07Scan(table=[[si, tmp, t1.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t1.json, numFiles=1, 
> columns=[`a`, `b`], files=[maprfs:///tmp/t1.json]]])
> 00-06StreamAgg(group=[{}], EXPR$0=[MAX($0)])
> 00-08  Scan(table=[[si, tmp, t2.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
> columns=[`a`], files=[maprfs:///tmp/t2.json]]])
> {code}
> File contents:
> {code}
> # cat t1.json 
> {"a":1, "b":"V"}
> {"a":2, "b":"W"}
> {"a":3, "b":"X"}
> {"a":4, "b":"Y"}
> {"a":5, "b":"Z"}
> # cat t2.json 
> {"a":1, "b":"A"}
> {"a":2, "b":"B"}
> {"a":3, "b":"C"}
> {"a":4, "b":"D"}
> {"a":5, "b":"E"}
> {code}





[jira] [Commented] (DRILL-6259) Support parquet filter push down for complex types

2018-03-16 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402120#comment-16402120
 ] 

Paul Rogers commented on DRILL-6259:


What is meant when we say "complex type"? Drill has multiple "complex" types:
 * Arrays
 * Nested tuples (AKA "maps")
 * Arrays of nested tuples
 * Multi-dimensional arrays (AKA "repeated lists")
 * Heterogeneous values (AKA "unions")
 * Heterogeneous lists (AKA "non-repeated lists")
 * Combinations of the above (a repeated map that contains a union that 
contains a 2D list of maps)

Then there are the "complex" scalar types (complex because they are not simple 
bit values like an int or a float):
 * Decimal
 * Date/time
 * Date
 * Time
 * Period

The write-up mentions arrays. Is this only for arrays? Also for maps? For map 
arrays?

Please identify which complex types are now supported.

> Support parquet filter push down for complex types
> --
>
> Key: DRILL-6259
> URL: https://issues.apache.org/jira/browse/DRILL-6259
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
>
> Currently parquet filter push down is not working for complex types 
> (including arrays).
> This Jira aims to implement filter push down for complex types whose 
> underlying type is among the simple types supported for filter push down. 
> For instance, Drill currently does not support filter push down for 
> varchars, decimals, etc. Once Drill adds that support, it will apply to 
> complex types automatically.
> Complex fields will be pushed down the same way regular fields are, except 
> for one case with arrays.
> A query with the predicate {{where users.hobbies_ids[2] is null}} cannot be 
> pushed down because we cannot determine the exact number of nulls in array 
> fields.
> Consider {{[1, 2, 3]}} vs {{[1, 2]}} when these arrays are in different 
> files. Statistics for the second file won't show any nulls, but when 
> querying both files the third array value is, in effect, null.
>  





[jira] [Updated] (DRILL-6260) Query fails with "ERROR: Non-scalar sub-query used in an expression" when it contains a cast expression around a scalar sub-query

2018-03-16 Thread Abhishek Girish (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-6260:
---
Summary: Query fails with "ERROR: Non-scalar sub-query used in an 
expression" when it contains a cast expression around a scalar sub-query   
(was: Query fails with "UNSUPPORTED_OPERATION ERROR: Non-scalar sub-query used 
in an expression" when it contains a cast expression around a scalar sub-query )

> Query fails with "ERROR: Non-scalar sub-query used in an expression" when it 
> contains a cast expression around a scalar sub-query 
> --
>
> Key: DRILL-6260
> URL: https://issues.apache.org/jira/browse/DRILL-6260
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.13.0
>Reporter: Abhishek Girish
>Priority: Major
>
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > cast(max(T2.a) as varchar) FROM `t2.json` T2);
> Error: UNSUPPORTED_OPERATION ERROR: Non-scalar sub-query used in an expression
> See Apache Drill JIRA: DRILL-1937
> {code}
> Slightly different variants of the query work fine. 
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > max(cast(T2.a as varchar)) FROM `t2.json` T2);
> +--+--+
> | text | json |
> +--+--+
> | 00-00    Screen
> 00-01      Project(b=[$0])
> 00-02        Project(b=[$1])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($0, $2)])
> 00-05              NestedLoopJoin(condition=[true], joinType=[left])
> 00-07                Scan(table=[[si, tmp, t1.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t1.json, numFiles=1, 
> columns=[`a`, `b`], files=[maprfs:///tmp/t1.json]]])
> 00-06                StreamAgg(group=[{}], EXPR$0=[MAX($0)])
> 00-08                  Project($f0=[CAST($0):VARCHAR(65535) CHARACTER SET 
> "UTF-16LE" COLLATE "UTF-16LE$en_US$primary"])
> 00-09                    Scan(table=[[si, tmp, t2.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
> columns=[`a`], files=[maprfs:///tmp/t2.json]]]){code}
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > max(T2.a) FROM `t2.json` T2);
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(b=[$0])
> 00-02Project(b=[$1])
> 00-03  SelectionVectorRemover
> 00-04Filter(condition=[=($0, $2)])
> 00-05  NestedLoopJoin(condition=[true], joinType=[left])
> 00-07Scan(table=[[si, tmp, t1.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t1.json, numFiles=1, 
> columns=[`a`, `b`], files=[maprfs:///tmp/t1.json]]])
> 00-06StreamAgg(group=[{}], EXPR$0=[MAX($0)])
> 00-08  Scan(table=[[si, tmp, t2.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
> columns=[`a`], files=[maprfs:///tmp/t2.json]]])
> {code}
> File contents:
> {code}
> # cat t1.json 
> {"a":1, "b":"V"}
> {"a":2, "b":"W"}
> {"a":3, "b":"X"}
> {"a":4, "b":"Y"}
> {"a":5, "b":"Z"}
> # cat t2.json 
> {"a":1, "b":"A"}
> {"a":2, "b":"B"}
> {"a":3, "b":"C"}
> {"a":4, "b":"D"}
> {"a":5, "b":"E"}
> {code}





[jira] [Created] (DRILL-6260) Query fails with "UNSUPPORTED_OPERATION ERROR: Non-scalar sub-query used in an expression" when it contains a cast expression around a scalar sub-query

2018-03-16 Thread Abhishek Girish (JIRA)
Abhishek Girish created DRILL-6260:
--

 Summary: Query fails with "UNSUPPORTED_OPERATION ERROR: Non-scalar 
sub-query used in an expression" when it contains a cast expression around a 
scalar sub-query 
 Key: DRILL-6260
 URL: https://issues.apache.org/jira/browse/DRILL-6260
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.13.0
Reporter: Abhishek Girish


{code}
> explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> cast(max(T2.a) as varchar) FROM `t2.json` T2);

Error: UNSUPPORTED_OPERATION ERROR: Non-scalar sub-query used in an expression
See Apache Drill JIRA: DRILL-1937
{code}

Slightly different variants of the query work fine. 
{code}
> explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> max(cast(T2.a as varchar)) FROM `t2.json` T2);

+--+--+
| text | json |
+--+--+
| 00-00    Screen
00-01      Project(b=[$0])
00-02        Project(b=[$1])
00-03          SelectionVectorRemover
00-04            Filter(condition=[=($0, $2)])
00-05              NestedLoopJoin(condition=[true], joinType=[left])
00-07                Scan(table=[[si, tmp, t1.json]], groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/t1.json, numFiles=1, columns=[`a`, `b`], 
files=[maprfs:///tmp/t1.json]]])
00-06                StreamAgg(group=[{}], EXPR$0=[MAX($0)])
00-08                  Project($f0=[CAST($0):VARCHAR(65535) CHARACTER SET 
"UTF-16LE" COLLATE "UTF-16LE$en_US$primary"])
00-09                    Scan(table=[[si, tmp, t2.json]], 
groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
columns=[`a`], files=[maprfs:///tmp/t2.json]]]){code}
{code}
> explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> max(T2.a) FROM `t2.json` T2);

+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(b=[$0])
00-02Project(b=[$1])
00-03  SelectionVectorRemover
00-04Filter(condition=[=($0, $2)])
00-05  NestedLoopJoin(condition=[true], joinType=[left])
00-07Scan(table=[[si, tmp, t1.json]], groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/t1.json, numFiles=1, columns=[`a`, `b`], 
files=[maprfs:///tmp/t1.json]]])
00-06StreamAgg(group=[{}], EXPR$0=[MAX($0)])
00-08  Scan(table=[[si, tmp, t2.json]], 
groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
columns=[`a`], files=[maprfs:///tmp/t2.json]]])
{code}

File contents:
{code}
# cat t1.json 
{"a":1, "b":"V"}
{"a":2, "b":"W"}
{"a":3, "b":"X"}
{"a":4, "b":"Y"}
{"a":5, "b":"Z"}

# cat t2.json 
{"a":1, "b":"A"}
{"a":2, "b":"B"}
{"a":3, "b":"C"}
{"a":4, "b":"D"}
{"a":5, "b":"E"}
{code}





[jira] [Updated] (DRILL-6252) Foreman node is going down when the non foreman node is stopped

2018-03-16 Thread Vlad Rozov (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov updated DRILL-6252:
--
Priority: Major  (was: Critical)

> Foreman node is going down when the non foreman node is stopped
> ---
>
> Key: DRILL-6252
> URL: https://issues.apache.org/jira/browse/DRILL-6252
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Venkata Jyothsna Donapati
>Assignee: Vlad Rozov
>Priority: Major
> Fix For: 1.14.0
>
>
> Two drillbits are running. I'm running a join query over parquet and tried to 
> stop the non-foreman node using drillbit.sh stop. The query fails with 
> *"Error: DATA_READ ERROR: Exception occurred while reading from disk".* The 
> non-foreman node goes down. The foreman node also goes down. When I looked at 
> the drillbit.log of both the foreman and non-foreman nodes, I found a memory 
> leak: "Memory was leaked by query. Memory leaked: 
> (2097152)\nAllocator(op:2:0:0:HashPartitionSender) 
> 100/6291456/6832128/100 (res/actual/peak/limit)\n". Following are 
> the stack traces for memory leaks 
> {noformat} 
> [Error Id: 0d9a2799-7e97-46b3-953b-1f8d0dd87a04 on qa102-34.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Memory was leaked by query. Memory leaked: (3145728)
> Allocator(op:2:1:0:HashPartitionSender) 100/6291456/6291456/100 
> (res/actual/peak/limit)
>  
>  
> Fragment 2:1 
> [Error Id: 0d9a2799-7e97-46b3-953b-1f8d0dd87a04 on qa102-34.qa.lab:31010]
>         at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:297)
>  [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:266)
>  [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
>         at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>         at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Memory was leaked by query. 
> Memory leaked: (3145728)
> Allocator(op:2:1:0:HashPartitionSender) 100/6291456/6291456/100 
> (res/actual/peak/limit)
> {noformat} 
>  
> Ping me for the logs and more information.
>  
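The "Memory was leaked by query" error reported above comes from close-time accounting. The following is a minimal sketch of that accounting idea in plain Java (not Drill's actual BufferAllocator; the class and field names are hypothetical): any bytes still outstanding when the allocator closes are treated as leaked.

```java
public class LeakCheckingAllocator {
  private final String name;
  private final long limit;
  private long allocated; // bytes currently outstanding
  private long peak;      // high-water mark of outstanding bytes

  LeakCheckingAllocator(String name, long limit) {
    this.name = name;
    this.limit = limit;
  }

  // Reserve bytes, enforcing the allocator's limit.
  long allocate(long bytes) {
    if (allocated + bytes > limit) {
      throw new IllegalStateException("allocation would exceed limit for " + name);
    }
    allocated += bytes;
    peak = Math.max(peak, allocated);
    return bytes;
  }

  void release(long bytes) {
    allocated -= bytes;
  }

  // Mirrors the close-time check behind the reported error: outstanding
  // bytes at shutdown are "leaked" and abort the fragment.
  void close() {
    if (allocated != 0) {
      throw new IllegalStateException(String.format(
          "Memory was leaked by query. Memory leaked: (%d) Allocator(%s) %d/%d (actual/peak)",
          allocated, name, allocated, peak));
    }
  }

  public static void main(String[] args) {
    LeakCheckingAllocator alloc =
        new LeakCheckingAllocator("op:2:1:0:HashPartitionSender", 10_000_000);
    long buf = alloc.allocate(3_145_728);
    // Forgetting alloc.release(buf) before close reproduces the leak error:
    try {
      alloc.close();
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

A node that shuts down mid-query can leave operators without a chance to release their buffers, which is why a stop on one drillbit can surface leak errors like the ones in this report.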





[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402106#comment-16402106
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1152
  
@HanumathRao thanks for the review. Applied code review comment.


> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated using the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files
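The pruning this report expects can be sketched as follows (a minimal illustration in plain Java, not Drill's planner; the stats values for d3 are assumed from the data-gen commands above): a file is kept only if its min/max statistics admit a row satisfying the predicate, so {{c1 < 3}} should keep d1 and d2 and prune d3.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FilePruning {

  // Per-file column statistics, as a parquet footer would expose them.
  public record FileStats(String name, int minC1, int maxC1) {}

  // Keep a file only if some row can satisfy `c1 < bound`, i.e. min < bound.
  static List<String> filesToScan(List<FileStats> files, int bound) {
    return files.stream()
        .filter(f -> f.minC1() < bound)
        .map(FileStats::name)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // The three folders from the data-gen commands (c1 ranges per folder;
    // d3's max of 5 is an assumption about the sample data).
    List<FileStats> files = List.of(
        new FileStats("d1", 1, 3),  // c1 in (1, 3)
        new FileStats("d2", 2, 2),  // c1 = 2
        new FileStats("d3", 4, 5)); // c1 > 3

    // With push down working, `where c1 < 3` scans only d1 and d2.
    System.out.println(filesToScan(files, 3)); // [d1, d2]
  }
}
```

The bug in this report is that the item-star rewrite never fires through multiple nested subqueries, so the predicate never reaches the scan and this min/max comparison is never applied.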





[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402103#comment-16402103
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1152#discussion_r175137249
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestPushDownAndPruningWithItemStar.java
 ---
@@ -180,4 +248,38 @@ public void testFilterPushDownMultipleConditions() 
throws Exception {
 .build();
   }
 
+  @Test
+  public void testFilterPushDownWithSeveralNestedStarSubQueries() throws 
Exception {
+String subQuery = String.format("select * from `%s`.`%s`", 
DFS_TMP_SCHEMA, TABLE_NAME);
+String query = String.format("select * from (select * from (select * 
from (%s))) where o_orderdate = date '1992-01-01'", subQuery);
+
+String[] expectedPlan = {"numFiles=1, numRowGroups=1, 
usedMetadataFile=false, columns=\\[`\\*\\*`, `o_orderdate`\\]"};
+String[] excludedPlan = {};
+
+PlanTestBase.testPlanMatchingPatterns(query, expectedPlan, 
excludedPlan);
+
+testBuilder()
+.sqlQuery(query)
+.unOrdered()
+.sqlBaselineQuery("select * from `%s`.`%s` where o_orderdate = 
date '1992-01-01'", DFS_TMP_SCHEMA, TABLE_NAME)
+.build();
+  }
+
+  @Test
+  public void 
testFilterPushDownWithSeveralNestedStarSubQueriesWithAdditionalColumns() throws 
Exception {
+String subQuery = String.format("select * from `%s`.`%s`", 
DFS_TMP_SCHEMA, TABLE_NAME);
+String query = String.format("select * from (select * from (select *, 
o_orderdate from (%s))) where o_orderdate = date '1992-01-01'", subQuery);
--- End diff --

Done.


> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated using the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files





[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402104#comment-16402104
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1152#discussion_r175120182
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterItemStarReWriterRule.java
 ---
@@ -54,83 +44,189 @@
 import static 
org.apache.drill.exec.planner.logical.FieldsReWriterUtil.FieldsReWriter;
 
 /**
- * Rule will transform filter -> project -> scan call with item star 
fields in filter
- * into project -> filter -> project -> scan where item star fields are 
pushed into scan
- * and replaced with actual field references.
+ * Rule will transform item star fields in filter and replaced with actual 
field references.
  *
  * This will help partition pruning and push down rules to detect fields 
that can be pruned or push downed.
  * Item star operator appears when sub-select or cte with star are used as 
source.
  */
-public class DrillFilterItemStarReWriterRule extends RelOptRule {
+public class DrillFilterItemStarReWriterRule {
 
-  public static final DrillFilterItemStarReWriterRule INSTANCE = new 
DrillFilterItemStarReWriterRule(
-  RelOptHelper.some(Filter.class, RelOptHelper.some(Project.class, 
RelOptHelper.any( TableScan.class))),
-  "DrillFilterItemStarReWriterRule");
+  public static final DrillFilterItemStarReWriterRule.ProjectOnScan 
PROJECT_ON_SCAN = new ProjectOnScan(
+  RelOptHelper.some(DrillProjectRel.class, 
RelOptHelper.any(DrillScanRel.class)),
+  "DrillFilterItemStarReWriterRule.ProjectOnScan");
 
-  private DrillFilterItemStarReWriterRule(RelOptRuleOperand operand, 
String id) {
-super(operand, id);
-  }
+  public static final DrillFilterItemStarReWriterRule.FilterOnScan 
FILTER_ON_SCAN = new FilterOnScan(
+  RelOptHelper.some(DrillFilterRel.class, 
RelOptHelper.any(DrillScanRel.class)),
+  "DrillFilterItemStarReWriterRule.FilterOnScan");
 
-  @Override
-  public void onMatch(RelOptRuleCall call) {
-Filter filterRel = call.rel(0);
-Project projectRel = call.rel(1);
-TableScan scanRel = call.rel(2);
+  public static final DrillFilterItemStarReWriterRule.FilterOnProject 
FILTER_ON_PROJECT = new FilterOnProject(
+  RelOptHelper.some(DrillFilterRel.class, 
RelOptHelper.some(DrillProjectRel.class, RelOptHelper.any(DrillScanRel.class))),
+  "DrillFilterItemStarReWriterRule.FilterOnProject");
 
-ItemStarFieldsVisitor itemStarFieldsVisitor = new 
ItemStarFieldsVisitor(filterRel.getRowType().getFieldNames());
-filterRel.getCondition().accept(itemStarFieldsVisitor);
 
-// there are no item fields, no need to proceed further
-if (!itemStarFieldsVisitor.hasItemStarFields()) {
-  return;
+  private static class ProjectOnScan extends RelOptRule {
+
+ProjectOnScan(RelOptRuleOperand operand, String id) {
+  super(operand, id);
 }
 
-Map itemStarFields = 
itemStarFieldsVisitor.getItemStarFields();
+@Override
+public boolean matches(RelOptRuleCall call) {
+  DrillScanRel scan = call.rel(1);
+  return scan.getGroupScan() instanceof ParquetGroupScan && 
super.matches(call);
+}
 
-// create new scan
-RelNode newScan = constructNewScan(scanRel, itemStarFields.keySet());
+@Override
+public void onMatch(RelOptRuleCall call) {
+  DrillProjectRel projectRel = call.rel(0);
+  DrillScanRel scanRel = call.rel(1);
+
+  ItemStarFieldsVisitor itemStarFieldsVisitor = new 
ItemStarFieldsVisitor(scanRel.getRowType().getFieldNames());
+  List projects = projectRel.getProjects();
+  for (RexNode project : projects) {
+project.accept(itemStarFieldsVisitor);
+  }
 
-// combine original and new projects
-List newProjects = new ArrayList<>(projectRel.getProjects());
+  Map itemStarFields = 
itemStarFieldsVisitor.getItemStarFields();
 
-// prepare node mapper to replace item star calls with new input field 
references
-Map fieldMapper = new HashMap<>();
+  // if there are no item fields, no need to proceed further
+  if (itemStarFieldsVisitor.hasNoItemStarFields()) {
--- End diff --

Sure, moved.


> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
>  

[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402102#comment-16402102
 ] 

ASF GitHub Bot commented on DRILL-6199:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1152#discussion_r175136589
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterItemStarReWriterRule.java
 ---
@@ -54,83 +44,189 @@
 import static org.apache.drill.exec.planner.logical.FieldsReWriterUtil.FieldsReWriter;
 
 /**
- * Rule will transform filter -> project -> scan call with item star fields in filter
- * into project -> filter -> project -> scan where item star fields are pushed into scan
- * and replaced with actual field references.
+ * Rule will transform item star fields in filter and replaced with actual field references.
  *
  * This will help partition pruning and push down rules to detect fields that can be pruned or push downed.
  * Item star operator appears when sub-select or cte with star are used as source.
  */
-public class DrillFilterItemStarReWriterRule extends RelOptRule {
+public class DrillFilterItemStarReWriterRule {
 
-  public static final DrillFilterItemStarReWriterRule INSTANCE = new DrillFilterItemStarReWriterRule(
-  RelOptHelper.some(Filter.class, RelOptHelper.some(Project.class, RelOptHelper.any( TableScan.class))),
-  "DrillFilterItemStarReWriterRule");
+  public static final DrillFilterItemStarReWriterRule.ProjectOnScan PROJECT_ON_SCAN = new ProjectOnScan(
+  RelOptHelper.some(DrillProjectRel.class, RelOptHelper.any(DrillScanRel.class)),
+  "DrillFilterItemStarReWriterRule.ProjectOnScan");
 
-  private DrillFilterItemStarReWriterRule(RelOptRuleOperand operand, String id) {
-super(operand, id);
-  }
+  public static final DrillFilterItemStarReWriterRule.FilterOnScan FILTER_ON_SCAN = new FilterOnScan(
+  RelOptHelper.some(DrillFilterRel.class, RelOptHelper.any(DrillScanRel.class)),
+  "DrillFilterItemStarReWriterRule.FilterOnScan");
 
-  @Override
-  public void onMatch(RelOptRuleCall call) {
-Filter filterRel = call.rel(0);
-Project projectRel = call.rel(1);
-TableScan scanRel = call.rel(2);
+  public static final DrillFilterItemStarReWriterRule.FilterOnProject FILTER_ON_PROJECT = new FilterOnProject(
--- End diff --

Done.
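For context, the item star pattern these rules rewrite appears whenever a star sub-select is used as the source; a sketch based on the DRILL-6199 reproduction quoted in this thread, where the outer filter reaches c1 through an item call over the sub-select's star column:

{code:sql}
-- c1 is resolved as an item operator over the star column of the sub-select,
-- which hides it from partition pruning and filter push down
explain plan for select * from (select * from
dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`) where c1 < 3
{code}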


> Filter push down doesn't work with more than one nested subqueries
> --
>
> Key: DRILL-6199
> URL: https://issues.apache.org/jira/browse/DRILL-6199
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
> Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated using the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, scanning from all files



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6259) Support parquet filter push down for complex types

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402086#comment-16402086
 ] 

ASF GitHub Bot commented on DRILL-6259:
---

GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/1173

DRILL-6259: Support parquet filter push down for complex types

Details in [DRILL-6259](https://issues.apache.org/jira/browse/DRILL-6259).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-6259

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1173.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1173


commit 7a694cedc76d76ce062b393ddd30002e8a6ba11a
Author: Arina Ielchiieva 
Date:   2018-03-13T17:54:25Z

DRILL-6259: Support parquet filter push down for complex types




> Support parquet filter push down for complex types
> --
>
> Key: DRILL-6259
> URL: https://issues.apache.org/jira/browse/DRILL-6259
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
>
> Currently parquet filter push down is not working for complex types 
> (including arrays).
> This Jira aims to implement filter push down for complex types which 
> underneath type is among supported simple types for filter push down. For 
> instance, currently Drill does not support filter push down for varchars, 
> decimals etc. Though once Drill will start support, this support will be 
> applied for complex type automatically.
> Complex fields will be pushed down the same way regular fields are, except 
> for one case with arrays.
> Query with predicate {{where users.hobbies_ids[2] is null}} won't be able to 
> push down because we are not able to determine exact number of nulls in 
> arrays fields. 
> {{Consider [1, 2, 3]}} vs {{[1, 2]}} if these arrays are in different files. 
> Statistics for the second case won't show any nulls but when querying from 
> two files, in terms of data the third value in array is null.
>  
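The array case above can be made concrete with a small self-contained sketch. The two "files" and the helper names are hypothetical illustrations, not Drill code; real Parquet null counts come from row-group metadata:

```java
import java.util.List;

public class ArrayNullPushdown {
    // Hypothetical model of one Parquet file: each row stores an int array.
    // Parquet statistics count nulls among *stored* values only; a shorter
    // array simply stores nothing at index 2, so neither file reports nulls.
    static long reportedNulls(List<int[]> rows) {
        return 0; // no stored value is null in either example file
    }

    // What `WHERE col[idx] IS NULL` actually observes at query time:
    // rows whose array is shorter than idx + 1 produce a logical NULL.
    static long logicalNullsAtIndex(List<int[]> rows, int idx) {
        return rows.stream().filter(r -> r.length <= idx).count();
    }

    public static void main(String[] args) {
        List<int[]> file1 = List.of(new int[]{1, 2, 3});
        List<int[]> file2 = List.of(new int[]{1, 2});

        // Statistics claim zero nulls, so `col[2] IS NULL` looks prunable...
        System.out.println(reportedNulls(file1) + reportedNulls(file2)); // prints 0
        // ...but the data holds one logical null at index 2, so pruning
        // on this predicate would drop a matching row.
        System.out.println(logicalNullsAtIndex(file1, 2)
                + logicalNullsAtIndex(file2, 2)); // prints 1
    }
}
```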





[jira] [Updated] (DRILL-6259) Support parquet filter push down for complex types

2018-03-16 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6259:

Description: 
Currently parquet filter push down is not working for complex types (including 
arrays).

This Jira aims to implement filter push down for complex types which underneath 
type is among supported simple types for filter push down. For instance, 
currently Drill does not support filter push down for varchars, decimals etc. 
Though once Drill will start support, this support will be applied for complex 
type automatically.

Complex fields will be pushed down the same way regular fields are, except for 
one case with arrays.

Query with predicate {{where users.hobbies_ids[2] is null}} won't be able to 
push down because we are not able to determine exact number of nulls in arrays 
fields. 

{{Consider [1, 2, 3]}} vs {{[1, 2]}} if}} these arrays are in different files. 
Statistics for the second case won't show any nulls but when querying from two 
files, in terms of data the third value in array is null.

 

  was:
Currently parquet filter push down is not working for complex types (including 
arrays).

This Jira aims to implement filter push down for complex types which underneath 
type is among supported simple types for filter push down. For instance, 
currently Drill does not support filter push down for varchars, decimals etc. 
Though once Drill will start support, this support will be applied for complex 
type automatically.

Complex fields will be pushed down the same way regular fields are, except for 
one case with arrays.

Query with predicate {{where users.hobbies_ids[2] is null}} won't be able to 
push down because we are not able to determine exact number of nulls in arrays 
fields. 

{{Consider [1, 2, 3]}} vs {{[1, 2]. If}} these arrays are in different files. 
Statistics for the second case won't show any nulls but when querying from two 
files, in terms of data the third value in array is null.

 


> Support parquet filter push down for complex types
> --
>
> Key: DRILL-6259
> URL: https://issues.apache.org/jira/browse/DRILL-6259
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
>
> Currently parquet filter push down is not working for complex types 
> (including arrays).
> This Jira aims to implement filter push down for complex types which 
> underneath type is among supported simple types for filter push down. For 
> instance, currently Drill does not support filter push down for varchars, 
> decimals etc. Though once Drill will start support, this support will be 
> applied for complex type automatically.
> Complex fields will be pushed down the same way regular fields are, except 
> for one case with arrays.
> Query with predicate {{where users.hobbies_ids[2] is null}} won't be able to 
> push down because we are not able to determine exact number of nulls in 
> arrays fields. 
> {{Consider [1, 2, 3]}} vs {{[1, 2]}} if}} these arrays are in different 
> files. Statistics for the second case won't show any nulls but when querying 
> from two files, in terms of data the third value in array is null.
>  





[jira] [Updated] (DRILL-6259) Support parquet filter push down for complex types

2018-03-16 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6259:

Description: 
Currently parquet filter push down is not working for complex types (including 
arrays).

This Jira aims to implement filter push down for complex types which underneath 
type is among supported simple types for filter push down. For instance, 
currently Drill does not support filter push down for varchars, decimals etc. 
Though once Drill will start support, this support will be applied for complex 
type automatically.

Complex fields will be pushed down the same way regular fields are, except for 
one case with arrays.

Query with predicate {{where users.hobbies_ids[2] is null}} won't be able to 
push down because we are not able to determine exact number of nulls in arrays 
fields. 

{{Consider [1, 2, 3]}} vs {{[1, 2]}} if these arrays are in different files. 
Statistics for the second case won't show any nulls but when querying from two 
files, in terms of data the third value in array is null.

 

  was:
Currently parquet filter push down is not working for complex types (including 
arrays).

This Jira aims to implement filter push down for complex types which underneath 
type is among supported simple types for filter push down. For instance, 
currently Drill does not support filter push down for varchars, decimals etc. 
Though once Drill will start support, this support will be applied for complex 
type automatically.

Complex fields will be pushed down the same way regular fields are, except for 
one case with arrays.

Query with predicate {{where users.hobbies_ids[2] is null}} won't be able to 
push down because we are not able to determine exact number of nulls in arrays 
fields. 

{{Consider [1, 2, 3]}} vs {{[1, 2]}} if}} these arrays are in different files. 
Statistics for the second case won't show any nulls but when querying from two 
files, in terms of data the third value in array is null.

 


> Support parquet filter push down for complex types
> --
>
> Key: DRILL-6259
> URL: https://issues.apache.org/jira/browse/DRILL-6259
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
>
> Currently parquet filter push down is not working for complex types 
> (including arrays).
> This Jira aims to implement filter push down for complex types which 
> underneath type is among supported simple types for filter push down. For 
> instance, currently Drill does not support filter push down for varchars, 
> decimals etc. Though once Drill will start support, this support will be 
> applied for complex type automatically.
> Complex fields will be pushed down the same way regular fields are, except 
> for one case with arrays.
> Query with predicate {{where users.hobbies_ids[2] is null}} won't be able to 
> push down because we are not able to determine exact number of nulls in 
> arrays fields. 
> {{Consider [1, 2, 3]}} vs {{[1, 2]}} if these arrays are in different files. 
> Statistics for the second case won't show any nulls but when querying from 
> two files, in terms of data the third value in array is null.
>  





[jira] [Updated] (DRILL-6259) Support parquet filter push down for complex types

2018-03-16 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6259:

Summary: Support parquet filter push down for complex types  (was: 
Implement parquet filter push down for complex types)

> Support parquet filter push down for complex types
> --
>
> Key: DRILL-6259
> URL: https://issues.apache.org/jira/browse/DRILL-6259
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.14.0
>
>
> Currently parquet filter push down is not working for complex types 
> (including arrays).
> This Jira aims to implement filter push down for complex types which 
> underneath type is among supported simple types for filter push down. For 
> instance, currently Drill does not support filter push down for varchars, 
> decimals etc. Though once Drill will start support, this support will be 
> applied for complex type automatically.
> Complex fields will be pushed down the same way regular fields are, except 
> for one case with arrays.
> Query with predicate {{where users.hobbies_ids[2] is null}} won't be able to 
> push down because we are not able to determine exact number of nulls in 
> arrays fields. 
> {{Consider [1, 2, 3]}} vs {{[1, 2]. If}} these arrays are in different files. 
> Statistics for the second case won't show any nulls but when querying from 
> two files, in terms of data the third value in array is null.
>  





[jira] [Commented] (DRILL-6242) Output format for nested date, time, timestamp values in an object hierarchy

2018-03-16 Thread Jiang Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402061#comment-16402061
 ] 

Jiang Wu commented on DRILL-6242:
-

I can take a look at the changes required.  Will update if this becomes too 
complicated for me to do.

> Output format for nested date, time, timestamp values in an object hierarchy
> 
>
> Key: DRILL-6242
> URL: https://issues.apache.org/jira/browse/DRILL-6242
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.12.0
>Reporter: Jiang Wu
>Priority: Major
>
> Some storages (mapr db, mongo db, etc.) have hierarchical objects that 
> contain nested fields of date, time, timestamp types.  When a query returns 
> these objects, the output format for the nested date, time, timestamp, are 
> showing the internal object (org.joda.time.DateTime), rather than the logical 
> data value.
> For example.  Suppose in MongoDB, we have a single object that looks like 
> this:
> {code:java}
> > db.test.findOne();
> {
> "_id" : ObjectId("5aa8487d470dd39a635a12f5"),
> "name" : "orange",
> "context" : {
> "date" : ISODate("2018-03-13T21:52:54.940Z"),
> "user" : "jack"
> }
> }
> {code}
> Then connect Drill to the above MongoDB storage, and run the following query 
> within Drill:
> {code:java}
> > select t.context.`date`, t.context from test t; 
> ++-+ 
> | EXPR$0 | context | 
> ++-+ 
> | 2018-03-13 | 
> {"date":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2,"era":1,"millisOfDay":78774940,"weekOfWeekyear":11,"weekyear":2018,"monthOfYear":3,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"millisOfSecond":940,"secondOfMinute":54,"secondOfDay":78774,"minuteOfHour":52,"minuteOfDay":1312,"hourOfDay":21,"zone":{"fixed":true,"id":"UTC"},"millis":1520977974940,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"afterNow":false,"beforeNow":true,"equalNow":false},"user":"jack"}
>  |
> {code}
> We can see that from the above output, when the date field is retrieved as a 
> top level column, Drill outputs a logical date value.  But when the same 
> field is within an object hierarchy, Drill outputs the internal object used 
> to hold the date value.
> The expected output is the same display for whether the date field is shown 
> as a top level column or when it is within an object hierarchy:
> {code:java}
> > select t.context.`date`, t.context from test t; 
> ++-+ 
> | EXPR$0 | context | 
> ++-+ 
> | 2018-03-13 | {"date":"2018-03-13","user":"jack"} |
> {code}





[jira] [Created] (DRILL-6259) Implement parquet filter push down for complex types

2018-03-16 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-6259:
---

 Summary: Implement parquet filter push down for complex types
 Key: DRILL-6259
 URL: https://issues.apache.org/jira/browse/DRILL-6259
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.13.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
 Fix For: 1.14.0


Currently parquet filter push down is not working for complex types (including 
arrays).

This Jira aims to implement filter push down for complex types which underneath 
type is among supported simple types for filter push down. For instance, 
currently Drill does not support filter push down for varchars, decimals etc. 
Though once Drill will start support, this support will be applied for complex 
type automatically.

Complex fields will be pushed down the same way regular fields are, except for 
one case with arrays.

Query with predicate {{where users.hobbies_ids[2] is null}} won't be able to 
push down because we are not able to determine exact number of nulls in arrays 
fields. 

{{Consider [1, 2, 3]}} vs {{[1, 2]. If}} these arrays are in different files. 
Statistics for the second case won't show any nulls but when querying from two 
files, in terms of data the third value in array is null.

 





[jira] [Assigned] (DRILL-6256) Remove references to java 7 from readme and other files

2018-03-16 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-6256:
---

Assignee: Volodymyr Tkach  (was: Arina Ielchiieva)

> Remove references to java 7 from readme and other files
> ---
>
> Key: DRILL-6256
> URL: https://issues.apache.org/jira/browse/DRILL-6256
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Since master branch uses jdk 8 we should remove all references to java 7.
> Also change min required maven version.





[jira] [Updated] (DRILL-6256) Remove references to java 7 from readme and other files

2018-03-16 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6256:

Labels: ready-to-commit  (was: )

> Remove references to java 7 from readme and other files
> ---
>
> Key: DRILL-6256
> URL: https://issues.apache.org/jira/browse/DRILL-6256
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Since master branch uses jdk 8 we should remove all references to java 7.
> Also change min required maven version.





[jira] [Assigned] (DRILL-6256) Remove references to java 7 from readme and other files

2018-03-16 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-6256:
---

Assignee: Arina Ielchiieva  (was: Volodymyr Tkach)

> Remove references to java 7 from readme and other files
> ---
>
> Key: DRILL-6256
> URL: https://issues.apache.org/jira/browse/DRILL-6256
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Volodymyr Tkach
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Since master branch uses jdk 8 we should remove all references to java 7.
> Also change min required maven version.





[jira] [Commented] (DRILL-6256) Remove references to java 7 from readme and other files

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401968#comment-16401968
 ] 

ASF GitHub Bot commented on DRILL-6256:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1172
  
+1


> Remove references to java 7 from readme and other files
> ---
>
> Key: DRILL-6256
> URL: https://issues.apache.org/jira/browse/DRILL-6256
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>Priority: Major
> Fix For: 1.14.0
>
>
> Since master branch uses jdk 8 we should remove all references to java 7.
> Also change min required maven version.





[jira] [Assigned] (DRILL-6251) Queries from system tables are hang

2018-03-16 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-6251:
--

Assignee: Vitalii Diravka

> Queries from system tables are hang
> ---
>
> Key: DRILL-6251
> URL: https://issues.apache.org/jira/browse/DRILL-6251
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>
> On CentOS cluster Drill hangs while querying sys tables after "use dfs;" 
> (embedded or distributed mode):
> {code}
> 0: jdbc:drill:> select * from sys.version;
> +--+---+--++++
> | version  | commit_id |
> commit_message|commit_time |
> build_email | build_time |
> +--+---+--++++
> | 1.13.0   | 796fcf051b3553c4597abbdca5ca247b139734ba  | 
> [maven-release-plugin] prepare release drill-1.13.0  | 13.03.2018 @ 11:39:14 
> IST  | par...@apache.org  | 13.03.2018 @ 13:13:45 IST  |
> +--+---+--++++
> 1 row selected (3.784 seconds)
> 0: jdbc:drill:> use dfs;
> +---+--+
> |  ok   | summary  |
> +---+--+
> | true  | Default schema changed to [dfs]  |
> +---+--+
> 1 row selected (0.328 seconds)
> 0: jdbc:drill:> select * from sys.version;
> Error: Statement canceled (state=,code=0)
> 0: jdbc:drill:> 
> {code}
> *Note*: there is no failure on local debian machine with Drill in embedded 
> mode.
> dfs plugin configs are default (with "connection": "file:///"; other file 
> systems work fine).
> This failure is connected to DRILL-5089 and Calcite rebase. 
> Related commits:
> https://github.com/apache/drill/commit/450e67094eb6e9a6484d7f86c49b51c77a08d7b2
> https://github.com/apache/drill/commit/18a71a38f6bd1fd33d21d1c68fc23c5901b0080a
> After analyzing in remote debug I found the following flow:
> "dfs" DynamicRootSchema is created, then a new "sys" one is created.
> After Calcite validate "sys" SimpleCalciteSchema is created. But in 
> WorkspaceSchemaFactory#create  wrong WorkspaceConfig is left and "/" is 
> combined with "sys".





[jira] [Commented] (DRILL-6145) Implement Hive MapR-DB JSON handler.

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401784#comment-16401784
 ] 

ASF GitHub Bot commented on DRILL-6145:
---

Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/1158
  
@priteshm I have created a Jira for the above-mentioned issue: 
[DRILL-6258](https://issues.apache.org/jira/browse/DRILL-6258)


> Implement Hive MapR-DB JSON handler. 
> -
>
> Key: DRILL-6145
> URL: https://issues.apache.org/jira/browse/DRILL-6145
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - MapRDB
>Affects Versions: 1.12.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.14.0
>
>
> Similar to "hive-hbase-storage-handler" to support querying MapR-DB Hive's 
> external tables it is necessary to add "hive-maprdb-json-handler".
> Use case:
>  # Create a MapR-DB JSON table:
> {code}
> _> mapr dbshell_
> _maprdb root:> create /tmp/table/json_  (make sure /tmp/table exists)
> {code}
> -- insert data
> {code}
> insert /tmp/table/json --value '\{"_id":"movie002" , "title":"Developers 
> on the Edge", "studio":"Command Line Studios"}'
> insert /tmp/table/json --id movie003 --value '\{"title":"The Golden 
> Master", "studio":"All-Nighter"}'
> {code} 
>  #  Create a Hive external table:
> {code}
> hive> CREATE EXTERNAL TABLE mapr_db_json_hive_tbl ( 
> > movie_id string, title string, studio string) 
> > STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' 
> > TBLPROPERTIES("maprdb.table.name" = 
> "/tmp/table/json","maprdb.column.id" = "movie_id");
> {code}
>  
>  #  Use hive schema to query this table via Drill:
> {code}
> 0: jdbc:drill:> select * from hive.mapr_db_json_hive_tbl;
> {code}





[jira] [Created] (DRILL-6258) Jar files aren't downloaded if dependency is present only in profile section

2018-03-16 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-6258:
--

 Summary: Jar files aren't downloaded if dependency is present only 
in profile section
 Key: DRILL-6258
 URL: https://issues.apache.org/jira/browse/DRILL-6258
 Project: Apache Drill
  Issue Type: Improvement
  Components: Tools, Build & Test
Affects Versions: 1.13.0
Reporter: Vitalii Diravka
 Fix For: Future


Dependencies that appear only in profile sections of module POM files referenced by 
the distribution POM should be downloaded as jars (when the corresponding profile is 
enabled), just like dependencies declared in the common section of POM files.

That would avoid creating extra dependency sections or additional modules.

Currently, to add jar files for a specific profile it is necessary to add them to the 
profile section of the distribution/pom file. 
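A minimal sketch of the current workaround described above, assuming a hypothetical profile id and artifact coordinates (the real distribution POM will differ):

```xml
<!-- distribution/pom.xml: the dependency must be repeated inside the profile
     section, because profile-only dependencies of modules are not pulled in -->
<profiles>
  <profile>
    <id>mapr</id>
    <dependencies>
      <dependency>
        <groupId>com.example</groupId>
        <artifactId>profile-only-lib</artifactId>
        <version>1.0.0</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```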





[jira] [Resolved] (DRILL-4493) Fixed issues in various POMs with MapR profile

2018-03-16 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka resolved DRILL-4493.

Resolution: Fixed

It was merged into master branch with commit id c047f04b507faec

> Fixed issues in various POMs with MapR profile
> --
>
> Key: DRILL-4493
> URL: https://issues.apache.org/jira/browse/DRILL-4493
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.6.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Major
> Fix For: 1.6.0
>
>
> * Remove inclusion of some transitive dependencies from distribution pom.
> * Remove maprfs/json artifacts from "mapr" profile in drill-java-exec pom.
> * Set "hadoop-common"'s scope as test in jdbc pom (without this the jdbc-all 
> jar bloats to >60MB).
> * Revert HBase version to 0.98.12-mapr-1602-m7-5.1.0.
> * Exclude log4j and commons-logging from some HBase artifacts.





[jira] [Updated] (DRILL-4493) Fixed issues in various POMs with MapR profile

2018-03-16 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-4493:
---
Fix Version/s: (was: Future)
   1.6.0

> Fixed issues in various POMs with MapR profile
> --
>
> Key: DRILL-4493
> URL: https://issues.apache.org/jira/browse/DRILL-4493
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.6.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Major
> Fix For: 1.6.0
>
>
> * Remove inclusion of some transitive dependencies from distribution pom.
> * Remove maprfs/json artifacts from "mapr" profile in drill-java-exec pom.
> * Set "hadoop-common"'s scope as test in jdbc pom (without this the jdbc-all 
> jar bloats to >60MB).
> * Revert HBase version to 0.98.12-mapr-1602-m7-5.1.0.
> * Exclude log4j and commons-logging from some HBase artifacts.





[jira] [Updated] (DRILL-6145) Implement Hive MapR-DB JSON handler.

2018-03-16 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-6145:
---
Reviewer: Vlad Rozov  (was: Sorabh Hamirwasia)

> Implement Hive MapR-DB JSON handler. 
> -
>
> Key: DRILL-6145
> URL: https://issues.apache.org/jira/browse/DRILL-6145
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - MapRDB
>Affects Versions: 1.12.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.14.0
>
>
> Similar to "hive-hbase-storage-handler" to support querying MapR-DB Hive's 
> external tables it is necessary to add "hive-maprdb-json-handler".
> Use case:
>  # Create a MapR-DB JSON table:
> {code}
> _> mapr dbshell_
> _maprdb root:> create /tmp/table/json_  (make sure /tmp/table exists)
> {code}
> -- insert data
> {code}
> insert /tmp/table/json --value '\{"_id":"movie002" , "title":"Developers 
> on the Edge", "studio":"Command Line Studios"}'
> insert /tmp/table/json --id movie003 --value '\{"title":"The Golden 
> Master", "studio":"All-Nighter"}'
> {code} 
>  #  Create a Hive external table:
> {code}
> hive> CREATE EXTERNAL TABLE mapr_db_json_hive_tbl ( 
> > movie_id string, title string, studio string) 
> > STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' 
> > TBLPROPERTIES("maprdb.table.name" = 
> "/tmp/table/json","maprdb.column.id" = "movie_id");
> {code}
>  
>  #  Use hive schema to query this table via Drill:
> {code}
> 0: jdbc:drill:> select * from hive.mapr_db_json_hive_tbl;
> {code}





[jira] [Updated] (DRILL-6145) Implement Hive MapR-DB JSON handler.

2018-03-16 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-6145:
---
Issue Type: Improvement  (was: Bug)

> Implement Hive MapR-DB JSON handler. 
> -
>
> Key: DRILL-6145
> URL: https://issues.apache.org/jira/browse/DRILL-6145
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - MapRDB
>Affects Versions: 1.12.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.14.0
>
>
> Similar to the "hive-hbase-storage-handler", to support querying Hive external 
> tables backed by MapR-DB it is necessary to add a "hive-maprdb-json-handler".
> Use case:
>  # Create a MapR-DB JSON table:
> {code}
> > mapr dbshell
> maprdb root:> create /tmp/table/json  (make sure /tmp/table exists)
> {code}
> -- insert data
> {code}
> insert /tmp/table/json --value '{"_id":"movie002", "title":"Developers 
> on the Edge", "studio":"Command Line Studios"}'
> insert /tmp/table/json --id movie003 --value '{"title":"The Golden 
> Master", "studio":"All-Nighter"}'
> {code}
>  #  Create a Hive external table:
> {code}
> hive> CREATE EXTERNAL TABLE mapr_db_json_hive_tbl ( 
> > movie_id string, title string, studio string) 
> > STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' 
> > TBLPROPERTIES("maprdb.table.name" = 
> "/tmp/table/json","maprdb.column.id" = "movie_id");
> {code}
>  
>  #  Use hive schema to query this table via Drill:
> {code}
> 0: jdbc:drill:> select * from hive.mapr_db_json_hive_tbl;
> {code}





[jira] [Commented] (DRILL-6246) Build Failing in jdbc-all artifact

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401723#comment-16401723
 ] 

ASF GitHub Bot commented on DRILL-6246:
---

Github user vvysotskyi commented on the issue:

https://github.com/apache/drill/pull/1168
  
Classes from `avatica.metrics` are used in `JsonHandler`, `ProtobufHandler`, 
and `LocalService`. If Drill does not use these classes, then I agree that we 
can exclude them from the `jdbc-all` jar. 
Regarding excluding `avatica/org/**`: the problem looks to be in the Avatica 
pom files, since there are no dependencies on `org.apache.commons` and 
`org.apache.http`, yet they are shaded into the jar. Created Jira CALCITE-2215 to 
fix this issue, but for now I think it is fine to exclude them.


> Build Failing in jdbc-all artifact
> --
>
> Key: DRILL-6246
> URL: https://issues.apache.org/jira/browse/DRILL-6246
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.13.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>
> * It was noticed that the build was failing because of the 
> jdbc-all artifact
>  * The maximum compressed jar size was set to 32MB, but we are 
> currently creating a JAR a bit larger than 32MB
>  * I compared apache drill-1.10.0, drill-1.12.0, and 
> drill-1.13.0 (on my MacOS)
>  * jdbc-all-1.10.0 jar size: 21MB
>  * jdbc-all-1.12.0 jar size: 27MB
>  * jdbc-all-1.13.0 jar size: 34MB (on Linux this size is 
> roughly 32MB)
>  * Compared then in more detail jdbc-all-1.12.0 and 
> jdbc-all-1.13.0
>  * The bulk of the increase is attributed to the calcite 
> artifact
>  * Used to be 2MB (uncompressed) and is now 22MB 
> (uncompressed)
>  * It is likely an exclusion problem
>  * The jdbc-all-1.12.0 version has only two top packages: 
> calcite/avatica/utils and calcite/avatica/remote
>  * The jdbc-all-1.13.0 includes new packages (within 
> calcite/avatica): metrics, proto, org/apache/, com/fasterxml, com/google
>
> I am planning to exclude these new sub-packages
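The exclusions described above would be expressed in the `jdbc-all` module's shade configuration. The following is only an illustrative sketch: the filter syntax is standard `maven-shade-plugin`, but the artifact coordinates and package list are assumptions based on the comment thread, not taken from Drill's actual pom.

```xml
<!-- Hypothetical sketch: filter out the Avatica sub-packages that
     inflated the shaded jar; package names follow the analysis above. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <filters>
      <filter>
        <artifact>org.apache.calcite.avatica:*</artifact>
        <excludes>
          <exclude>org/apache/calcite/avatica/metrics/**</exclude>
          <exclude>org/apache/calcite/avatica/proto/**</exclude>
          <exclude>org/apache/calcite/avatica/org/**</exclude>
          <exclude>org/apache/calcite/avatica/com/**</exclude>
        </excludes>
      </filter>
    </filters>
  </configuration>
</plugin>
```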





[jira] [Updated] (DRILL-6256) Remove references to java 7 from readme and other files

2018-03-16 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6256:

Affects Version/s: 1.13.0

> Remove references to java 7 from readme and other files
> ---
>
> Key: DRILL-6256
> URL: https://issues.apache.org/jira/browse/DRILL-6256
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>Priority: Major
> Fix For: 1.14.0
>
>
> Since the master branch uses JDK 8, we should remove all references to Java 7.





[jira] [Updated] (DRILL-6256) Remove references to java 7 from readme and other files

2018-03-16 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6256:

   Reviewer: Arina Ielchiieva
Description: 
Since the master branch uses JDK 8, we should remove all references to Java 7.

Also change the minimum required Maven version.

  was:Since the master branch uses JDK 8, we should remove all references to Java 7.


> Remove references to java 7 from readme and other files
> ---
>
> Key: DRILL-6256
> URL: https://issues.apache.org/jira/browse/DRILL-6256
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>Priority: Major
> Fix For: 1.14.0
>
>
> Since the master branch uses JDK 8, we should remove all references to Java 7.
> Also change the minimum required Maven version.





[jira] [Updated] (DRILL-6256) Remove references to java 7 from readme and other files

2018-03-16 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6256:

Fix Version/s: 1.14.0

> Remove references to java 7 from readme and other files
> ---
>
> Key: DRILL-6256
> URL: https://issues.apache.org/jira/browse/DRILL-6256
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>Priority: Major
> Fix For: 1.14.0
>
>
> Since the master branch uses JDK 8, we should remove all references to Java 7.





[jira] [Updated] (DRILL-1491) Support for JDK 8

2018-03-16 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-1491:

Labels: doc-impacting ready-to-commit  (was: doc-impacting)

> Support for JDK 8
> -
>
> Key: DRILL-1491
> URL: https://issues.apache.org/jira/browse/DRILL-1491
> Project: Apache Drill
>  Issue Type: Task
>  Components: Tools, Build & Test
>Reporter: Aditya Kishore
>Assignee: Volodymyr Tkach
>Priority: Blocker
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.13.0
>
> Attachments: DRILL-1491.1.patch.txt
>
>
> This will be the umbrella JIRA used to track and fix issues with JDK 8 
> support.





[jira] [Updated] (DRILL-1491) Support for JDK 8

2018-03-16 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-1491:

Labels: doc-impacting  (was: )

> Support for JDK 8
> -
>
> Key: DRILL-1491
> URL: https://issues.apache.org/jira/browse/DRILL-1491
> Project: Apache Drill
>  Issue Type: Task
>  Components: Tools, Build & Test
>Reporter: Aditya Kishore
>Assignee: Volodymyr Tkach
>Priority: Blocker
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.13.0
>
> Attachments: DRILL-1491.1.patch.txt
>
>
> This will be the umbrella JIRA used to track and fix issues with JDK 8 
> support.





[jira] [Closed] (DRILL-6257) Sqlline start command with password appears in the sqlline.log

2018-03-16 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva closed DRILL-6257.
---
Resolution: Duplicate

Original issue https://issues.apache.org/jira/browse/DRILL-6250.

> Sqlline start command with password appears in the sqlline.log
> --
>
> Key: DRILL-6257
> URL: https://issues.apache.org/jira/browse/DRILL-6257
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>Priority: Major
>
> *Prerequisites:*
>  *1.* Log level is set to "all" in the conf/logback.xml:
> {code:xml}
> 
> 
> 
> 
> {code}
> *2.* PLAIN authentication mechanism is configured:
> {code:java}
>   security.user.auth: {
>   enabled: true,
>   packages += "org.apache.drill.exec.rpc.user.security",
>   impl: "pam",
>   pam_profiles: [ "sudo", "login" ]
>   }
> {code}
> *Steps:*
>  *1.* Start the drillbits
>  *2.* Connect by sqlline:
> {noformat}
> /opt/mapr/drill/drill-1.13.0/bin/sqlline -u "jdbc:drill:zk=node1:5181;" -n 
> user1 -p 1234
> {noformat}
> *3.* Check the sqlline logs:
> {noformat}
> tail -F log/sqlline.log|grep 1234 -a5 -b5
> {noformat}
> *Expected result:* Logs shouldn't contain clear-text passwords
> *Actual result:* The logs contain the sqlline start command with password:
> {noformat}
> # system properties
> 35333-"java" : {
> 35352-# system properties
> 35384:"command" : "sqlline.SqlLine -d 
> org.apache.drill.jdbc.Driver --maxWidth=1 --color=true -u 
> jdbc:drill:zk=node1:5181; -n user1 -p 1234",
> 35535-# system properties
> 35567-"launcher" : "SUN_STANDARD"
> 35607-}
> {noformat}





[jira] [Created] (DRILL-6257) Sqlline start command with password appears in the sqlline.log

2018-03-16 Thread Volodymyr Tkach (JIRA)
Volodymyr Tkach created DRILL-6257:
--

 Summary: Sqlline start command with password appears in the 
sqlline.log
 Key: DRILL-6257
 URL: https://issues.apache.org/jira/browse/DRILL-6257
 Project: Apache Drill
  Issue Type: Bug
Reporter: Volodymyr Tkach
Assignee: Volodymyr Tkach


*Prerequisites:*
 *1.* Log level is set to "all" in the conf/logback.xml:
{code:xml}




{code}
*2.* PLAIN authentication mechanism is configured:
{code:java}
  security.user.auth: {
enabled: true,
packages += "org.apache.drill.exec.rpc.user.security",
impl: "pam",
pam_profiles: [ "sudo", "login" ]
  }
{code}
*Steps:*
 *1.* Start the drillbits
 *2.* Connect by sqlline:
{noformat}
/opt/mapr/drill/drill-1.13.0/bin/sqlline -u "jdbc:drill:zk=node1:5181;" -n 
user1 -p 1234
{noformat}
 *3.* Check the sqlline logs:
{noformat}
tail -F log/sqlline.log|grep 1234 -a5 -b5
{noformat}
*Expected result:* Logs shouldn't contain clear-text passwords

*Actual result:* The logs contain the sqlline start command with password:
{noformat}
# system properties
35333-"java" : {
35352-# system properties
35384:"command" : "sqlline.SqlLine -d org.apache.drill.jdbc.Driver 
--maxWidth=1 --color=true -u jdbc:drill:zk=node1:5181; -n user1 -p 1234",
35535-# system properties
35567-"launcher" : "SUN_STANDARD"
35607-}
{noformat}
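The underlying fix (tracked in DRILL-6250, per the duplicate resolution above) amounts to masking the password before the start command is written to sqlline.log. The following is a minimal illustrative sketch of that idea, not Drill's actual implementation; `redact_password` is a hypothetical helper:

```python
import re

def redact_password(command: str) -> str:
    # Hypothetical helper: replace the value following -p/--password
    # with asterisks so a clear-text password never reaches the log.
    return re.sub(r"(-p|--password)([ =]+)\S+", r"\1\2****", command)

cmd = ("sqlline.SqlLine -d org.apache.drill.jdbc.Driver --maxWidth=1 "
       "--color=true -u jdbc:drill:zk=node1:5181; -n user1 -p 1234")
print(redact_password(cmd))  # the trailing "-p 1234" becomes "-p ****"
```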





[jira] [Commented] (DRILL-6256) Remove references to java 7 from readme and other files

2018-03-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401487#comment-16401487
 ] 

ASF GitHub Bot commented on DRILL-6256:
---

GitHub user vladimirtkach opened a pull request:

https://github.com/apache/drill/pull/1172

DRILL-6256: Remove references to java 7 from readme and other files



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vladimirtkach/drill DRILL-6256

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1172.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1172


commit 8349436f24f22e44cf9c0dfe2bd453d8a9fd3137
Author: vladimir tkach 
Date:   2018-03-16T05:58:42Z

DRILL-6256: Remove references to java 7 from readme and other files




> Remove references to java 7 from readme and other files
> ---
>
> Key: DRILL-6256
> URL: https://issues.apache.org/jira/browse/DRILL-6256
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>Priority: Major
>
> Since the master branch uses JDK 8, we should remove all references to Java 7.


