[jira] [Closed] (DRILL-4652) C++ client build breaks when trying to include commit messages with quotes

2017-03-31 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua closed DRILL-4652.
---

Build issue; verified with a successful build of the C++ client.

> C++ client build breaks when trying to include commit messages with quotes
> --
>
> Key: DRILL-4652
> URL: https://issues.apache.org/jira/browse/DRILL-4652
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Parth Chandra
>
> The C++ client build generates a string based on git commit info to print to 
> the log at startup time. This breaks if the commit message has quotes since 
> the embedded quotes are not escaped.
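A likely shape of the fix is to escape embedded quotes (and backslashes) before splicing the commit message into the generated string literal. A minimal sketch in Python (the function name is hypothetical, not Drill's actual build script):

```python
def escape_for_string_literal(text: str) -> str:
    """Escape backslashes and double quotes so `text` can be embedded
    safely inside a generated C/C++ string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

# A commit message containing quotes no longer breaks the literal:
msg = 'Fix "broken" build'
literal = '"' + escape_for_string_literal(msg) + '"'
```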



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5399) Random Error : Flatten does not support inputs of non-list values.

2017-03-31 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951849#comment-15951849
 ] 

Rahul Challapalli commented on DRILL-5399:
--

One more instance

Query:
{code}
 select id, flatten(kvgen(m)) from `json_kvgenflatten/missing-map.json`
{code}

Data:
{code}
{
"id": 1,
"m": {"a":1,"b":2}
}
{
"id": 2
}
{
"id": 3,
"m": {"c":3,"d":4}
}
{code}

Plan:
{code}
00-00    Screen : rowType = RecordType(ANY id, ANY EXPR$1): rowcount = 1.0, cumulative cost = {2.1 rows, 5.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 761
00-01      Project(id=[$0], EXPR$1=[$3]) : rowType = RecordType(ANY id, ANY EXPR$1): rowcount = 1.0, cumulative cost = {2.0 rows, 5.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 760
00-02        Flatten(flattenField=[$3]) : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, ANY EXPR$3, ANY EXPR$4): rowcount = 1.0, cumulative cost = {2.0 rows, 5.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 759
00-03          Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$3=[$2], EXPR$4=[$2]) : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, ANY EXPR$3, ANY EXPR$4): rowcount = 1.0, cumulative cost = {1.0 rows, 4.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 758
00-04            Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$3=[KVGEN($1)]) : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, ANY EXPR$3): rowcount = 1.0, cumulative cost = {1.0 rows, 4.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 757
00-05              Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/json_kvgenflatten/missing-map.json, numFiles=1, columns=[`id`, `m`], files=[maprfs:///drill/testdata/json_kvgenflatten/missing-map.json]]]) : rowType = RecordType(ANY id, ANY m): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 756
{code}

And in the logs I see this warning. It looks like we are failing while setting 
up the new schema in the Project operator. Could the JSON reader possibly be 
messing it up?
{code}
2017-03-31 15:27:55,863 [27212813-c2fa-204a-2971-015ea610ad67:frag:0:0] WARN  
o.a.d.e.e.ExpressionTreeMaterializer - Unable to find value vector of path 
`EXPR$3`, returning null instance.
{code}
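For reference, kvgen turns a map into a list of key/value pairs and flatten unnests that list into one row per element, so a record with a missing map should simply contribute no rows. A Python sketch of the expected semantics (the helper names are illustrative, not Drill internals):

```python
def kvgen(m):
    # Map -> list of {"key": k, "value": v} pairs (empty list if map is absent).
    return [{"key": k, "value": v} for k, v in (m or {}).items()]

def flatten(rows, col):
    # One output row per element of the list-valued column `col`.
    return [{**r, col: elem} for r in rows for elem in r[col]]

data = [
    {"id": 1, "m": {"a": 1, "b": 2}},
    {"id": 2},                          # missing map
    {"id": 3, "m": {"c": 3, "d": 4}},
]
rows = [{"id": r["id"], "kv": kvgen(r.get("m"))} for r in data]
result = flatten(rows, "kv")
# id=2 contributes no rows; ids 1 and 3 contribute two rows each.
```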

> Random Error : Flatten does not support inputs of non-list values.
> --
>
> Key: DRILL-5399
> URL: https://issues.apache.org/jira/browse/DRILL-5399
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - JSON
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>
> git.commit.id.abbrev=38ef562
> The below query did not fail when I ran it in isolation. However, when I ran 
> the test suite at [1], which also contains the below query, with 50 threads 
> submitting queries concurrently, I hit the below error.
> {code}
> select flatten(sub.fk.`value`) from (select flatten(kvgen(map)) fk from 
> `json_kvgenflatten/nested3.json`) sub
> Failed with exception
> java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Flatten does not support 
> inputs of non-list values.
> Fragment 0:0
> [Error Id: 90026283-0b95-4bda-948e-54ed57a62edf on qa-node183.qa.lab:31010]
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1895)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61)
>   at 
> oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
>   at 
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1100)
>   at 
> oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
>   at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:180)
>   at 
> oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
>   at 
> oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130)
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:112)
>   at 
> org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:177)
>   at 
> org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:101)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at 

[jira] [Commented] (DRILL-3474) Add implicit file columns support

2017-03-31 Thread Bridget Bevens (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951778#comment-15951778
 ] 

Bridget Bevens commented on DRILL-3474:
---

Link to doc: 
http://drill.apache.org/docs/querying-a-file-system-introduction/#implicit-columns
 

> Add implicit file columns support
> -
>
> Key: DRILL-3474
> URL: https://issues.apache.org/jira/browse/DRILL-3474
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata
>Affects Versions: 1.1.0
>Reporter: Jim Scott
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
> Fix For: 1.7.0
>
>
> I could not find another ticket which talks about this ...
> The file name should be a column which can be selected or filtered when 
> querying a directory, just like dir0 and dir1 are.
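The requested behavior can be sketched as deriving implicit columns from each file's path relative to the queried root, the same way dir0/dir1 are derived today. A Python sketch (the helper and the exact column names are illustrative):

```python
def implicit_columns(file_path, selection_root):
    """Derive the implicit `filename` and `dir0`, `dir1`, ... columns for a
    file under the queried root directory."""
    rel = file_path[len(selection_root):].strip("/")
    parts = rel.split("/")
    cols = {"filename": parts[-1]}
    for i, d in enumerate(parts[:-1]):
        cols["dir%d" % i] = d          # mirrors existing dir0/dir1 behavior
    return cols
```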





[jira] [Commented] (DRILL-4604) Generate warning on Web UI if drillbits version mismatch is detected

2017-03-31 Thread Bridget Bevens (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951774#comment-15951774
 ] 

Bridget Bevens commented on DRILL-4604:
---

Link to doc: 
http://drill.apache.org/docs/identifying-multiple-drill-versions-in-a-cluster/ 

> Generate warning on Web UI if drillbits version mismatch is detected
> 
>
> Key: DRILL-4604
> URL: https://issues.apache.org/jira/browse/DRILL-4604
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
> Attachments: index_page.JPG, index_page_mismatch.JPG, 
> NEW_matching_drillbits.JPG, NEW_mismatching_drillbits.JPG, 
> screenshots_with_different_states.docx
>
>
> Display the drillbit version on the Web UI. If any drillbit's version doesn't 
> match the current drillbit's, generate a warning.
> Screenshots - NEW_matching_drillbits.JPG, NEW_mismatching_drillbits.JPG
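The check itself is straightforward: collect each registered drillbit's version and warn when any differs from the current one. A sketch of the logic (the function name is illustrative, not the actual Web UI code):

```python
def version_mismatch_warning(current_version, drillbit_versions):
    """Return a warning string if any drillbit's version differs from the
    current drillbit's version, else None."""
    mismatched = sorted(set(v for v in drillbit_versions if v != current_version))
    if not mismatched:
        return None
    return ("Drillbit version mismatch detected: current is %s, but found %s"
            % (current_version, ", ".join(mismatched)))
```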





[jira] [Commented] (DRILL-5098) Improving fault tolerance for connection between client and foreman node.

2017-03-31 Thread Bridget Bevens (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951773#comment-15951773
 ] 

Bridget Bevens commented on DRILL-5098:
---

Link to doc: 
http://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection

> Improving fault tolerance for connection between client and foreman node.
> -
>
> Key: DRILL-5098
> URL: https://issues.apache.org/jira/browse/DRILL-5098
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
>
> With DRILL-5015 we added support for specifying multiple Drillbits in the 
> connection string and randomly choosing one of them. Over time some of the 
> Drillbits specified in the connection string may die, and the client can fail 
> to connect to the Foreman node if the random selection happens to be a dead 
> Drillbit.
> Even if ZooKeeper is used to select a random Drillbit from the registered 
> ones, there is a small window in which the client selects a Drillbit that 
> then goes down. The client will fail to connect to this Drillbit and error 
> out.
> Instead, if we try multiple Drillbits (with a configurable tries count in 
> the connection string), the probability of hitting this error window is 
> reduced in both cases, improving fault tolerance. During further 
> investigation it was also found that an authentication failure is thrown as 
> a generic RpcException. We need to improve that as well, capturing this case 
> explicitly, since on an auth failure we don't want to try multiple Drillbits.
> Connection string example with the new parameter:
> jdbc:drill:drillbit=<host>[:<port>][,<host>[:<port>]]...;tries=5
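The retry behavior described above can be sketched as: attempt up to `tries` randomly chosen Drillbits, treat an authentication failure as fatal (no further attempts), and fall through to the next candidate only on a connection failure. A Python sketch under those assumptions (the exception and function names are illustrative, not the Drill client's actual API):

```python
import random

class AuthFailure(Exception): pass
class ConnectFailure(Exception): pass

def connect_with_retries(drillbits, connect, tries=5):
    """Try up to `tries` distinct drillbits; re-raise immediately on auth
    failure, keep trying on connection failure."""
    candidates = list(drillbits)
    random.shuffle(candidates)
    last_error = None
    for host in candidates[:tries]:
        try:
            return connect(host)
        except AuthFailure:
            raise            # wrong credentials won't improve on another node
        except ConnectFailure as e:
            last_error = e   # dead drillbit: try the next one
    raise last_error or ConnectFailure("no drillbits available")
```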





[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

2017-03-31 Thread Bridget Bevens (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951770#comment-15951770
 ] 

Bridget Bevens commented on DRILL-4203:
---

Link to doc: 
http://drill.apache.org/docs/parquet-format/#date-value-auto-correction

> Parquet File : Date is stored wrongly
> -
>
> Key: DRILL-4203
> URL: https://issues.apache.org/jira/browse/DRILL-4203
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Stéphane Trou
>Assignee: Vitalii Diravka
>Priority: Critical
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> Hello,
> I have some problems when I try to read Parquet files produced by Drill with 
> Spark: all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> ++-+
> |  name  | epoch_date  |
> ++-+
> | Epoch  | 1970-01-01  |
> ++-+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 1  |
> +---++
> {code}
> When I read the file with parquet-tools, I found:
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equal to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:file:/tmp/buggy_parquet/0_0_0.parquet 
> creator: parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:   drill.version = 1.4.0 
> file schema: root 
> 
> name:OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> 
> name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}
> Implementation:
> After the fix, Drill can automatically detect date corruption in Parquet 
> files and convert the values to correct ones.
> For this reason, for users who want to work with dates more than 5,000 years 
> out, an option is included to turn off the auto-correction.
> Use of this option is assumed to be extremely unlikely, but it is included for
> completeness.
> To disable auto-correction, use the parquet config in the plugin 
> settings. Something like this:
> {code}
>   "formats": {
> "parquet": {
>   "type": "parquet",
>   "autoCorrectCorruptDates": false
> }
> {code}
> Or you can try to use the query like this:
> {code}
> select l_shipdate, l_commitdate from 
> table(dfs.`/drill/testdata/parquet_date/dates_nodrillversion/drillgen2_lineitem`
>  
> (type => 'parquet', autoCorrectCorruptDates => false)) limit 1;
> {code}
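The numbers in the report imply a fixed shift: 1970-01-01 should be stored as day 0 under the DATE logical type, but the file holds 4881176, so a corrupted value appears to be the true day count plus 4881176 days. A hedged arithmetic sketch (the constant and the detection threshold here are inferred from this report, not taken from Drill's source):

```python
CORRUPTION_OFFSET_DAYS = 4881176  # stored value observed for 1970-01-01 (true day 0)

def maybe_correct_date(stored_days, auto_correct=True):
    """Undo the fixed-offset corruption when auto-correction is enabled and
    the stored value is implausibly large (at or past the offset)."""
    if auto_correct and stored_days >= CORRUPTION_OFFSET_DAYS:
        return stored_days - CORRUPTION_OFFSET_DAYS
    return stored_days
```

This threshold heuristic is also why the option above exists: a legitimate date far enough in the future would cross the threshold and be mis-corrected.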





[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2017-03-31 Thread Bridget Bevens (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951764#comment-15951764
 ] 

Bridget Bevens commented on DRILL-4373:
---

Link to doc: http://drill.apache.org/docs/parquet-format/#about-int96-support

> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Parquet
>Affects Versions: 1.8.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>  Labels: doc-impacting
> Fix For: 1.10.0
>
>
> git.commit.id.abbrev=83d460c
> I created a Parquet file with a timestamp type using Drill. Now if I define a 
> Hive table on top of the Parquet file and use "timestamp" as the column type, 
> Drill fails to read the Hive table through the Hive storage plugin.
> Implementation: 
> Added an int96-to-timestamp converter for both Parquet readers, controlled by 
> the system/session option "store.parquet.int96_as_timestamp".
> The option is false by default so that old query scripts using the 
> "convert_from TIMESTAMP_IMPALA" function continue to work.
> When the option is true, using that function is unnecessary and can cause 
> the query to fail.
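For context, the Impala-style int96 timestamp packs 12 bytes: an 8-byte little-endian count of nanoseconds within the day, followed by a 4-byte little-endian Julian day number (Julian day 2440588 is 1970-01-01). A decoding sketch:

```python
import struct

UNIX_EPOCH_JULIAN_DAY = 2440588  # Julian day number of 1970-01-01

def int96_to_unix_seconds(raw: bytes) -> float:
    """Decode a 12-byte Impala-style int96 timestamp into Unix seconds."""
    nanos_of_day, julian_day = struct.unpack("<QI", raw)
    return (julian_day - UNIX_EPOCH_JULIAN_DAY) * 86400 + nanos_of_day / 1e9

# 1970-01-01T00:00:00 encodes as nanos=0, julian_day=2440588:
raw = struct.pack("<QI", 0, 2440588)
```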





[jira] [Commented] (DRILL-5031) Documentation for HTTPD Parser

2017-03-31 Thread Bridget Bevens (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951703#comment-15951703
 ] 

Bridget Bevens commented on DRILL-5031:
---

Moved this content based on feedback from Abhishek Girish.
New home is here: http://drill.apache.org/docs/httpd-storage-plugin/





> Documentation for HTTPD Parser
> --
>
> Key: DRILL-5031
> URL: https://issues.apache.org/jira/browse/DRILL-5031
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Charles Givre
>Assignee: Bridget Bevens
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.10.0
>
>
> https://gist.github.com/cgivre/47f07a06d44df2af625fc6848407ae7c





[jira] [Closed] (DRILL-4974) NPE in FindPartitionConditions.analyzeCall() for 'holistic' expressions

2017-03-31 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal closed DRILL-4974.
--

Verified and test cases added to automation.

> NPE in FindPartitionConditions.analyzeCall() for 'holistic' expressions
> ---
>
> Key: DRILL-4974
> URL: https://issues.apache.org/jira/browse/DRILL-4974
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.6.0, 1.7.0, 1.8.0
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
> Fix For: 1.9.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The following query can cause an NPE in FindPartitionConditions.analyzeCall() 
> if the fileSize column is a partitioned column:
> SELECT fileSize FROM dfs.`/drill-data/data/` WHERE compoundId LIKE 
> 'FOO-1234567%'
> This is because LIKE is treated as a holistic expression in 
> FindPartitionConditions.analyzeCall(), causing opStack to be empty and 
> opStack.peek() to return a NULL value.
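The crash pattern is a peek() on an empty stack; a defensive fix checks for emptiness first. A one-line sketch of the idea (not Drill's actual patch):

```python
def safe_peek(op_stack):
    """Return the top of the operator stack, or None when a holistic
    expression (e.g. LIKE) left the stack empty."""
    return op_stack[-1] if op_stack else None
```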





[jira] [Updated] (DRILL-4974) NPE in FindPartitionConditions.analyzeCall() for 'holistic' expressions

2017-03-31 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal updated DRILL-4974:
---
Reviewer: Krystal  (was: Chun Chang)

> NPE in FindPartitionConditions.analyzeCall() for 'holistic' expressions
> ---
>
> Key: DRILL-4974
> URL: https://issues.apache.org/jira/browse/DRILL-4974
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.6.0, 1.7.0, 1.8.0
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
> Fix For: 1.9.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The following query can cause an NPE in FindPartitionConditions.analyzeCall() 
> if the fileSize column is a partitioned column:
> SELECT fileSize FROM dfs.`/drill-data/data/` WHERE compoundId LIKE 
> 'FOO-1234567%'
> This is because LIKE is treated as a holistic expression in 
> FindPartitionConditions.analyzeCall(), causing opStack to be empty and 
> opStack.peek() to return a NULL value.





[jira] [Commented] (DRILL-4280) Kerberos Authentication

2017-03-31 Thread Chun Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951665#comment-15951665
 ] 

Chun Chang commented on DRILL-4280:
---

This bug should cover the following:

"Drill should support Kerberos based authentication from clients. This means 
that both the ODBC and JDBC drivers as well as the web/REST interfaces should 
support inbound Kerberos. For Web this would most likely be SPNEGO while for 
ODBC and JDBC this will be more generic Kerberos."

Testing in all areas (web/REST, SPNEGO, ODBC, and JDBC) is ongoing.

> Kerberos Authentication
> ---
>
> Key: DRILL-4280
> URL: https://issues.apache.org/jira/browse/DRILL-4280
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Keys Botzum
>Assignee: Sudheesh Katkam
>  Labels: security
> Fix For: 1.10.0
>
>
> Drill should support Kerberos based authentication from clients. This means 
> that both the ODBC and JDBC drivers as well as the web/REST interfaces should 
> support inbound Kerberos. For Web this would most likely be SPNEGO while for 
> ODBC and JDBC this will be more generic Kerberos.
> Since Hive and much of Hadoop support Kerberos, there is a potential for a 
> lot of reuse of ideas, if not implementation.
> Note that this is related to but not the same as 
> https://issues.apache.org/jira/browse/DRILL-3584 





[jira] [Closed] (DRILL-4987) Use ImpersonationUtil in RemoteFunctionRegistry

2017-03-31 Thread Chun Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chang closed DRILL-4987.
-

Covered by impersonation testing.

> Use ImpersonationUtil in RemoteFunctionRegistry
> ---
>
> Key: DRILL-4987
> URL: https://issues.apache.org/jira/browse/DRILL-4987
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>Priority: Minor
> Fix For: 1.10.0
>
>
> + Use ImpersonationUtil#getProcessUserName rather than  
> UserGroupInformation#getCurrentUser#getUserName in RemoteFunctionRegistry
> + Expose the process user's group info in ImpersonationUtil and use that in 
> RemoteFunctionRegistry, rather than 
> UserGroupInformation#getCurrentUser#getGroupNames





[jira] [Closed] (DRILL-5098) Improving fault tolerance for connection between client and foreman node.

2017-03-31 Thread Chun Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chang closed DRILL-5098.
-

Verified with manual testing. The automation framework is not suited for this 
type of test, and we have extensive unit test coverage for this feature.

> Improving fault tolerance for connection between client and foreman node.
> -
>
> Key: DRILL-5098
> URL: https://issues.apache.org/jira/browse/DRILL-5098
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
>
> With DRILL-5015 we added support for specifying multiple Drillbits in the 
> connection string and randomly choosing one of them. Over time some of the 
> Drillbits specified in the connection string may die, and the client can fail 
> to connect to the Foreman node if the random selection happens to be a dead 
> Drillbit.
> Even if ZooKeeper is used to select a random Drillbit from the registered 
> ones, there is a small window in which the client selects a Drillbit that 
> then goes down. The client will fail to connect to this Drillbit and error 
> out.
> Instead, if we try multiple Drillbits (with a configurable tries count in 
> the connection string), the probability of hitting this error window is 
> reduced in both cases, improving fault tolerance. During further 
> investigation it was also found that an authentication failure is thrown as 
> a generic RpcException. We need to improve that as well, capturing this case 
> explicitly, since on an auth failure we don't want to try multiple Drillbits.
> Connection string example with the new parameter:
> jdbc:drill:drillbit=<host>[:<port>][,<host>[:<port>]]...;tries=5





[jira] [Closed] (DRILL-5121) A memory leak is observed when exact case is not specified for a column in a filter condition

2017-03-31 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal closed DRILL-5121.
--

Verified and test cases added to automation.

> A memory leak is observed when exact case is not specified for a column in a 
> filter condition
> -
>
> Key: DRILL-5121
> URL: https://issues.apache.org/jira/browse/DRILL-5121
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.6.0, 1.8.0
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When the query SELECT XYZ FROM dfs.`/tmp/foo` WHERE xYZ LIKE 'abc' is 
> executed on a setup where /tmp/foo has 2 Parquet files, 1.parquet and 
> 2.parquet, and 1.parquet has the column XYZ but 2.parquet does not, there 
> is a memory leak.
> This seems to happen because xYZ is treated as a new column.
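Drill column identifiers are case-insensitive, so `xYZ` should resolve to the existing `XYZ` vector instead of materializing a new column. A sketch of case-insensitive resolution (the helper name is hypothetical):

```python
def resolve_column(requested, existing_columns):
    """Resolve `requested` against `existing_columns` case-insensitively,
    returning the stored spelling, or None if the column is genuinely absent."""
    by_folded = {c.casefold(): c for c in existing_columns}
    return by_folded.get(requested.casefold())
```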





[jira] [Updated] (DRILL-5121) A memory leak is observed when exact case is not specified for a column in a filter condition

2017-03-31 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal updated DRILL-5121:
---
Reviewer: Krystal  (was: Chun Chang)

> A memory leak is observed when exact case is not specified for a column in a 
> filter condition
> -
>
> Key: DRILL-5121
> URL: https://issues.apache.org/jira/browse/DRILL-5121
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.6.0, 1.8.0
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When the query SELECT XYZ FROM dfs.`/tmp/foo` WHERE xYZ LIKE 'abc' is 
> executed on a setup where /tmp/foo has 2 Parquet files, 1.parquet and 
> 2.parquet, and 1.parquet has the column XYZ but 2.parquet does not, there 
> is a memory leak.
> This seems to happen because xYZ is treated as a new column.





[jira] [Commented] (DRILL-5394) Optimize query planning for MapR-DB tables by caching row counts

2017-03-31 Thread Padma Penumarthy (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951555#comment-15951555
 ] 

Padma Penumarthy commented on DRILL-5394:
-

Yes, I did. Thanks for the review [~gparai]

> Optimize query planning for MapR-DB tables by caching row counts
> 
>
> Key: DRILL-5394
> URL: https://issues.apache.org/jira/browse/DRILL-5394
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization, Storage - MapRDB
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Abhishek Girish
>Assignee: Padma Penumarthy
>  Labels: MapR-DB-Binary, ready-to-commit
> Fix For: 1.11.0
>
>
> On large MapR-DB tables, it was observed that the query planning time was 
> longer than expected. With DEBUG logs, it was understood that there were 
> multiple calls being made to get MapR-DB region locations and to fetch total 
> row count for tables.
> {code}
> 2017-02-23 13:59:55,246 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:05,006 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Function
> ...
> 2017-02-23 14:00:05,031 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:16,438 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:16,439 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:28,479 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:28,480 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:42,396 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:42,397 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:54,609 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Physical Planning (49588ms):
> {code}
> We should cache these stats and reuse them wherever required during query 
> planning. This should help reduce query planning time.
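The proposed fix amounts to memoizing the expensive stats lookups for the duration of planning. A minimal sketch (the class and method names are illustrative, not Drill's planner API):

```python
class StatsCache:
    """Memoize per-table stats so planning fetches them at most once,
    instead of re-fetching region locations and row counts on every call."""

    def __init__(self, fetch):
        self._fetch = fetch      # e.g. a call into MapR-DB for row counts
        self._cache = {}

    def get(self, table):
        if table not in self._cache:
            self._cache[table] = self._fetch(table)
        return self._cache[table]
```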





[jira] [Assigned] (DRILL-5405) Add missing operator types

2017-03-31 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-5405:
---

Assignee: Arina Ielchiieva  (was: Zelaine Fong)

> Add missing operator types
> --
>
> Key: DRILL-5405
> URL: https://issues.apache.org/jira/browse/DRILL-5405
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Attachments: maprdb_sub_scan.JPG, unknown_operator.JPG
>
>
> Add missing operator types FLATTEN, MONGO_SUB_SCAN, and MAPRDB_SUB_SCAN, so 
> they won't be displayed on the Web UI as UNKNOWN_OPERATOR.
> Example:
> before the fix -> unknown_operator.JPG
> after the fix -> maprdb_sub_scan.JPG





[jira] [Assigned] (DRILL-5405) Add missing operator types

2017-03-31 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-5405:
---

Assignee: Zelaine Fong  (was: Arina Ielchiieva)

> Add missing operator types
> --
>
> Key: DRILL-5405
> URL: https://issues.apache.org/jira/browse/DRILL-5405
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Zelaine Fong
>Priority: Minor
> Attachments: maprdb_sub_scan.JPG, unknown_operator.JPG
>
>
> Add missing operator types FLATTEN, MONGO_SUB_SCAN, and MAPRDB_SUB_SCAN, so 
> they won't be displayed on the Web UI as UNKNOWN_OPERATOR.
> Example:
> before the fix -> unknown_operator.JPG
> after the fix -> maprdb_sub_scan.JPG





[jira] [Resolved] (DRILL-5031) Documentation for HTTPD Parser

2017-03-31 Thread Bridget Bevens (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens resolved DRILL-5031.
---
    Resolution: Fixed
Fix Version/s: (was: 1.9.0)
               1.10.0

Added minor edits and moved the content into Apache Drill: 
http://drill.apache.org/docs/configuring-drill-to-read-web-server-logs/ 

Please let me know if you see any issues. 

Thanks,
Bridget

> Documentation for HTTPD Parser
> --
>
> Key: DRILL-5031
> URL: https://issues.apache.org/jira/browse/DRILL-5031
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Charles Givre
>Assignee: Bridget Bevens
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.10.0
>
>
> https://gist.github.com/cgivre/47f07a06d44df2af625fc6848407ae7c





[jira] [Commented] (DRILL-5394) Optimize query planning for MapR-DB tables by caching row counts

2017-03-31 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951400#comment-15951400
 ] 

Gautam Kumar Parai commented on DRILL-5394:
---

[~ppenumarthy] is the code ready to go in Apache? If so, we should mark it 
with the ready-to-commit tag.

> Optimize query planning for MapR-DB tables by caching row counts
> 
>
> Key: DRILL-5394
> URL: https://issues.apache.org/jira/browse/DRILL-5394
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization, Storage - MapRDB
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Abhishek Girish
>Assignee: Padma Penumarthy
>  Labels: MapR-DB-Binary, ready-to-commit
> Fix For: 1.11.0
>
>
> On large MapR-DB tables, it was observed that the query planning time was 
> longer than expected. With DEBUG logs, it was understood that there were 
> multiple calls being made to get MapR-DB region locations and to fetch total 
> row count for tables.
> {code}
> 2017-02-23 13:59:55,246 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:05,006 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Function
> ...
> 2017-02-23 14:00:05,031 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:16,438 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:16,439 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:28,479 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:28,480 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:42,396 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:42,397 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:54,609 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Physical Planning (49588ms):
> {code}
> We should cache these stats and reuse them wherever required during query 
> planning. This should help reduce query planning time.
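The caching described above amounts to memoizing an expensive lookup for the duration of planning. A minimal sketch of that idea (hypothetical class and names, not the actual DRILL-5394 patch to BinaryTableGroupScan):

```java
import java.util.function.Supplier;

// Caches an expensive planning-time stat (e.g. a table row count or
// region-location fetch) so repeated calls during query planning hit
// the underlying source only once. Illustrative sketch only.
public class CachedStat {
    private final Supplier<Long> loader;
    private Long cached;            // null until the first fetch

    public CachedStat(Supplier<Long> loader) {
        this.loader = loader;
    }

    public synchronized long get() {
        if (cached == null) {
            cached = loader.get();  // single expensive fetch
        }
        return cached;              // every later call reuses the value
    }
}
```

Each planner rule that needs the row count would then call `get()` on a shared instance instead of re-fetching from the table.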





[jira] [Commented] (DRILL-5406) Flatten produces a random ClassCastException

2017-03-31 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951316#comment-15951316
 ] 

Rahul Challapalli commented on DRILL-5406:
--

Another instance of the below query failing. This time the stack trace shows the 
issue happened in the JDBC code:
{code}
java.sql.SQLException: SYSTEM ERROR: ClassCastException

Fragment 0:0

[Error Id: 3ef91b70-debf-4e32-a3a0-39010fb42460 on qa-node183.qa.lab:31010]

  (java.lang.ClassCastException) null

at 
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:232)
at 
org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:275)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1943)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:76)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
at 
org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:465)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
at 
org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:169)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130)
at 
org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:112)
at 
org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:177)
at 
org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:101)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
ERROR: ClassCastException

Fragment 0:0

[Error Id: 3ef91b70-debf-4e32-a3a0-39010fb42460 on qa-node183.qa.lab:31010]

  (java.lang.ClassCastException) null

at 
oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
at 
oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:144)
at 
oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
at 
oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:65)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:363)
at 
oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:240)
at 
oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:245)
at 
oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 

[jira] [Commented] (DRILL-5404) kvgen function only supports Simple maps as input

2017-03-31 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951301#comment-15951301
 ] 

Rahul Challapalli commented on DRILL-5404:
--

This is reproducible on Drill 1.9.0 as well with the below query on the same 
data set:
{code}
select kvgen(bigintegercol), kvgen(float8col) from 
`json_kvgenflatten/kvgen1.json`
{code}
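The error above is a runtime type check: kvgen rejects any input column that is not a map (here, bigint and float8 columns). A minimal sketch of that kind of input guard (hypothetical class, not Drill's actual MappifyUtility code, which operates on complex-value readers):

```java
import java.util.Map;

// Sketch of a kvgen-style input guard: only map-typed values are
// accepted; scalar columns such as bigint are rejected with the same
// message Drill reports. Illustrative only.
public class KvgenGuard {
    public static void checkInput(Object value) {
        if (!(value instanceof Map)) {
            throw new RuntimeException(
                "kvgen function only supports Simple maps as input");
        }
    }

    public static void main(String[] args) {
        checkInput(Map.of("a", 1));        // map input: accepted
        try {
            checkInput(42L);               // bigint column: rejected
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```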

> kvgen function only supports Simple maps as input
> -
>
> Key: DRILL-5404
> URL: https://issues.apache.org/jira/browse/DRILL-5404
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>
> git.commit.id.abbrev=38ef562
> The below query did not fail when I ran it in isolation. However when I ran 
> the test suite at [1], which also contains the below query, by using 50 
> threads submitting queries concurrently, I hit the below error. 
> {code}
> select boolcol, bigintegercol, varcharcol, kvgen(bigintegercol), 
> kvgen(boolcol), kvgen(varcharcol) from `json_kvgenflatten/kvgen1.json`
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: DrillRuntimeException: kvgen function 
> only supports Simple maps as input
> Fragment 0:0
> [Error Id: 953541c2-cf67-4d29-8d1c-ac3ff3c18f1f on qa-node182.qa.lab:31010]
>   (org.apache.drill.common.exceptions.DrillRuntimeException) kvgen function 
> only supports Simple maps as input
> org.apache.drill.exec.expr.fn.impl.MappifyUtility.mappify():46
> org.apache.drill.exec.test.generated.ProjectorGen10361.doEval():45
> org.apache.drill.exec.test.generated.ProjectorGen10361.projectRecords():67
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():199
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
> at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
> at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
> at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1895)
> at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61)
> at 
> oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
> at 
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1100)
> at 
> oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
> at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:180)
> at 
> oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
> at 
> oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130)
> at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:112)
> at 
> org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:177)
> at 
> org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:101)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> 

[jira] [Commented] (DRILL-5406) Flatten produces a random ClassCastException

2017-03-31 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951292#comment-15951292
 ] 

Rahul Challapalli commented on DRILL-5406:
--

Data set used in the query:
{code}
{"map":{"rm": [ {"rptd": [{ "a": "foo"}]}]}}|10
{code}

> Flatten produces a random ClassCastException
> 
>
> Key: DRILL-5406
> URL: https://issues.apache.org/jira/browse/DRILL-5406
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.9.0
>Reporter: Rahul Challapalli
>
> I hit a random error on drill 1.9.0. I will try to reproduce the issue on the 
> latest master.
> The below query did not fail when I ran it in isolation. However when I ran 
> the test suite at [1], which also contains the below query, by using 50 
> threads submitting queries concurrently, I hit the below error.
> {code}
> select flatten(convert_from(columns[0], 'JSON')) from 
> `json_kvgenflatten/convert4783_2.tbl` where 1=2
> [Error Id: 1b5f4aef-ae34-4af4-9f2f-8349f8dd97c2 on qa-node183.qa.lab:31010]
>   (java.lang.ClassCastException) 
> org.apache.drill.common.expression.TypedNullConstant cannot be cast to 
> org.apache.drill.exec.expr.ValueVectorReadExpression
> 
> org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.setupNewSchema():307
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
> 
> org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.innerNext():120
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
> at 
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
> at 
> oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:144)
> at 
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
> at 
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
> at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:65)
> at 
> oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:363)
> at 
> oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
> at 
> oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:240)
> at 
> oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123)
> at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274)
> at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:245)
> at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
> at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at 
> oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
> at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at 
> oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
> at 
> 

[jira] [Created] (DRILL-5406) Flatten produces a random ClassCastException

2017-03-31 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5406:


 Summary: Flatten produces a random ClassCastException
 Key: DRILL-5406
 URL: https://issues.apache.org/jira/browse/DRILL-5406
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JSON
Affects Versions: 1.9.0
Reporter: Rahul Challapalli


I hit a random error on Drill 1.9.0. I will try to reproduce the issue on the 
latest master.

The below query did not fail when I ran it in isolation. However, when I ran the 
test suite at [1] (which also contains the below query) with 50 threads 
submitting queries concurrently, I hit the below error.
{code}
select flatten(convert_from(columns[0], 'JSON')) from 
`json_kvgenflatten/convert4783_2.tbl` where 1=2

[Error Id: 1b5f4aef-ae34-4af4-9f2f-8349f8dd97c2 on qa-node183.qa.lab:31010]

  (java.lang.ClassCastException) 
org.apache.drill.common.expression.TypedNullConstant cannot be cast to 
org.apache.drill.exec.expr.ValueVectorReadExpression

org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.setupNewSchema():307
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78

org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.innerNext():120
org.apache.drill.exec.record.AbstractRecordBatch.next():162

org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745

at 
oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
at 
oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:144)
at 
oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
at 
oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:65)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:363)
at 
oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:240)
at 
oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:245)
at 
oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 

[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951210#comment-15951210
 ] 

ASF GitHub Bot commented on DRILL-5375:
---

Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/794
  
A few follow-up comments which are non-blockers.  Overall LGTM.  +1


> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> ++-+-+---+
> | dt | fyq | who | event |
> ++-+-+---+
> | 2016-01-01 | NULL| aperson | went wild |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing  |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> ++-+-+---+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-+--+--++
> | dt  |   fyq|   who|   event|
> +-+--+--++
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas  |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing   |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-+--+--++
> 3 rows selected (2.523 seconds)
> {code}
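The missing row in the Drill output above (`2016-01-01 | NULL`) is exactly what a left join must preserve: every left row is emitted, with NULL padding when no right row matches. A reference sketch of nested loop LEFT join semantics (illustrative only, not Drill's NestedLoopJoinBatch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Reference semantics for a nested loop LEFT join: every left row is
// emitted; left rows with no matching right row are padded with null
// on the right side instead of being dropped.
public class NlLeftJoin {
    public static <L, R> List<Object[]> join(List<L> left, List<R> right,
                                             BiPredicate<L, R> cond) {
        List<Object[]> out = new ArrayList<>();
        for (L l : left) {
            boolean matched = false;
            for (R r : right) {
                if (cond.test(l, r)) {      // non-equi conditions allowed
                    out.add(new Object[]{l, r});
                    matched = true;
                }
            }
            if (!matched) {
                out.add(new Object[]{l, null}); // null-pad unmatched left row
            }
        }
        return out;
    }
}
```

In the repro, `2016-01-01` falls in no `[dts, dte]` range, so a correct implementation emits it with a NULL quarter rather than dropping it.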





[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951207#comment-15951207
 ] 

ASF GitHub Bot commented on DRILL-5375:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/794#discussion_r109197541
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
 ---
@@ -105,6 +103,29 @@
   public static final PositiveLongValidator 
PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD = new 
PositiveLongValidator(PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD_KEY,
   Long.MAX_VALUE, 1);
 
+  /*
+ Enables rules that re-write query joins in the most optimal way.
+ Though it is turned on by default and its value in query optimization is undeniable, the user may want to turn off such
+ optimization to leave the join order indicated in the SQL query unchanged.
+
+  For example:
+  Currently only the nested loop join allows non-equi join conditions.
+  During the planning stage the nested loop join will be chosen when a non-equi join is detected
+  and {@link #NLJOIN_FOR_SCALAR} is set to false. Though query performance may not be the most optimal in such a case,
+  the user may use this workaround to execute queries with non-equi joins.
+
+  The nested loop join allows only INNER and LEFT joins and implies that the right input is smaller than the left input.
+  During a LEFT join, when join optimization is enabled and the right input is detected to be larger than the left,
+  the join will be optimized: the left and right inputs will be flipped and the LEFT join type will be changed to RIGHT.
+  If the query contains non-equi joins, after such optimization it will fail, since the nested loop join does not allow
+  a RIGHT join. In that case, if the user accepts the probability of non-optimal performance, they may turn off join optimization.
+  Turning off join optimization makes sense only if the user is not sure that the right input is smaller than or equal to the left;
+  otherwise join optimization can be left turned on.
+
+  Note: once hash and merge joins allow non-equi join conditions,
+  the need to turn off join optimization may go away.
+   */
+  public static final BooleanValidator JOIN_OPTIMIZATION = new 
BooleanValidator("planner.enable_join_optimization", true);
--- End diff --

Ah, you added this option to enable/disable the *logical* join rules. 
Since NestedLoopJoin is a physical join implementation, from the comments I 
interpreted that this was intended for the swapping of the left and right inputs 
of the (physical) NL join, which is why I mentioned the hashjoin_swap option. 
It seems to me that if there is a LEFT OUTER JOIN and the condition is a 
non-equality, then we should not allow changing to a RIGHT OUTER JOIN by 
flipping the left and right sides, since that would make the query fail. What 
do you think?
I suppose we could keep your boolean option for this PR and address the 
left outer join issue separately.


> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> ++-+-+---+
> | dt | fyq | who | event |
> ++-+-+---+
> | 2016-01-01 | NULL| aperson | went wild |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing  |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> ++-+-+---+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set 

[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951205#comment-15951205
 ] 

ASF GitHub Bot commented on DRILL-5375:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/794#discussion_r109193083
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinBatch.java
 ---
@@ -214,26 +226,62 @@ private boolean hasMore(IterOutcome outcome) {
 
   /**
* Method generates the runtime code needed for NLJ. Other than the 
setup method to set the input and output value
-   * vector references we implement two more methods
-   * 1. emitLeft()  -> Project record from the left side
-   * 2. emitRight() -> Project record from the right side (which is a 
hyper container)
+   * vector references we implement three more methods
+   * 1. doEval() -> Evaluates if record from left side matches record from 
the right side
+   * 2. emitLeft() -> Project record from the left side
+   * 3. emitRight() -> Project record from the right side (which is a 
hyper container)
* @return the runtime generated class that implements the 
NestedLoopJoin interface
-   * @throws IOException
-   * @throws ClassTransformationException
*/
-  private NestedLoopJoin setupWorker() throws IOException, 
ClassTransformationException {
-final CodeGenerator<NestedLoopJoin> nLJCodeGenerator = 
CodeGenerator.get(NestedLoopJoin.TEMPLATE_DEFINITION, 
context.getFunctionRegistry(), context.getOptions());
+  private NestedLoopJoin setupWorker() throws IOException, 
ClassTransformationException, SchemaChangeException {
+final CodeGenerator<NestedLoopJoin> nLJCodeGenerator = 
CodeGenerator.get(
+NestedLoopJoin.TEMPLATE_DEFINITION, context.getFunctionRegistry(), 
context.getOptions());
 nLJCodeGenerator.plainJavaCapable(true);
 // Uncomment out this line to debug the generated code.
 //nLJCodeGenerator.saveCodeForDebugging(true);
 final ClassGenerator<NestedLoopJoin> nLJClassGenerator = 
nLJCodeGenerator.getRoot();
 
+// generate doEval
+final ErrorCollector collector = new ErrorCollectorImpl();
+
+
+/*
+A logical expression may contain fields from both the left and right batches. During code generation (materialization)
+we need to indicate from which input a field should be taken. Mapping sets can work with only one input at a time.
+But non-equality expressions can be complex:
+  select t1.c1, t2.c1, t2.c2 from t1 inner join t2 on t1.c1 between t2.c1 and t2.c2
+or even contain a self join which can not be transformed into a filter since an OR clause is present:
+  select * from t1 inner join t2 on t1.c1 >= t2.c1 or t1.c3 <> t1.c4
+
+In this case logical expression can not be split according to 
input presence (like during equality joins
--- End diff --

To avoid confusion you could list a couple of example categories:  
1. Join on non-equijoin predicates: t1 inner join t2 on (t1.c1 between 
t2.c1 AND t2.c2) AND (...) 
2. Join with an OR predicate: t1 inner join t2 on t1.c1 = t2.c1 OR t1.c2 
= t2.c2

The other category where a join predicate includes self-join could probably 
be left out since there are quite a few variations there - if there are 2 
tables but the join condition only specifies 1 table, then it would be a 
cartesian join with the second table. If the self join occurs in combination 
with an AND it would be treated differently compared with OR etc..
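The split/no-split decision under discussion reduces to which inputs a predicate's fields reference: fields are numbered across both join inputs in order of appearance, so an index below the left field count belongs to the left input. A hedged sketch of that classification (illustrative names and logic, not Drill's actual materializer):

```java
import java.util.Set;

// Classifies a join predicate by the inputs its field indices touch.
// Indices below leftFieldCount refer to the left input, the rest to
// the right. A predicate touching BOTH sides (e.g. a non-equi or OR
// condition) cannot be split and pushed to one input. Sketch only.
public class PredicateSides {
    public enum Side { LEFT_ONLY, RIGHT_ONLY, BOTH }

    public static Side classify(Set<Integer> fieldIndexes, int leftFieldCount) {
        boolean left = false, right = false;
        for (int i : fieldIndexes) {
            if (i < leftFieldCount) left = true; else right = true;
        }
        if (left && right) return Side.BOTH;  // needs both inputs: no split
        return left ? Side.LEFT_ONLY : Side.RIGHT_ONLY;
    }
}
```

For `t1.c1 between t2.c1 and t2.c2` with two left fields, the predicate references indices on both sides and classifies as BOTH, which is why the generated doEval must see both batches at once.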


> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> ++-+-+---+
> | dt   

[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951206#comment-15951206
 ] 

ASF GitHub Bot commented on DRILL-5375:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/794#discussion_r109186694
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
 ---
@@ -70,27 +70,65 @@
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DrillOptiq.class);
 
   /**
-   * Converts a tree of {@link RexNode} operators into a scalar expression 
in Drill syntax.
+   * Converts a tree of {@link RexNode} operators into a scalar expression 
in Drill syntax using one input.
+   *
+   * @param context parse context which contains planner settings
+   * @param input data input
+   * @param expr expression to be converted
+   * @return converted expression
*/
   public static LogicalExpression toDrill(DrillParseContext context, 
RelNode input, RexNode expr) {
-final RexToDrill visitor = new RexToDrill(context, input);
+return toDrill(context, Lists.newArrayList(input), expr);
+  }
+
+  /**
+   * Converts a tree of {@link RexNode} operators into a scalar expression 
in Drill syntax using multiple inputs.
+   *
+   * @param context parse context which contains planner settings
+   * @param inputs multiple data inputs
+   * @param expr expression to be converted
+   * @return converted expression
+   */
+  public static LogicalExpression toDrill(DrillParseContext context, 
List<RelNode> inputs, RexNode expr) {
+final RexToDrill visitor = new RexToDrill(context, inputs);
 return expr.accept(visitor);
   }
 
   private static class RexToDrill extends 
RexVisitorImpl<LogicalExpression> {
-private final RelNode input;
+private final List<RelNode> inputs;
 private final DrillParseContext context;
+private final List<RelDataTypeField> fieldList;
 
-RexToDrill(DrillParseContext context, RelNode input) {
+RexToDrill(DrillParseContext context, List<RelNode> inputs) {
   super(true);
   this.context = context;
-  this.input = input;
+  this.inputs = inputs;
+  this.fieldList = Lists.newArrayList();
+  /*
+ Fields are enumerated by their order of appearance in the input. See 
{@link org.apache.calcite.rex.RexInputRef} for details.
+ Thus we can merge the field lists from several inputs by adding them 
to the list in order of appearance.
+ Each field index in the list will match the field index in the 
RexInputRef instance, which allows us
+ to retrieve a field from the field list by index in the {@link 
#visitInputRef(RexInputRef)} method. Example:
+
+ Query: select t1.c1, t2.c1. t2.c2 from t1 inner join t2 on t1.c1 
between t2.c1 and t2.c2
+
+ Input 1: $0
+ Input 2: $1, $2
+
+ Result: $0, $1, $2
+   */
+  for (RelNode input : inputs) {
--- End diff --

Ok, I see.  Performance-wise it is a minor thing, but it is more about 
working with the existing visitInputRef() which takes one input. 

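
As an aside for readers of this thread, the field-merging idea discussed in the diff (append each input's fields in input order so that list indices line up with RexInputRef indices `$0, $1, $2, ...`) can be sketched in isolation. The class and method names below are illustrative only, not Drill's actual API:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: merging field lists from several join inputs.
 * Fields are appended in input order, so the index of a field in the
 * merged list matches the RexInputRef index ($0, $1, $2, ...).
 */
public class FieldListMerger {

    // Merges per-input field name lists into one flat list, in input order.
    public static List<String> merge(List<List<String>> inputs) {
        List<String> merged = new ArrayList<>();
        for (List<String> input : inputs) {
            merged.addAll(input);  // order of appearance is preserved
        }
        return merged;
    }

    public static void main(String[] args) {
        // Input 1 contributes $0; input 2 contributes $1 and $2,
        // mirroring the example in the code comment above.
        List<String> merged = merge(
            List.of(List.of("t1.c1"), List.of("t2.c1", "t2.c2")));
        System.out.println(merged);  // [t1.c1, t2.c1, t2.c2]
    }
}
```

With this layout, `visitInputRef` can resolve `$n` by a single index lookup into the merged list, which is the property the patch relies on.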

> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> ++-+-+---+
> | dt | fyq | who | event |
> ++-+-+---+
> | 2016-01-01 | NULL| aperson | went wild |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing  |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> 

[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951204#comment-15951204
 ] 

ASF GitHub Bot commented on DRILL-5375:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/794#discussion_r109193949
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinTemplate.java
 ---
@@ -40,132 +41,133 @@
   // Record count of the left batch currently being processed
   private int leftRecordCount = 0;
 
-  // List of record counts  per batch in the hyper container
+  // List of record counts per batch in the hyper container
  private List<Integer> rightCounts = null;
 
   // Output batch
   private NestedLoopJoinBatch outgoing = null;
 
-  // Next right batch to process
-  private int nextRightBatchToProcess = 0;
-
-  // Next record in the current right batch to process
-  private int nextRightRecordToProcess = 0;
-
-  // Next record in the left batch to process
-  private int nextLeftRecordToProcess = 0;
+  // Iteration status tracker
+  private IterationStatusTracker tracker = new IterationStatusTracker();
 
   /**
* Method initializes necessary state and invokes the doSetup() to set 
the
-   * input and output value vector references
+   * input and output value vector references.
+   *
* @param context Fragment context
* @param left Current left input batch being processed
* @param rightContainer Hyper container
+   * @param rightCounts Counts for each right container
* @param outgoing Output batch
*/
-  public void setupNestedLoopJoin(FragmentContext context, RecordBatch 
left,
+  public void setupNestedLoopJoin(FragmentContext context,
+  RecordBatch left,
   ExpandableHyperContainer rightContainer,
  LinkedList<Integer> rightCounts,
   NestedLoopJoinBatch outgoing) {
 this.left = left;
-leftRecordCount = left.getRecordCount();
+this.leftRecordCount = left.getRecordCount();
 this.rightCounts = rightCounts;
 this.outgoing = outgoing;
 
 doSetup(context, rightContainer, left, outgoing);
   }
 
   /**
-   * This method is the core of the nested loop join. For every record on 
the right we go over
-   * the left batch and produce the cross product output
+   * Main entry point for producing the output records. Thin wrapper 
around populateOutgoingBatch(), this method
+   * controls which left batch we are processing and fetches the next left 
input batch once we exhaust the current one.
+   *
+   * @param joinType join type (INNER or LEFT)
+   * @return the number of records produced in the output batch
+   */
+  public int outputRecords(JoinRelType joinType) {
+int outputIndex = 0;
+while (leftRecordCount != 0) {
+  outputIndex = populateOutgoingBatch(joinType, outputIndex);
+  if (outputIndex >= NestedLoopJoinBatch.MAX_BATCH_SIZE) {
+break;
+  }
+  // reset state and get next left batch
+  resetAndGetNextLeft();
+}
+return outputIndex;
+  }
+
+  /**
+   * This method is the core of the nested loop join. For each left batch 
record it looks for a matching record
+   * in the list of right batches. A match is checked by calling the {@link 
#doEval(int, int, int)} method.
+   * If a matching record is found, both left and right records are written 
into the output batch;
+   * otherwise, if the join type is LEFT, only the left record is written 
and the right batch record values will be null.
+   *
+   * @param joinType join type (INNER or LEFT)
* @param outputIndex index to start emitting records at
* @return final outputIndex after producing records in the output batch
*/
-  private int populateOutgoingBatch(int outputIndex) {
-
-// Total number of batches on the right side
-int totalRightBatches = rightCounts.size();
-
-// Total number of records on the left
-int localLeftRecordCount = leftRecordCount;
-
-/*
- * The below logic is the core of the NLJ. To have better performance 
we copy the instance members into local
- * method variables, once we are done with the loop we need to update 
the instance variables to reflect the new
- * state. To avoid code duplication of resetting the instance members 
at every exit point in the loop we are using
- * 'goto'
- */
-int localNextRightBatchToProcess = nextRightBatchToProcess;
-int localNextRightRecordToProcess = nextRightRecordToProcess;
- 
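
The iteration the diff describes (an outer loop over left records, an inner loop over the right batches, with LEFT-join padding when nothing matches) can be sketched in a much simplified form. This is illustrative Java under assumed names, not Drill's vectorized implementation:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified sketch of the nested loop join logic:
 * for each left record, scan the right side; on a match emit both
 * sides, and for a LEFT join emit the left record padded with null
 * when nothing matched. Names are illustrative, not Drill's API.
 */
public class NljSketch {

    public static List<String> join(List<Integer> left, List<Integer> right,
                                    boolean leftJoin) {
        List<String> out = new ArrayList<>();
        for (Integer l : left) {
            boolean matched = false;
            for (Integer r : right) {
                if (l.equals(r)) {            // stands in for doEval()
                    out.add(l + "," + r);     // emit joined record
                    matched = true;
                }
            }
            if (leftJoin && !matched) {
                out.add(l + ",null");         // right side padded with null
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(join(List.of(1, 2), List.of(2, 3), true));
        // [1,null, 2,2]
    }
}
```

The real template additionally tracks batch boundaries (the IterationStatusTracker in the diff) so it can stop and resume mid-loop when the output batch fills up.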

[jira] [Updated] (DRILL-5394) Optimize query planning for MapR-DB tables by caching row counts

2017-03-31 Thread Padma Penumarthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Padma Penumarthy updated DRILL-5394:

Labels: MapR-DB-Binary ready-to-commit  (was: MapR-DB-Binary)

> Optimize query planning for MapR-DB tables by caching row counts
> 
>
> Key: DRILL-5394
> URL: https://issues.apache.org/jira/browse/DRILL-5394
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization, Storage - MapRDB
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Abhishek Girish
>Assignee: Padma Penumarthy
>  Labels: MapR-DB-Binary, ready-to-commit
> Fix For: 1.11.0
>
>
> On large MapR-DB tables, it was observed that the query planning time was 
> longer than expected. With DEBUG logs, it was understood that there were 
> multiple calls being made to get MapR-DB region locations and to fetch total 
> row count for tables.
> {code}
> 2017-02-23 13:59:55,246 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:05,006 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Function
> ...
> 2017-02-23 14:00:05,031 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:16,438 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:16,439 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:28,479 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:28,480 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:42,396 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:42,397 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:54,609 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Physical Planning (49588ms):
> {code}
> We should cache these stats and reuse them where all required during query 
> planning. This should help reduce query planning time.
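
The caching idea can be sketched as a lazily computed, memoized stat: fetch the expensive value (row count, region locations) once per scan and serve the cached copy on every later planner call. The class and method names below are hypothetical, not the actual BinaryTableGroupScan API:

```java
/**
 * Hypothetical sketch of the DRILL-5394 idea: compute an expensive
 * table stat once and reuse it across repeated planner requests,
 * instead of hitting the storage layer on every call.
 */
public class CachedTableStats {
    private Long rowCount;   // null until first fetch
    private int fetches;     // how many expensive fetches actually happened

    // Simulates the expensive call to the storage layer.
    private long fetchRowCountFromStore() {
        fetches++;
        return 4L;  // pretend the table has 4 rows
    }

    // Planner-facing accessor: computes once, then serves the cached value.
    public synchronized long getRowCount() {
        if (rowCount == null) {
            rowCount = fetchRowCountFromStore();
        }
        return rowCount;
    }

    public int getFetchCount() {
        return fetches;
    }

    public static void main(String[] args) {
        CachedTableStats stats = new CachedTableStats();
        // The planner may ask many times; the store is hit only once.
        for (int i = 0; i < 5; i++) {
            stats.getRowCount();
        }
        System.out.println(stats.getFetchCount());  // 1
    }
}
```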



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5405) Add missing operator types

2017-03-31 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5405:

Reviewer: Karthikeyan Manivannan

Assigned Reviewer to [~karthikm]

> Add missing operator types
> --
>
> Key: DRILL-5405
> URL: https://issues.apache.org/jira/browse/DRILL-5405
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Attachments: maprdb_sub_scan.JPG, unknown_operator.JPG
>
>
> Add missing operator types: FLATTEN, MONGO_SUB_SCAN, MAPRDB_SUB_SCAN so they 
> won't be displayed on Web UI as UNKNOWN_OPERATOR.
> Example:
> before the fix -> unknown_operator.JPG
> after the fix -> maprdb_sub_scan.JPG





[jira] [Commented] (DRILL-5405) Add missing operator types

2017-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950693#comment-15950693
 ] 

ASF GitHub Bot commented on DRILL-5405:
---

GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/804

DRILL-5405: Add missing operator types



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-5405

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/804.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #804


commit 91ccb9c539f1c20d73b5cae9cb101c18b8f0cb73
Author: Arina Ielchiieva 
Date:   2017-03-30T16:55:31Z

DRILL-5405: Add missing operator types




> Add missing operator types
> --
>
> Key: DRILL-5405
> URL: https://issues.apache.org/jira/browse/DRILL-5405
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Attachments: maprdb_sub_scan.JPG, unknown_operator.JPG
>
>
> Add missing operator types: FLATTEN, MONGO_SUB_SCAN, MAPRDB_SUB_SCAN so they 
> won't be displayed on Web UI as UNKNOWN_OPERATOR.
> Example:
> before the fix -> unknown_operator.JPG
> after the fix -> maprdb_sub_scan.JPG





[jira] [Closed] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2017-03-31 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz closed DRILL-3562.
-

> Query fails when using flatten on JSON data where some documents have an 
> empty array
> 
>
> Key: DRILL-3562
> URL: https://issues.apache.org/jira/browse/DRILL-3562
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.1.0
>Reporter: Philip Deegan
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
>
> Drill query fails when using flatten when some records contain an empty array 
> {noformat}
> SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) 
> flat WHERE flat.c.d.e = 'f' limit 1;
> {noformat}
> Succeeds on 
> { "a": { "b": { "c": [  { "d": {  "e": "f" } } ] } } }
> Fails on
> { "a": { "b": { "c": [] } } }
> Error
> {noformat}
> Error: SYSTEM ERROR: ClassCastException: Cannot cast 
> org.apache.drill.exec.vector.NullableIntVector to 
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> {noformat}
> Is it possible to ignore the empty arrays, or do they need to be populated 
> with dummy data?





[jira] [Commented] (DRILL-5401) isnotnull(MAP-REPEATED) - IS NULL / IS NOT NULL over a list in JSON

2017-03-31 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950643#comment-15950643
 ] 

Khurram Faraaz commented on DRILL-5401:
---

The SQL was incorrect in the above example; fixing the SQL results in a 
SchemaChangeException

{noformat}
0: jdbc:drill:schema=dfs.tmp> select t.a.b.c from `empty_array.json` t where 
t.a.b.c is not null;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize 
incoming schema.  Errors:

Error in expression at index -1.  Error: Missing function implementation: 
[isnotnull(MAP-REPEATED)].  Full expression: --UNKNOWN EXPRESSION--..

Fragment 0:0

[Error Id: e1b65f30-6f40-43f4-8162-9cb54d6f5a81 on centos-01.qa.lab:31010] 
(state=,code=0)
0: jdbc:drill:schema=dfs.tmp> select t.a.b.c from `empty_array.json` t where 
t.a.b.c is null;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize 
incoming schema.  Errors:

Error in expression at index -1.  Error: Missing function implementation: 
[isnull(MAP-REPEATED)].  Full expression: --UNKNOWN EXPRESSION--..

Fragment 0:0

[Error Id: c964c93d-0573-4598-a3ee-6d8abc3abff0 on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}

> isnotnull(MAP-REPEATED) - IS NULL / IS NOT NULL over a list in JSON
> ---
>
> Key: DRILL-5401
> URL: https://issues.apache.org/jira/browse/DRILL-5401
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.11.0
>Reporter: Khurram Faraaz
>
> Checking if a list is null or if it is not null results in a 
> SchemaChangeException.
> Drill 1.11.0 commit id: adbf363d
> Data used in test
> {noformat}
> [root@centos-01 ~]# cat empty_array.json
> { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } }
> { "a": { "b": { "c": [] } } }
> {noformat}
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> alter session set 
> `store.json.all_text_mode`=true;
> +---++
> |  ok   |  summary   |
> +---++
> | true  | store.json.all_text_mode updated.  |
> +---++
> 1 row selected (0.189 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json`;
> ++
> |   a|
> ++
> | {"b":{"c":[{"d":{"e":"f"}}]}}  |
> | {"b":{"c":[]}} |
> ++
> 2 rows selected (0.138 seconds)
> /* wrong results */
> 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json` t where t.b.c 
> IS NULL;
> ++
> |   a|
> ++
> | {"b":{"c":[{"d":{"e":"f"}}]}}  |
> | {"b":{"c":[]}} |
> ++
> 2 rows selected (0.152 seconds)
> /* wrong results */
> 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json` t where t.b.c 
> IS NOT NULL;
> ++
> | a  |
> ++
> ++
> No rows selected (0.154 seconds)
> {noformat}





[jira] [Updated] (DRILL-5401) isnotnull(MAP-REPEATED) - IS NULL / IS NOT NULL over a list in JSON

2017-03-31 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-5401:
--
Summary: isnotnull(MAP-REPEATED) - IS NULL / IS NOT NULL over a list in 
JSON  (was: wrong results - IS NULL / IS NOT NULL over a list in JSON)

> isnotnull(MAP-REPEATED) - IS NULL / IS NOT NULL over a list in JSON
> ---
>
> Key: DRILL-5401
> URL: https://issues.apache.org/jira/browse/DRILL-5401
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.11.0
>Reporter: Khurram Faraaz
>
> Checking if a list is null or if it is not null returns incorrect results.
> Drill 1.11.0 commit id: adbf363d
> Data used in test
> {noformat}
> [root@centos-01 ~]# cat empty_array.json
> { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } }
> { "a": { "b": { "c": [] } } }
> {noformat}
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> alter session set 
> `store.json.all_text_mode`=true;
> +---++
> |  ok   |  summary   |
> +---++
> | true  | store.json.all_text_mode updated.  |
> +---++
> 1 row selected (0.189 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json`;
> ++
> |   a|
> ++
> | {"b":{"c":[{"d":{"e":"f"}}]}}  |
> | {"b":{"c":[]}} |
> ++
> 2 rows selected (0.138 seconds)
> /* wrong results */
> 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json` t where t.b.c 
> IS NULL;
> ++
> |   a|
> ++
> | {"b":{"c":[{"d":{"e":"f"}}]}}  |
> | {"b":{"c":[]}} |
> ++
> 2 rows selected (0.152 seconds)
> /* wrong results */
> 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json` t where t.b.c 
> IS NOT NULL;
> ++
> | a  |
> ++
> ++
> No rows selected (0.154 seconds)
> {noformat}





[jira] [Updated] (DRILL-5401) isnotnull(MAP-REPEATED) - IS NULL / IS NOT NULL over a list in JSON

2017-03-31 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-5401:
--
Description: 
Checking if a list is null or if it is not null results in a 
SchemaChangeException.
Drill 1.11.0 commit id: adbf363d

Data used in test

{noformat}
[root@centos-01 ~]# cat empty_array.json
{ "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } }
{ "a": { "b": { "c": [] } } }
{noformat}

{noformat}
0: jdbc:drill:schema=dfs.tmp> alter session set `store.json.all_text_mode`=true;
+---++
|  ok   |  summary   |
+---++
| true  | store.json.all_text_mode updated.  |
+---++
1 row selected (0.189 seconds)
0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json`;
++
|   a|
++
| {"b":{"c":[{"d":{"e":"f"}}]}}  |
| {"b":{"c":[]}} |
++
2 rows selected (0.138 seconds)

/* wrong results */

0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json` t where t.b.c IS 
NULL;
++
|   a|
++
| {"b":{"c":[{"d":{"e":"f"}}]}}  |
| {"b":{"c":[]}} |
++
2 rows selected (0.152 seconds)

/* wrong results */

0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json` t where t.b.c IS 
NOT NULL;
++
| a  |
++
++
No rows selected (0.154 seconds)
{noformat}

  was:
Checking if a list is null or if it is not null returns incorrect results.
Drill 1.11.0 commit id: adbf363d

Data used in test

{noformat}
[root@centos-01 ~]# cat empty_array.json
{ "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } }
{ "a": { "b": { "c": [] } } }
{noformat}

{noformat}
0: jdbc:drill:schema=dfs.tmp> alter session set `store.json.all_text_mode`=true;
+---++
|  ok   |  summary   |
+---++
| true  | store.json.all_text_mode updated.  |
+---++
1 row selected (0.189 seconds)
0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json`;
++
|   a|
++
| {"b":{"c":[{"d":{"e":"f"}}]}}  |
| {"b":{"c":[]}} |
++
2 rows selected (0.138 seconds)

/* wrong results */

0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json` t where t.b.c IS 
NULL;
++
|   a|
++
| {"b":{"c":[{"d":{"e":"f"}}]}}  |
| {"b":{"c":[]}} |
++
2 rows selected (0.152 seconds)

/* wrong results */

0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json` t where t.b.c IS 
NOT NULL;
++
| a  |
++
++
No rows selected (0.154 seconds)
{noformat}


> isnotnull(MAP-REPEATED) - IS NULL / IS NOT NULL over a list in JSON
> ---
>
> Key: DRILL-5401
> URL: https://issues.apache.org/jira/browse/DRILL-5401
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.11.0
>Reporter: Khurram Faraaz
>
> Checking if a list is null or if it is not null results in a 
> SchemaChangeException.
> Drill 1.11.0 commit id: adbf363d
> Data used in test
> {noformat}
> [root@centos-01 ~]# cat empty_array.json
> { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } }
> { "a": { "b": { "c": [] } } }
> {noformat}
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> alter session set 
> `store.json.all_text_mode`=true;
> +---++
> |  ok   |  summary   |
> +---++
> | true  | store.json.all_text_mode updated.  |
> +---++
> 1 row selected (0.189 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json`;
> ++
> |   a|
> ++
> | {"b":{"c":[{"d":{"e":"f"}}]}}  |
> | {"b":{"c":[]}} |
> ++
> 2 rows selected (0.138 seconds)
> /* wrong results */
> 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json` t where t.b.c 
> IS NULL;
> ++
> |   a|
> ++
> | {"b":{"c":[{"d":{"e":"f"}}]}}  |
> | {"b":{"c":[]}} |
> ++
> 2 rows selected (0.152 seconds)
> /* wrong results */

[jira] [Comment Edited] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2017-03-31 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950596#comment-15950596
 ] 

Khurram Faraaz edited comment on DRILL-3562 at 3/31/17 9:32 AM:


[~arina] thanks for confirming. Verified that SQL reported in this JIRA returns 
correct results on Drill 1.11.0
Test added here framework/resources/Functional/json/json_storage/drill_3562.q

{noformat}
0: jdbc:drill:schema=dfs.tmp> select count(*) from (select FLATTEN(t.a.b.c) AS 
c from `empty_array.json` t) flat WHERE flat.c.d.e = 'f' ;
+-+
| EXPR$0  |
+-+
| 1   |
+-+
1 row selected (0.241 seconds)
{noformat}


was (Author: khfaraaz):
[~arina] thanks for confirming. Verified that SQL reported in this JIRA returns 
correct results on Drill 1.10.0
Test added here framework/resources/Functional/json/json_storage/drill_3562.q

{noformat}
0: jdbc:drill:schema=dfs.tmp> select count(*) from (select FLATTEN(t.a.b.c) AS 
c from `empty_array.json` t) flat WHERE flat.c.d.e = 'f' ;
+-+
| EXPR$0  |
+-+
| 1   |
+-+
1 row selected (0.241 seconds)
{noformat}

> Query fails when using flatten on JSON data where some documents have an 
> empty array
> 
>
> Key: DRILL-3562
> URL: https://issues.apache.org/jira/browse/DRILL-3562
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.1.0
>Reporter: Philip Deegan
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
>
> Drill query fails when using flatten when some records contain an empty array 
> {noformat}
> SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) 
> flat WHERE flat.c.d.e = 'f' limit 1;
> {noformat}
> Succeeds on 
> { "a": { "b": { "c": [  { "d": {  "e": "f" } } ] } } }
> Fails on
> { "a": { "b": { "c": [] } } }
> Error
> {noformat}
> Error: SYSTEM ERROR: ClassCastException: Cannot cast 
> org.apache.drill.exec.vector.NullableIntVector to 
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> {noformat}
> Is it possible to ignore the empty arrays, or do they need to be populated 
> with dummy data?





[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2017-03-31 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950596#comment-15950596
 ] 

Khurram Faraaz commented on DRILL-3562:
---

[~arina] thanks for confirming. Verified that SQL reported in this JIRA returns 
correct results on Drill 1.10.0
Test added here framework/resources/Functional/json/json_storage/drill_3562.q

{noformat}
0: jdbc:drill:schema=dfs.tmp> select count(*) from (select FLATTEN(t.a.b.c) AS 
c from `empty_array.json` t) flat WHERE flat.c.d.e = 'f' ;
+-+
| EXPR$0  |
+-+
| 1   |
+-+
1 row selected (0.241 seconds)
{noformat}

> Query fails when using flatten on JSON data where some documents have an 
> empty array
> 
>
> Key: DRILL-3562
> URL: https://issues.apache.org/jira/browse/DRILL-3562
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.1.0
>Reporter: Philip Deegan
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
>
> Drill query fails when using flatten when some records contain an empty array 
> {noformat}
> SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) 
> flat WHERE flat.c.d.e = 'f' limit 1;
> {noformat}
> Succeeds on 
> { "a": { "b": { "c": [  { "d": {  "e": "f" } } ] } } }
> Fails on
> { "a": { "b": { "c": [] } } }
> Error
> {noformat}
> Error: SYSTEM ERROR: ClassCastException: Cannot cast 
> org.apache.drill.exec.vector.NullableIntVector to 
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> {noformat}
> Is it possible to ignore the empty arrays, or do they need to be populated 
> with dummy data?





[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2017-03-31 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950582#comment-15950582
 ] 

Arina Ielchiieva commented on DRILL-3562:
-

Yes, it does. This behavior is expected in unit test 
TestJsonReader.testFlattenEmptyArrayWithAllTextMode.

> Query fails when using flatten on JSON data where some documents have an 
> empty array
> 
>
> Key: DRILL-3562
> URL: https://issues.apache.org/jira/browse/DRILL-3562
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.1.0
>Reporter: Philip Deegan
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
>
> Drill query fails when using flatten when some records contain an empty array 
> {noformat}
> SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) 
> flat WHERE flat.c.d.e = 'f' limit 1;
> {noformat}
> Succeeds on 
> { "a": { "b": { "c": [  { "d": {  "e": "f" } } ] } } }
> Fails on
> { "a": { "b": { "c": [] } } }
> Error
> {noformat}
> Error: SYSTEM ERROR: ClassCastException: Cannot cast 
> org.apache.drill.exec.vector.NullableIntVector to 
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> {noformat}
> Is it possible to ignore the empty arrays, or do they need to be populated 
> with dummy data?





[jira] [Created] (DRILL-5405) Add missing operator types

2017-03-31 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-5405:
---

 Summary: Add missing operator types
 Key: DRILL-5405
 URL: https://issues.apache.org/jira/browse/DRILL-5405
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
Priority: Minor
 Attachments: maprdb_sub_scan.JPG, unknown_operator.JPG

Add missing operator types: FLATTEN, MONGO_SUB_SCAN, MAPRDB_SUB_SCAN so they 
won't be displayed on Web UI as UNKNOWN_OPERATOR.

Example:
before the fix -> unknown_operator.JPG
after the fix -> maprdb_sub_scan.JPG





[jira] [Updated] (DRILL-5405) Add missing operator types

2017-03-31 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5405:

Attachment: maprdb_sub_scan.JPG
unknown_operator.JPG

> Add missing operator types
> --
>
> Key: DRILL-5405
> URL: https://issues.apache.org/jira/browse/DRILL-5405
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Attachments: maprdb_sub_scan.JPG, unknown_operator.JPG
>
>
> Add missing operator types: FLATTEN, MONGO_SUB_SCAN, MAPRDB_SUB_SCAN so they 
> won't be displayed on Web UI as UNKNOWN_OPERATOR.
> Example:
> before the fix -> unknown_operator.JPG
> after the fix -> maprdb_sub_scan.JPG





[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2017-03-31 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950452#comment-15950452
 ] 

Khurram Faraaz commented on DRILL-3562:
---

[~arina] Is this the expected result for the second SQL below ?

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `drill_3562.json`;
+-+
|a|
+-+
| {"b":{"c":[]}}  |
+-+
1 row selected (0.138 seconds)
0: jdbc:drill:schema=dfs.tmp> select FLATTEN(t.a.b.c) AS c from 
`drill_3562.json` t;
++
| c  |
++
++
No rows selected (0.181 seconds)
{noformat}
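
The behavior observed above (flatten over an empty array produces no rows) matches a simple model of FLATTEN: one output row per array element, so an empty array contributes nothing. The sketch below is illustrative Java, not Drill code:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of FLATTEN over a column of arrays:
 * each array element becomes its own output row, and an
 * empty array yields no output rows at all.
 */
public class FlattenSketch {

    public static List<String> flatten(List<List<String>> column) {
        List<String> out = new ArrayList<>();
        for (List<String> arr : column) {
            out.addAll(arr);  // empty arrays contribute nothing
        }
        return out;
    }

    public static void main(String[] args) {
        // One row with ["f"], one row with an empty array:
        // only the non-empty array produces output.
        System.out.println(flatten(List.of(List.of("f"), List.of())));  // [f]
    }
}
```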

> Query fails when using flatten on JSON data where some documents have an 
> empty array
> 
>
> Key: DRILL-3562
> URL: https://issues.apache.org/jira/browse/DRILL-3562
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.1.0
>Reporter: Philip Deegan
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
>
> Drill query fails when using flatten when some records contain an empty array 
> {noformat}
> SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) 
> flat WHERE flat.c.d.e = 'f' limit 1;
> {noformat}
> Succeeds on 
> { "a": { "b": { "c": [  { "d": {  "e": "f" } } ] } } }
> Fails on
> { "a": { "b": { "c": [] } } }
> Error
> {noformat}
> Error: SYSTEM ERROR: ClassCastException: Cannot cast 
> org.apache.drill.exec.vector.NullableIntVector to 
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> {noformat}
> Is it possible to ignore the empty arrays, or do they need to be populated 
> with dummy data?


