[jira] [Commented] (DRILL-3623) Limit 0 should avoid execution when querying a known schema

2016-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203728#comment-15203728
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user hsuanyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/405#discussion_r56783075
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DirectScanPrule.java
 ---
@@ -0,0 +1,40 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.physical;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.drill.exec.planner.logical.DrillDirectScanRel;
+import org.apache.drill.exec.planner.logical.RelOptHelper;
+
--- End diff --

Javadoc?


> Limit 0 should avoid execution when querying a known schema
> ---
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
>  Labels: doc-impacting
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4006) As json reader reads a field with empty lists, IOOB could happen

2016-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203726#comment-15203726
 ] 

ASF GitHub Bot commented on DRILL-4006:
---

Github user hsuanyi closed the pull request at:

https://github.com/apache/drill/pull/242


> As json reader reads a field with empty lists, IOOB could happen
> 
>
> Key: DRILL-4006
> URL: https://issues.apache.org/jira/browse/DRILL-4006
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.3.0
>
> Attachments: a.json, b.json, c.json
>
>
> If a field in a json file has many empty lists before a non-empty list, there 
> could be an IOOB exception.
> Running the following query on the folder with files in the attachment can 
> reproduce the observation:
> {code}
> select a from `folder`
> {code}
> Exception:
> org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: 
> index: 4448, length: 4 
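
A file of the shape described above can be generated for testing. The following is a minimal sketch, not the attached a.json, b.json, or c.json: the function name, the default field name, and the record count are assumptions chosen for illustration, and whether any particular count triggers the exception is not confirmed here.

```python
import json


def write_empty_then_nonempty(path, empty_count, field="a"):
    """Write `empty_count` records whose field is an empty list,
    followed by one record whose field is a non-empty list.

    Returns the total number of records written.
    """
    with open(path, "w", encoding="utf-8") as out:
        for _ in range(empty_count):
            out.write(json.dumps({field: []}) + "\n")
        # The single non-empty list that follows the run of empty ones.
        out.write(json.dumps({field: [1]}) + "\n")
    return empty_count + 1
```

For example, write_empty_then_nonempty("/tmp/folder/generated.json", 5000) writes 5000 empty-list records and then one non-empty record; both the path and the count are hypothetical.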





[jira] [Commented] (DRILL-4323) Hive Native Reader : A simple count(*) throws Incoming batch has an empty schema error

2016-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203716#comment-15203716
 ] 

ASF GitHub Bot commented on DRILL-4323:
---

Github user hsuanyi closed the pull request at:

https://github.com/apache/drill/pull/349


> Hive Native Reader : A simple count(*) throws Incoming batch has an empty 
> schema error
> --
>
> Key: DRILL-4323
> URL: https://issues.apache.org/jira/browse/DRILL-4323
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.5.0
>Reporter: Rahul Challapalli
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
> Fix For: 1.6.0
>
> Attachments: error.log
>
>
> git.commit.id.abbrev=3d0b4b0
> A simple count(*) query does not work when hive native reader is enabled
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from customer;
> +---------+
> | EXPR$0  |
> +---------+
> | 10      |
> +---------+
> 1 row selected (3.074 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `store.hive.optimize_scan_with_native_readers` = true;
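> +-------+--------------------------------------------------------+
> |  ok   |                        summary                         |
> +-------+--------------------------------------------------------+
> | true  | store.hive.optimize_scan_with_native_readers updated.  |
> +-------+--------------------------------------------------------+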
> 1 row selected (0.2 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from customer;
> Error: SYSTEM ERROR: IllegalStateException: Incoming batch [#1341, 
> ProjectRecordBatch] has an empty schema. This is not allowed.
> Fragment 0:0
> [Error Id: 4c867440-0fd3-4eda-922f-0f5eadcb1463 on qa-node191.qa.lab:31010] 
> (state=,code=0)
> {code}
> Hive DDL for the table :
> {code}
> create table customer
> (
> c_customer_sk int,
> c_customer_id string,
> c_current_cdemo_sk int,
> c_current_hdemo_sk int,
> c_current_addr_sk int,
> c_first_shipto_date_sk int,
> c_first_sales_date_sk int,
> c_salutation string,
> c_first_name string,
> c_last_name string,
> c_preferred_cust_flag string,
> c_birth_day int,
> c_birth_month int,
> c_birth_year int,
> c_birth_country string,
> c_login string,
> c_email_address string,
> c_last_review_date string
> )
> STORED AS PARQUET
> LOCATION '/drill/testdata/customer'
> {code}
> Attached the log file with the stacktrace





[jira] [Commented] (DRILL-4490) Count(*) function returns as optional instead of required

2016-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203714#comment-15203714
 ] 

ASF GitHub Bot commented on DRILL-4490:
---

Github user hsuanyi closed the pull request at:

https://github.com/apache/drill/pull/423


> Count(*) function returns as optional instead of required
> -
>
> Key: DRILL-4490
> URL: https://issues.apache.org/jira/browse/DRILL-4490
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Krystal
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.7.0
>
>
> git.commit.id.abbrev=c8a7840
> I have the following CTAS query:
> create table test as select count(*) as col1 from cp.`tpch/orders.parquet`;
> The schema of the test table shows col1 as optional:
> message root {
>   optional int64 col1;
> }





[jira] [Commented] (DRILL-4510) IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.dr

2016-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203713#comment-15203713
 ] 

ASF GitHub Bot commented on DRILL-4510:
---

Github user hsuanyi closed the pull request at:

https://github.com/apache/drill/pull/433


> IllegalStateException: Failure while reading vector.  Expected vector class 
> of org.apache.drill.exec.vector.NullableIntVector but was holding vector 
> class org.apache.drill.exec.vector.NullableVarCharVector
> -
>
> Key: DRILL-4510
> URL: https://issues.apache.org/jira/browse/DRILL-4510
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Reporter: Chun Chang
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
>
> Hit the following regression running advanced automation. Regression happened 
> between commit b979bebe83d7017880b0763adcbf8eb80acfcee8 and 
> 1f23b89623c72808f2ee866cec9b4b8a48929d68
> {noformat}
> Execution Failures:
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/original/query66.sql
> Query: 
> -- start query 66 in stream 0 using template query66.tpl 
> SELECT w_warehouse_name, 
>w_warehouse_sq_ft, 
>w_city, 
>w_county, 
>w_state, 
>w_country, 
>ship_carriers, 
>year1,
>Sum(jan_sales) AS jan_sales, 
>Sum(feb_sales) AS feb_sales, 
>Sum(mar_sales) AS mar_sales, 
>Sum(apr_sales) AS apr_sales, 
>Sum(may_sales) AS may_sales, 
>Sum(jun_sales) AS jun_sales, 
>Sum(jul_sales) AS jul_sales, 
>Sum(aug_sales) AS aug_sales, 
>Sum(sep_sales) AS sep_sales, 
>Sum(oct_sales) AS oct_sales, 
>Sum(nov_sales) AS nov_sales, 
>Sum(dec_sales) AS dec_sales, 
>Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot, 
>Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot, 
>Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot, 
>Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot, 
>Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot, 
>Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot, 
>Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot, 
>Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot, 
>Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot, 
>Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot, 
>Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot, 
>Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot, 
>Sum(jan_net)   AS jan_net, 
>Sum(feb_net)   AS feb_net, 
>Sum(mar_net)   AS mar_net, 
>Sum(apr_net)   AS apr_net, 
>Sum(may_net)   AS may_net, 
>Sum(jun_net)   AS jun_net, 
>Sum(jul_net)   AS jul_net, 
>Sum(aug_net)   AS aug_net, 
>Sum(sep_net)   AS sep_net, 
>Sum(oct_net)   AS oct_net, 
>Sum(nov_net)   AS nov_net, 
>Sum(dec_net)   AS dec_net 
> FROM   (SELECT w_warehouse_name, 
>w_warehouse_sq_ft, 
>w_city, 
>w_county, 
>w_state, 
>w_country, 
>'ZOUROS' 
>|| ',' 
>|| 'ZHOU' AS ship_carriers, 
>d_year AS year1, 
>Sum(CASE 
>  WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity 
>  ELSE 0 
>END)  AS jan_sales, 
>Sum(CASE 
>  WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity 
>  ELSE 0 
>END)  AS feb_sales, 
>Sum(CASE 
>  WHEN d_moy = 3 THEN 

[jira] [Updated] (DRILL-4514) Add describe schema command

2016-03-20 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4514:

Summary: Add describe schema  command  (was: Add describe 
database  command)

> Add describe schema  command
> -
>
> Key: DRILL-4514
> URL: https://issues.apache.org/jira/browse/DRILL-4514
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: Future
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>
> Add describe database  command which will return directory 
> associated with a database on the fly.
> Syntax:
> describe database 
> describe schema 
> Output:
> {noformat}
>  DESCRIBE SCHEMA xdf.proc;
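> +----------+----------------------------+
> | name     | location                   |
> +----------+----------------------------+
> | xdf.proc | maprfs://dl.data/processed |
> +----------+----------------------------+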
> {noformat}
> Current implementation covers only dfs schema.
> For all other "" will be returned.





[jira] [Updated] (DRILL-4514) Add describe database command

2016-03-20 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4514:

Description: 
Add describe database  command which will return directory associated 
with a database on the fly.

Syntax:
describe database 
describe schema 

Output:

{noformat}
 DESCRIBE SCHEMA xdf.proc;

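+----------+----------------------------+
| name     | location                   |
+----------+----------------------------+
| xdf.proc | maprfs://dl.data/processed |
+----------+----------------------------+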
{noformat}

Current implementation covers only dfs schema.
For all other "" will be returned.



  was:
Add describe database  command which will return directory associated 
with a database on the fly.

Syntax:
describe database 
describe schema 

Output:

{noformat}
 DESCRIBE DATABASE xdf.proc;

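+----------+----------------------------+
| name     | location                   |
+----------+----------------------------+
| xdf.proc | maprfs://dl.data/processed |
+----------+----------------------------+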
{noformat}

Current implementation covers only dfs schema.
For all other "" will be returned.




> Add describe database  command
> ---
>
> Key: DRILL-4514
> URL: https://issues.apache.org/jira/browse/DRILL-4514
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: Future
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>
> Add describe database  command which will return directory 
> associated with a database on the fly.
> Syntax:
> describe database 
> describe schema 
> Output:
> {noformat}
>  DESCRIBE SCHEMA xdf.proc;
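> +----------+----------------------------+
> | name     | location                   |
> +----------+----------------------------+
> | xdf.proc | maprfs://dl.data/processed |
> +----------+----------------------------+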
> {noformat}
> Current implementation covers only dfs schema.
> For all other "" will be returned.





[jira] [Updated] (DRILL-4514) Add describe database command

2016-03-20 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4514:

Description: 
Add describe database  command which will return directory associated 
with a database on the fly.

Syntax:
describe database 
describe schema 

Output:

{noformat}
 DESCRIBE DATABASE xdf.proc;

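+----------+----------------------------+
| name     | location                   |
+----------+----------------------------+
| xdf.proc | maprfs://dl.data/processed |
+----------+----------------------------+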
{noformat}

Current implementation covers only dfs schema.
For all other "" will be returned.



  was:
Add describe database  command which will return directory associated 
with a database on the fly.

Syntax:
describe database 
describe schema 

Output:

{noformat}
 DESCRIBE DATABASE xdf.proc;

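+-------------+----------------------------+
| SCHEMA_NAME | LOCATION                   |
+-------------+----------------------------+
| xdf.proc    | maprfs://dl.data/processed |
+-------------+----------------------------+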
{noformat}




> Add describe database  command
> ---
>
> Key: DRILL-4514
> URL: https://issues.apache.org/jira/browse/DRILL-4514
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: Future
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>
> Add describe database  command which will return directory 
> associated with a database on the fly.
> Syntax:
> describe database 
> describe schema 
> Output:
> {noformat}
>  DESCRIBE DATABASE xdf.proc;
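> +----------+----------------------------+
> | name     | location                   |
> +----------+----------------------------+
> | xdf.proc | maprfs://dl.data/processed |
> +----------+----------------------------+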
> {noformat}
> Current implementation covers only dfs schema.
> For all other "" will be returned.





[jira] [Updated] (DRILL-4519) File system directory-based partition pruning doesn't work correctly with parquet metadata

2016-03-20 Thread Miroslav Holubec (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miroslav Holubec updated DRILL-4519:

Description: 
We have Parquet files in folders with the following convention: /MM/DD/HH.
Without Drill's Parquet metadata, directory pruning works seamlessly.
{noformat}
select dir0, dir1, dir2 from hdfs.test.indexed;
dir0 = ,  dir1 = MM, dir2 = DD, dir3 = HH
{noformat}
After creating the metadata and executing the same query, dir0 contains the HH 
folder name instead of the yearly folder name. dir1 through dir4 are null.
{noformat}
select dir0, dir1, dir2 from hdfs.test.indexed;
dir0 = HH,  dir1 = null, dir2 = null, dir3 = null
{noformat}



  was:
We have Parquet files in folders with the following convention: /MM/DD/HH.
Without Drill's Parquet metadata, directory pruning works seamlessly.
{noformat}
select dir0, dir1, dir2 from hdfs.test.indexed;
dir0 = ,  dir1 = MM, dir2 = DD, dir3 = HH
{noformat}
After creating the metadata and querying the root folder, dir0 contains the HH 
folder name instead of the yearly folder name. dir1 through dir4 are null.
{noformat}
select dir0, dir1, dir2 from hdfs.test.indexed;
dir0 = HH,  dir1 = null, dir2 = null, dir3 = null
{noformat}




> File system directory-based partition pruning doesn't work correctly with 
> parquet metadata
> --
>
> Key: DRILL-4519
> URL: https://issues.apache.org/jira/browse/DRILL-4519
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Miroslav Holubec
>
> We have Parquet files in folders with the following convention: /MM/DD/HH.
> Without Drill's Parquet metadata, directory pruning works seamlessly.
> {noformat}
> select dir0, dir1, dir2 from hdfs.test.indexed;
> dir0 = ,  dir1 = MM, dir2 = DD, dir3 = HH
> {noformat}
> After creating the metadata and executing the same query, dir0 contains the HH 
> folder name instead of the yearly folder name. dir1 through dir4 are null.
> {noformat}
> select dir0, dir1, dir2 from hdfs.test.indexed;
> dir0 = HH,  dir1 = null, dir2 = null, dir3 = null
> {noformat}
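
The expected behaviour in the report can be stated as a small mapping from a file's path, relative to the queried directory, to dir0, dir1, and so on. The sketch below is an illustration in Python, not Drill's implementation; the function name, the max_depth parameter, and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def dir_columns(query_root, file_path, max_depth=4):
    """Map each directory level between query_root and the file
    to dir0, dir1, ...; levels that do not exist are None."""
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(query_root))
    levels = relative.parts[:-1]  # drop the file name, keep directory levels
    return {
        f"dir{i}": (levels[i] if i < len(levels) else None)
        for i in range(max_depth)
    }
```

With a hypothetical layout such as /data/indexed/2016/03/20/14/part-0.parquet queried from /data/indexed, dir0 is the first level below the root (the year) and dir3 is the deepest level (the hour). The report describes the opposite after the metadata is created: the deepest level shows up in dir0.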





[jira] [Commented] (DRILL-4501) Complete MapOrListWriter for all supported data types

2016-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197751#comment-15197751
 ] 

ASF GitHub Bot commented on DRILL-4501:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/427


> Complete MapOrListWriter for all supported data types
> -
>
> Key: DRILL-4501
> URL: https://issues.apache.org/jira/browse/DRILL-4501
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
> Fix For: 1.7.0
>
>
> This interface, at this time, does not include support for many data types.





[jira] [Commented] (DRILL-4520) Error parsing JSON ( a column with different datatypes )

2016-03-20 Thread Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203235#comment-15203235
 ] 

Shankar commented on DRILL-4520:


I tried this one too, but I am getting similar issues.
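
One way to find the records behind such an error is to scan the file and list every top-level field that appears with more than one JSON type. This is a minimal sketch, assuming one JSON object per line as in test.json; the function names are hypothetical, and it is a pre-check that runs outside Drill, not part of it.

```python
import json
from collections import defaultdict


def json_kind(value):
    """Name the JSON type of a decoded Python value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def mixed_type_fields(lines):
    """Return {field: {kind: first_line_number}} for top-level fields
    seen with more than one non-null JSON type. Line numbers are 1-based."""
    seen = defaultdict(dict)
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        for field, value in json.loads(line).items():
            kind = json_kind(value)
            if kind != "null":  # null alone does not make a field mixed
                seen[field].setdefault(kind, line_number)
    return {field: kinds for field, kinds in seen.items() if len(kinds) > 1}
```

Called as mixed_type_fields(open("/tmp/test.json")) on the three sample records, and assuming each record sits on a single line, this should flag only ajaxUrl: a string first seen on line 1 and an array first seen on line 2.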

> Error parsing JSON ( a column with different datatypes )
> 
>
> Key: DRILL-4520
> URL: https://issues.apache.org/jira/browse/DRILL-4520
> Project: Apache Drill
>  Issue Type: Test
>Reporter: Shankar
>
> I am stuck on the error below. Could you please help me resolve it?
> I am running a query on Drill 1.6.0 in a cluster against JSON log data (a 
> 150 GB log file, one JSON record per line).
> {quote}
> Possible solutions, in my opinion:
> 1. Either Drill should be able to ignore those lines (ANY data type) while 
> reading or creating the table (CTAS).
> 2. Or the data should be stored as is, with the ANY data type, if any field 
> differs in data type across records. This would be useful when the other 
> columns (excluding the ANY data type columns) carry important information.
> {quote}
> h4. -- test.json --
> About the data: 
> 1. I have extracted just 3 lines from the logs for test purposes.
> 2. The field called "ajaxUrl" differs in data type. Sometimes it contains a 
> string, sometimes an array of JSON objects, and sometimes null. 
> 3. In our case, some events in the 150 GB JSON file differ in structure like 
> this. I would say only about 0.1% of the events (per 150 GB JSON file) are 
> like this.
> {noformat}
> {"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus1","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457658600032}
> {"gameId":"https://daemon2.com/tournDetails.do?type=myGames=1556148_callback=jQuery213043","ajaxData":null,"metadata":null,"ajaxUrl":[{"R":0,"rNo":1,"gid":4,"wal":0,"d":{"gid":4,"pt":3,"wc":2326,"top":"1","reg":true,"brkt":1457771400268,"sk":"2507001010530109","id":56312439,"a":0,"st":145777140,"e":"0.0","j":0,"n":"Loot
>  Qualifier 
> 1","tc":94,"et":0,"syst":1457771456,"rc":14577,"s":5,"t":1,"tk":false,"prnId":56311896,"jc":1,"tp":"10.0","ro":14540,"rp":0,"isprn":false},"fl":"192.168.35.42","aaid":"5828"}],"selectedItem":null,"sessionid":"D18104E8CA3071C7A8F4E141B127","timestamp":1457771458873}
> {"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus2","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457958600032}
> {noformat}
> h4. -- Select Query  (ERROR) --
> {noformat}
> select
> `timestamp`,
> sessionid,
> gameid,
> ajaxUrl,
> ajaxData
> from dfs.`/tmp/test.json` t
> ;
> {noformat}
> {color:red}
> Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you are 
> using a ValueWriter of type NullableVarCharWriterImpl.
> File  /tmp/test.json
> Record  2
> Fragment 0:0
> {color}
> h4. -- Select Query (works Fine with UNION type) --
> Tried UNION type (an experimental feature)
> set `exec.enable_union_type` = true;
> {noformat}
> set `exec.enable_union_type` = true;
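> +-------+----------------------------------+
> |  ok   |             summary              |
> +-------+----------------------------------+
> | true  | exec.enable_union_type updated.  |
> +-------+----------------------------------+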
> 1 row selected (0.193 seconds)
> select
> `timestamp`,
> sessionid,
> gameid,
> ajaxUrl,
> ajaxData
> from dfs.`/tmp/test.json` t
> ;
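> +---------------+-------------------------------+--------------------------------------------------------------------------------+----------------------+----------+
> | timestamp     | sessionid                     | gameid                                                                         | ajaxUrl              | ajaxData |
> +---------------+-------------------------------+--------------------------------------------------------------------------------+----------------------+----------+
> | 1457658600032 | BC497C7C39B3C90AC9E6E9E8194C3 | null                                                                           | /player/updatebonus1 | null     |
> | 1457771458873 | D18104E8CA3071C7A8F4E141B127  | https://daemon2.com/tournDetails.do?type=myGames=1556148_callback=jQuery213043 | []                   | null     |
> | 1457958600032 | BC497C7C39B3C90AC9E6E9E8194C3 | null                                                                           | /player/updatebonus2 | null     |
> +---------------+-------------------------------+--------------------------------------------------------------------------------+----------------------+----------+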
> 3 rows selected (0.965 seconds)
> {noformat}
> h4. -- CTAS Query (ERROR) 

[jira] [Commented] (DRILL-4459) SchemaChangeException while querying hive json table

2016-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201360#comment-15201360
 ] 

ASF GitHub Bot commented on DRILL-4459:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/431#discussion_r56645171
  
--- Diff: 
contrib/storage-hive/core/src/test/java/org/apache/drill/exec/fn/hive/TestInbuiltHiveUDFs.java
 ---
@@ -43,4 +47,17 @@ public void testEncode() throws Exception {
 .baselineValues(new Object[] { null })
 .go();
   }
+
+   @Test // DRILL-4459
+   public void testGetJsonObject() throws Exception {
+setColumnWidths(new int[]{260});
+String query = "select * from hive.simple_json where 
GET_JSON_OBJECT(simple_json.json, '$.DocId') = 'DocId2'";
+List results = testSqlWithResults(query);
+String expected = "json\n" + 
"{\"DocId\":\"DocId2\",\"User\":{\"Id\":122,\"Username\":\"larry122\",\"Name\":"
 +
--- End diff --

I've reworked this test to follow the common design. Thanks.
@Test // DRILL-4459
public void testGetJsonObject() throws Exception {
  testBuilder()
      .sqlQuery("select convert_from(json, 'json') as json from hive.simple_json " +
          "where GET_JSON_OBJECT(simple_json.json, '$.employee_id') = 'Emp2'")
      .ordered()
      .baselineColumns("json")
      .baselineValues(mapOf("employee_id","2","full_name","Kamesh","first_name","Bh","last_name","Venkata","position","Store"))
      .go();
}


> SchemaChangeException while querying hive json table
> 
>
> Key: DRILL-4459
> URL: https://issues.apache.org/jira/browse/DRILL-4459
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill, Functions - Hive
>Affects Versions: 1.4.0
> Environment: MapR-Drill 1.4.0
> Hive-1.2.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.7.0
>
>
> Getting a SchemaChangeException while querying JSON documents stored in a 
> Hive table.
> {noformat}
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [castBIT(VAR16CHAR-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> {noformat}
> Minimal reproduction:
> {noformat}
> created sample json documents using the attached script(randomdata.sh)
> hive>create table simplejson(json string);
> hive>load data local inpath '/tmp/simple.json' into table simplejson;
> now query it through Drill.
> Drill Version
> select * from sys.version;
> +---++-+-++
> | commit_id | commit_message | commit_time | build_email | build_time |
> +---++-+-++
> | eafe0a245a0d4c0234bfbead10c6b2d7c8ef413d | DRILL-3901:  Don't do early 
> expansion of directory in the non-metadata-cache case because it already 
> happens during ParquetGroupScan's metadata gathering operation. | 07.10.2015 
> @ 17:12:57 UTC | Unknown | 07.10.2015 @ 17:36:16 UTC |
> +---++-+-++
> 0: jdbc:drill:zk=> select * from hive.`default`.simplejson where 
> GET_JSON_OBJECT(simplejson.json, '$.DocId') = 'DocId2759947' limit 1;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [castBIT(VAR16CHAR-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 1:1
> [Error Id: 74f054a8-6f1d-4ddd-9064-3939fcc82647 on ip-10-0-0-233:31010] 
> (state=,code=0)
> {noformat}





[jira] [Commented] (DRILL-4330) Long running SQL query hangs once Foreman node is killed

2016-03-20 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197904#comment-15197904
 ] 

Khurram Faraaz commented on DRILL-4330:
---

Problem is reproducible on Drill 1.6.0, JDK 7 and git commit ID : 
64ab0a8ec9d98bf96f4d69274dddc180b8efe263

> Long running SQL query hangs once Foreman node is killed
> 
>
> Key: DRILL-4330
> URL: https://issues.apache.org/jira/browse/DRILL-4330
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.4.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Sudheesh Katkam
> Attachments: drillbit.out
>
>
> Summary : Once Foreman node Drillbit is killed, long running query just hangs 
> and no profile information is written to Web UI. That long running query was 
> issued from the Foreman node.
> MapR Drill 1.4.0 GA
> MapR FS 5.0.0 GA
> JDK8
> 4 node CentOS cluster
> ./sqlline -u "jdbc:drill:schema=dfs.tmp" -n mapr -p mapr
> Issue a long running select query over JSON data
> Immediately kill the Drillbit on Foreman node (ps -eaf | grep Drillbit), kill 
> -9 PID
> The long running query hangs on sqlline prompt, there are no 
> messages/errors/Exceptions reported on sqlline prompt.
> On the Web UI there is no profile reported for the long running query that 
> was running on the Drillbit that was killed.
> Question (1) : Why was there no profile reported/written on the Web UI for 
> that long running query ? In a real production scenario user will not know 
> what query was under execution at the point when Foreman went down. 
> Question (2) : Why does the long running query not terminate, once the 
> foreman was killed ? from the drillbit.log snippet we do not see any 
> CANCELED/TERMINATED message for that query, why ?
> Snippet from drillbit.log on the foreman node. 
> {noformat}
> 2016-02-01 10:59:20,917 [2950c576-b2d2-5bc3-e9b5-ff4414d088c0:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 2950c576-b2d2-5bc3-e9b5-ff4414d088c0: select * from `twoKeyJsn.json`
> 2016-02-01 10:59:21,067 [2950c576-b2d2-5bc3-e9b5-ff4414d088c0:foreman] INFO  
> o.a.d.exec.store.dfs.FileSelection - FileSelection.create() took 1 ms
> 2016-02-01 10:59:21,068 [2950c576-b2d2-5bc3-e9b5-ff4414d088c0:foreman] INFO  
> o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, 
> numFiles: 1
> 2016-02-01 10:59:21,068 [2950c576-b2d2-5bc3-e9b5-ff4414d088c0:foreman] INFO  
> o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, 
> numFiles: 1
> 2016-02-01 10:59:21,069 [2950c576-b2d2-5bc3-e9b5-ff4414d088c0:foreman] INFO  
> o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, 
> numFiles: 1
> 2016-02-01 10:59:21,069 [2950c576-b2d2-5bc3-e9b5-ff4414d088c0:foreman] INFO  
> o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, 
> numFiles: 1
> 2016-02-01 10:59:21,069 [2950c576-b2d2-5bc3-e9b5-ff4414d088c0:foreman] INFO  
> o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, 
> numFiles: 1
> 2016-02-01 10:59:21,069 [2950c576-b2d2-5bc3-e9b5-ff4414d088c0:foreman] INFO  
> o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, 
> numFiles: 1
> 2016-02-01 10:59:21,069 [2950c576-b2d2-5bc3-e9b5-ff4414d088c0:foreman] INFO  
> o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, 
> numFiles: 1
> 2016-02-01 10:59:21,155 [2950c576-b2d2-5bc3-e9b5-ff4414d088c0:foreman] INFO  
> o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, 
> numFiles: 1
> 2016-02-01 10:59:21,250 [2950c576-b2d2-5bc3-e9b5-ff4414d088c0:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Get block maps: Executed 1 out of 1 
> using 1 threads. Time: 90ms total, 90.891938ms avg, 90ms max.
> 2016-02-01 10:59:21,250 [2950c576-b2d2-5bc3-e9b5-ff4414d088c0:foreman] INFO  
> o.a.d.e.s.schedule.BlockMapBuilder - Get block maps: Executed 1 out of 1 
> using 1 threads. Earliest start: 18.28 μs, Latest start: 18.28 μs, 
> Average start: 18.28 μs .
> 2016-02-01 10:59:21,448 [2950c576-b2d2-5bc3-e9b5-ff4414d088c0:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2950c576-b2d2-5bc3-e9b5-ff4414d088c0:0:0: State change requested 
> AWAITING_ALLOCATION --> RUNNING
> 2016-02-01 10:59:21,448 [2950c576-b2d2-5bc3-e9b5-ff4414d088c0:frag:0:0] INFO  
> o.a.d.e.w.f.FragmentStatusReporter - 
> 2950c576-b2d2-5bc3-e9b5-ff4414d088c0:0:0: State to report: RUNNING
> {noformat}
> Doing kill -3 PID on the non foreman node for the Drillbit process gives us 
> stack trace in drillbit.out
> {noformat}
> 2016-02-01 11:03:31
> Full thread dump OpenJDK 64-Bit Server VM (25.65-b01 mixed mode):
> "qtp801808302-129" #129 prio=5 os_prio=0 

[jira] [Commented] (DRILL-4520) Error parsing JSON ( a column with different datatypes )

2016-03-20 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203212#comment-15203212
 ] 

Khurram Faraaz commented on DRILL-4520:
---

Can you please try running your SELECT query after executing the statement 
below at the sqlline prompt? It tells Drill to treat every value in the JSON 
file as a string.

{noformat}
 alter system set `store.json.all_text_mode`=true;
{noformat}
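
As a rough mental model of what this option changes, the sketch below converts every scalar in a decoded JSON record to a string. It is an illustration in Python, not Drill's reader, and it rests on an assumption worth checking against the Drill documentation for your version: that all-text mode affects scalar values only and leaves arrays and objects as nested structures. The function name is hypothetical.

```python
import json


def as_all_text(value):
    """Turn scalars into strings, keep null, and recurse into
    arrays and objects without changing their structure."""
    if value is None:
        return None
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float, str)):
        return str(value)
    if isinstance(value, list):
        return [as_all_text(item) for item in value]
    return {key: as_all_text(item) for key, item in value.items()}


record = json.loads('{"timestamp": 1457658600032, "ajaxUrl": [{"R": 0}]}')
converted = as_all_text(record)  # timestamp becomes a string, ajaxUrl stays an array
```

Under this model a number-versus-string mismatch disappears, while a field that is a string in one record and an array in another still has two different shapes.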

> Error parsing JSON ( a column with different datatypes )
> 
>
> Key: DRILL-4520
> URL: https://issues.apache.org/jira/browse/DRILL-4520
> Project: Apache Drill
>  Issue Type: Test
>Reporter: Shankar
>
> I am stuck on the error below. Could you please help me resolve it?
> I am running a query on Drill 1.6.0 in a cluster against JSON log data (a 
> 150 GB log file, one JSON record per line).
> {quote}
> Possible solutions, in my opinion:
> 1. Either Drill should be able to ignore those lines (ANY data type) while 
> reading or creating the table (CTAS).
> 2. Or the data should be stored as is, with the ANY data type, if any field 
> differs in data type across records. This would be useful when the other 
> columns (excluding the ANY data type columns) carry important information.
> {quote}
> h4. -- test.json --
> About the data: 
> 1. I have extracted just 3 lines from the logs for test purposes.
> 2. The field called "ajaxUrl" differs in data type. Sometimes it contains a 
> string, sometimes an array of JSON objects, and sometimes null. 
> 3. In our case, some events in the 150 GB JSON file differ in structure like 
> this. I would say only about 0.1% of the events (per 150 GB JSON file) are 
> like this.
> {noformat}
> {"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus1","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457658600032}
> {"gameId":"https://daemon2.com/tournDetails.do?type=myGames=1556148_callback=jQuery213043","ajaxData":null,"metadata":null,"ajaxUrl":[{"R":0,"rNo":1,"gid":4,"wal":0,"d":{"gid":4,"pt":3,"wc":2326,"top":"1","reg":true,"brkt":1457771400268,"sk":"2507001010530109","id":56312439,"a":0,"st":145777140,"e":"0.0","j":0,"n":"Loot
>  Qualifier 
> 1","tc":94,"et":0,"syst":1457771456,"rc":14577,"s":5,"t":1,"tk":false,"prnId":56311896,"jc":1,"tp":"10.0","ro":14540,"rp":0,"isprn":false},"fl":"192.168.35.42","aaid":"5828"}],"selectedItem":null,"sessionid":"D18104E8CA3071C7A8F4E141B127","timestamp":1457771458873}
> {"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus2","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457958600032}
> {noformat}
> h4. -- Select Query  (ERROR) --
> {noformat}
> select
> `timestamp`,
> sessionid,
> gameid,
> ajaxUrl,
> ajaxData
> from dfs.`/tmp/test.json` t
> ;
> {noformat}
> {color:red}
> Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you are 
> using a ValueWriter of type NullableVarCharWriterImpl.
> File  /tmp/test.json
> Record  2
> Fragment 0:0
> {color}
> h4. -- Select Query (works Fine with UNION type) --
> Tried UNION type (an experimental feature)
> set `exec.enable_union_type` = true;
> {noformat}
> set `exec.enable_union_type` = true;
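> +-------+----------------------------------+
> |  ok   |             summary              |
> +-------+----------------------------------+
> | true  | exec.enable_union_type updated.  |
> +-------+----------------------------------+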
> 1 row selected (0.193 seconds)
> select
> `timestamp`,
> sessionid,
> gameid,
> ajaxUrl,
> ajaxData
> from dfs.`/tmp/test.json` t
> ;
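> +---------------+-------------------------------+--------------------------------------------------------------------------------+----------------------+----------+
> | timestamp     | sessionid                     | gameid                                                                         | ajaxUrl              | ajaxData |
> +---------------+-------------------------------+--------------------------------------------------------------------------------+----------------------+----------+
> | 1457658600032 | BC497C7C39B3C90AC9E6E9E8194C3 | null                                                                           | /player/updatebonus1 | null     |
> | 1457771458873 | D18104E8CA3071C7A8F4E141B127  | https://daemon2.com/tournDetails.do?type=myGames=1556148_callback=jQuery213043 | []                   | null     |
> | 1457958600032 | BC497C7C39B3C90AC9E6E9E8194C3 | null                                                                           | /player/updatebonus2 | null     |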
>