[jira] [Created] (DRILL-2492) JDBC : There seems to be no way to execute a CTAS statement with JDBC successfully

2015-03-17 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-2492:


 Summary: JDBC : There seems to be no way to execute a CTAS 
statement with JDBC successfully
 Key: DRILL-2492
 URL: https://issues.apache.org/jira/browse/DRILL-2492
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Reporter: Rahul Challapalli
Assignee: Daniel Barclay (Drill)


git.commit.id.abbrev=7b4c887

Query :
{code}
create table temp1 as select * from dfs.jdbctesting.`fewtypes.parquet`
{code}

I tried to execute the above query using Statement.executeQuery. The call 
returned a ResultSet containing a single value, false, and when I 
checked HDFS no table had been created.

I tried using Statement.executeUpdate and got the below error:
{code}
Exception in thread "main" java.sql.SQLException: expected one result column
    at net.hydromatic.avatica.AvaticaStatement.executeUpdate(AvaticaStatement.java:88)
    at Dummy.testCTASQuery(Dummy.java:57)
    at Dummy.main(Dummy.java:30)
{code}

Let me know if I am not using JDBC correctly.
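
For comparison, JDBC also defines the generic Statement.execute(String), which assumes neither a result set (as executeQuery does) nor a row count (as executeUpdate does). The sketch below shows only that call pattern. Whether Drill at this commit actually creates the table through it is not established by this report, and the names CtasSketch and runStatement are hypothetical.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper for illustration; not part of Drill or Avatica.
final class CtasSketch {
    private CtasSketch() {}

    /**
     * Runs any SQL text through Statement.execute and reports what came back,
     * without assuming a result set or an update count in advance.
     */
    static String runStatement(Statement stmt, String sql) throws SQLException {
        boolean hasResultSet = stmt.execute(sql);
        if (!hasResultSet) {
            // No rows to read: the statement produced an update count instead.
            return "update count: " + stmt.getUpdateCount();
        }
        StringBuilder rows = new StringBuilder();
        try (ResultSet rs = stmt.getResultSet()) {
            int columnCount = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                for (int i = 1; i <= columnCount; i++) {
                    if (i > 1) {
                        rows.append(" | ");
                    }
                    rows.append(rs.getObject(i));
                }
                rows.append('\n');
            }
        }
        return rows.toString();
    }
}
```

A caller would pass the CTAS text from the report as the sql argument and inspect the returned string, for example to see the ok/summary row that sqlline prints for DDL statements.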



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-2470) Implement SMALLINT [umbrella/tracking bug]

2015-03-17 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) updated DRILL-2470:
--
Summary: Implement SMALLINT [umbrella/tracking bug]  (was: Implement 
SMALLINT (umbrella/tracking bug).)

 Implement SMALLINT [umbrella/tracking bug]
 --

 Key: DRILL-2470
 URL: https://issues.apache.org/jira/browse/DRILL-2470
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)







[jira] [Updated] (DRILL-2465) Fix multiple DatabaseMetaData.getColumns() bugs (some)

2015-03-17 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) updated DRILL-2465:
--
Description: 
Fixed most {{getColumns()}} bugs reported in DRILL-2420:
- Added {{COLUMN_SIZE}} (in part to move later columns to right ordinal 
position).  [With workarounds for (possible) INFORMATION_SCHEMA.COLUMN bugs.]
- Fixed {{DECIMAL_DIGITS}} (was DECIMAL_PRECISION; didn't report values for 
numeric types other than DECIMAL).
- Fixed {{NUM_PREC_RADIX}} (was -1 for cases that should be NULL).
- Fixed {{REMARKS}} (from '' to NULL).
- Fixed {{COLUMN_DEF}} (from '' to NULL).
- Fixed {{CHARACTER_OCTET_LENGTH}} (was 4). [With workarounds for (possible) 
INFORMATION_SCHEMA.COLUMN bugs.]
- Fixed {{ORDINAL_POSITION}} (was returning 1 for every column).
- Fixed {{SCOPE_CATALOG}} (from '' to NULL).
- Fixed {{SCOPE_SCHEMA}} (from '' to NULL).
- Fixed {{SCOPE_TABLE}} (from '' to NULL).
- Fixed {{SOURCE_DATA_TYPE}} (from VARCHAR to INTEGER.)  [With workaround 
because SMALLINT not implemented yet.]

  was:
Fixed most {{getColumn()}} bugs reported in DRILL-2420:
- Added {{COLUMN_SIZE}} (in part to move later columns to right ordinal 
position).  [With workarounds for (possible) INFORMATION_SCHEMA.COLUMN bugs.]
- Fixed {{DECIMAL_DIGITS}} (was DECIMAL_PRECISION; didn't report values for 
numeric types other than DECIMAL).
- Fixed {{NUM_PREC_RADIX}} (was -1 for cases that should be NULL).
- Fixed {{REMARKS}} (from '' to NULL).
- Fixed {{COLUMN_DEF}} (from '' to NULL).
- Fixed {{CHARACTER_OCTET_LENGTH}} (was 4). [With workarounds for (possible) 
INFORMATION_SCHEMA.COLUMN bugs.]
- Fixed {{ORDINAL_POSITION}} (was returning 1 for every column).
- Fixed {{SCOPE_CATALOG}} (from '' to NULL).
- Fixed {{SCOPE_SCHEMA}} (from '' to NULL).
- Fixed {{SCOPE_TABLE}} (from '' to NULL).
- Fixed {{SOURCE_DATA_TYPE}} (from VARCHAR to INTEGER.)  [With workaround 
because SMALLINT not implemented yet.]

[Bug report in progress]


 Fix multiple DatabaseMetaData.getColumns() bugs (some) 
 ---

 Key: DRILL-2465
 URL: https://issues.apache.org/jira/browse/DRILL-2465
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC, Metadata
Reporter: Daniel Barclay (Drill)
Assignee: Daniel Barclay (Drill)

 Fixed most {{getColumns()}} bugs reported in DRILL-2420:
 - Added {{COLUMN_SIZE}} (in part to move later columns to right ordinal 
 position).  [With workarounds for (possible) INFORMATION_SCHEMA.COLUMN bugs.]
 - Fixed {{DECIMAL_DIGITS}} (was DECIMAL_PRECISION; didn't report values for 
 numeric types other than DECIMAL).
 - Fixed {{NUM_PREC_RADIX}} (was -1 for cases that should be NULL).
 - Fixed {{REMARKS}} (from '' to NULL).
 - Fixed {{COLUMN_DEF}} (from '' to NULL).
 - Fixed {{CHARACTER_OCTET_LENGTH}} (was 4). [With workarounds for (possible) 
 INFORMATION_SCHEMA.COLUMN bugs.]
 - Fixed {{ORDINAL_POSITION}} (was returning 1 for every column).
 - Fixed {{SCOPE_CATALOG}} (from '' to NULL).
 - Fixed {{SCOPE_SCHEMA}} (from '' to NULL).
 - Fixed {{SCOPE_TABLE}} (from '' to NULL).
 - Fixed {{SOURCE_DATA_TYPE}} (from VARCHAR to INTEGER.)  [With workaround 
 because SMALLINT not implemented yet.]





[jira] [Updated] (DRILL-2463) Implement JDBC mapping of SQL NULL for ResultSet.getXxx() methods

2015-03-17 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) updated DRILL-2463:
--
Attachment: (was: DRILL-2463.2.patch.txt)

 Implement JDBC mapping of SQL NULL for ResultSet.getXxx() methods
 -

 Key: DRILL-2463
 URL: https://issues.apache.org/jira/browse/DRILL-2463
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)
Assignee: Daniel Barclay (Drill)

 Fix AvaticaDrillSqlAccessor to implement mapping of SQL NULL to dummy 
 primitive values (e.g., returning 0 for ResultSet.getInt(...)).
 Fix SqlAccessors template to implement mapping of SQL NULL to null pointers 
 (e.g., returning null from ResultSet.getString(...)).
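
The JDBC convention referred to here can be sketched as follows: getters that return primitives yield a dummy value (0) for SQL NULL, getters that return objects yield a Java null, and wasNull() tells the two cases apart. NullMappingSketch is a hypothetical stand-in written for illustration. It is not Drill's AvaticaDrillSqlAccessor or the SqlAccessors template.

```java
// Hypothetical stand-in for a single-column accessor; not Drill's actual classes.
final class NullMappingSketch {
    private final Object value;   // a Java null here models SQL NULL
    private boolean wasNull;

    NullMappingSketch(Object value) {
        this.value = value;
    }

    /** Primitive getter: SQL NULL maps to the dummy value 0. */
    int getInt() {
        wasNull = (value == null);
        return wasNull ? 0 : ((Number) value).intValue();
    }

    /** Object getter: SQL NULL maps to a Java null reference. */
    String getString() {
        wasNull = (value == null);
        return wasNull ? null : value.toString();
    }

    /** Mirrors ResultSet.wasNull(): true if the last value read was SQL NULL. */
    boolean wasNull() {
        return wasNull;
    }
}
```

Because 0 is also a legitimate integer value, a caller that needs to distinguish it from SQL NULL has to check wasNull() right after getInt().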





[jira] [Updated] (DRILL-2463) Implement JDBC mapping of SQL NULL for ResultSet.getXxx() methods

2015-03-17 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) updated DRILL-2463:
--
Attachment: (was: DRILL-2463.1.patch.txt)

 Implement JDBC mapping of SQL NULL for ResultSet.getXxx() methods
 -

 Key: DRILL-2463
 URL: https://issues.apache.org/jira/browse/DRILL-2463
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)
Assignee: Daniel Barclay (Drill)

 Fix AvaticaDrillSqlAccessor to implement mapping of SQL NULL to dummy 
 primitive values (e.g., returning 0 for ResultSet.getInt(...)).
 Fix SqlAccessors template to implement mapping of SQL NULL to null pointers 
 (e.g., returning null from ResultSet.getString(...)).





[jira] [Closed] (DRILL-2158) Failure while attempting to start Drillbit in embedded mode.

2015-03-17 Thread kun22kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kun22kun closed DRILL-2158.
---
Resolution: Fixed

The failure was caused by running on OpenJDK. The Oracle JDK should be used instead.

  Failure while attempting to start Drillbit in embedded mode. 
 --

 Key: DRILL-2158
 URL: https://issues.apache.org/jira/browse/DRILL-2158
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Affects Versions: 0.7.0
 Environment: Linux Master.hadoop 2.6.32-431.23.3.el6.x86_64 #1 SMP 
 Thu Jul 31 17:20:51 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
 CentOS release 6.5 (Final)
Reporter: kun22kun
Assignee: Chun Chang
Priority: Minor
  Labels: github-import, maven
 Fix For: 1.0.0


 First, I installed Drill by following 
 “https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes”.
 When I start Drill with bin/sqlline -u jdbc:drill:zk=local -n admin -p 
 admin, it shows:
 Error: Failure while attempting to start Drillbit in embedded mode. 
 (state=,code=0)
 sqlline version 1.1.6
 0: jdbc:drill:zk=local>
 Then I built Drill with Maven according to INSTALL.md in the source 
 from GitHub, but got the same result.
 Finally, there is nothing in the path tmp/drill/. Do I need to create it 
 myself?
 Is it necessary to set up a distributed system such as Hadoop?
 Much appreciated!





[jira] [Created] (DRILL-2480) Identify, fix INFORMATION_SCHEMA and JDBC metadata bugs [umbrella/tracking bug]

2015-03-17 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-2480:
-

 Summary: Identify, fix INFORMATION_SCHEMA and JDBC metadata bugs 
[umbrella/tracking bug]
 Key: DRILL-2480
 URL: https://issues.apache.org/jira/browse/DRILL-2480
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)








[jira] [Updated] (DRILL-2420) Identify, fix DatabaseMetaData.getColumns() bugs [umbrella/tracking bug]

2015-03-17 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) updated DRILL-2420:
--
Summary: Identify, fix DatabaseMetaData.getColumns() bugs 
[umbrella/tracking bug]  (was: Identify and fix DatabaseMetaData.getColumns() 
bugs [umbrella/tracking bug])

 Identify, fix DatabaseMetaData.getColumns() bugs [umbrella/tracking bug]
 

 Key: DRILL-2420
 URL: https://issues.apache.org/jira/browse/DRILL-2420
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC, Metadata
Reporter: Daniel Barclay (Drill)
Assignee: Daniel Barclay (Drill)

 Drill's implementation of {{DatabaseMetaData.getColumns(...)}} (currently at 
 {{org.apache.drill.jdbc.MetaImpl.getColumns()}}) doesn't match the JDBC 
 specification (the Javadoc documentation for 
 {{DatabaseMetaData.getColumns(...)}} (as of Java 7)).  In the returned 
 {{ResultSet}}:
 1. Column {{DATA_TYPE}} is of type {{VARCHAR}} (containing the type name) 
 rather than being of type {{INTEGER}} (containing values per 
 {{java.sql.Types.*}}).
 2. Column {{TYPE_NAME}} is missing.
 3. Column {{COLUMN_SIZE}} is missing.
 4. (Columns after {{DATA_TYPE}} are at incorrect indexes.)
 5. Column {{DECIMAL_DIGITS}} is misnamed {{DECIMAL_PRECISION}}.
 6. Column {{REMARKS}} is an empty string, but probably should be {{NULL}}.
 7. Column {{COLUMN_DEF}} is an empty string, but probably should be {{NULL}}.
 8. Column {{CHAR_OCTET_LENGTH}} is always {{4}}, but should be the maximum 
 number of bytes in the _column_ for character types.
 8.5  Column {{IS_NULLABLE}} seems to always return 'NO'.
 9. Column {{ORDINAL_POSITION}} is always {{1}}, but should be the index of 
 the specific column.
 10. Column {{IS_NULLABLE}} is {{'YES'}}, which doesn't seem to correspond to 
 the value for {{NULLABLE}} ({{DatabaseMetaData.columnNullableUnknown}}).
 11. Column {{SCOPE_CATALOG}} is an empty string, but should be {{NULL}}.
 12. Column {{SCOPE_SCHEMA}} is an empty string, but should be {{NULL}}.
 13. Column {{SCOPE_TABLE}} is an empty string, but should be {{NULL}}.
 14. Column {{SOURCE_DATA_TYPE}} is an empty string, but should be {{NULL}}.
 Additional bugs or suspect behavior:
 - {{DECIMAL_DIGITS}}/{{DECIMAL_PRECISION}} is {{-1}} when it should be 
 {{NULL}} (when not applicable).
 - {{NUM_PREC_RADIX}} is {{-1}} when it probably should be {{NULL}} (when not 
 applicable).
 (Other columns to check:
 Re {{BUFFER_LENGTH}}, {{SQL_DATA_TYPE}}, and {{SQL_DATETIME_SUB}}:  When JDBC 
 says a column is not used, are there any requirements on the values (e.g., 
 being {{NULL}})? 
 Re {{IS_AUTOINCREMENT}}:  Do we know that a column is not auto-incremented?  
 If so, the value could be {{'NO'}} rather than an empty string.
 Re {{IS_GENERATEDCOLUMN}}:  Do we know that a column is not generated?  If 
 so, the value could be {{'NO'}} rather than an empty string.
 Re {{NULLABLE}}:  Do we know whether a column is nullable or not?  If so, we 
 could return the specific answer rather than just saying that it's unknown.
 )
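
Item 1 above expects {{DATA_TYPE}} to hold an integer code from {{java.sql.Types}} rather than the type name. A minimal sketch of such a name-to-code lookup follows. The class name DataTypeCodeSketch, the chosen subset of type names, and the fallback to Types.OTHER are assumptions made for illustration, not Drill's actual mapping.

```java
import java.sql.Types;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup table; the listed type names are illustrative only.
final class DataTypeCodeSketch {
    private static final Map<String, Integer> CODES = new HashMap<>();
    static {
        CODES.put("BOOLEAN", Types.BOOLEAN);
        CODES.put("INTEGER", Types.INTEGER);
        CODES.put("BIGINT", Types.BIGINT);
        CODES.put("DOUBLE", Types.DOUBLE);
        CODES.put("DECIMAL", Types.DECIMAL);
        CODES.put("CHAR", Types.CHAR);
        CODES.put("VARCHAR", Types.VARCHAR);
        CODES.put("DATE", Types.DATE);
        CODES.put("TIMESTAMP", Types.TIMESTAMP);
    }

    private DataTypeCodeSketch() {}

    /** Returns the java.sql.Types code for a type name, or Types.OTHER if unknown. */
    static int toJdbcTypeCode(String typeName) {
        Integer code = CODES.get(typeName.toUpperCase(Locale.ROOT));
        return code != null ? code : Types.OTHER;
    }
}
```

With a lookup like this, the type name would stay available in {{TYPE_NAME}} (item 2) while {{DATA_TYPE}} carries the numeric code.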





[jira] [Updated] (DRILL-2459) INFO._SCHEMA's CHARACTER_MAXIMUM_LENGTH is -1 for type CHAR

2015-03-17 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) updated DRILL-2459:
--
Description: 
INFORMATION_SCHEMA.COLUMNS.CHARACTER_MAXIMUM_LENGTH does not report the length 
for type CHAR.  For example, for type descriptor CHAR(4), it doesn't return 
4.  Instead, it returns -1:

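0: jdbc:drill:zk=local> USE dfs.tmp;
+------+-------------------------------------+
| ok   | summary                             |
+------+-------------------------------------+
| true | Default schema changed to 'dfs.tmp' |
+------+-------------------------------------+
1 row selected (0.05 seconds)
0: jdbc:drill:zk=local> CREATE OR REPLACE VIEW TempView AS SELECT CAST( NULL AS VARCHAR(3) ), CAST( NULL AS CHAR(4) )  FROM INFORMATION_SCHEMA.CATALOGS LIMIT 1;
+------+------------------------------------------------------------+
| ok   | summary                                                    |
+------+------------------------------------------------------------+
| true | View 'TempView' replaced successfully in 'dfs.tmp' schema |
+------+------------------------------------------------------------+
1 row selected (0.05 seconds)
0: jdbc:drill:zk=local> SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'TempView';
+---------------+--------------+------------+-------------+------------------+-------------+-----------+--------------------------+-------------------------+---------------+-------------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | ORDINAL_POSITION | IS_NULLABLE | DATA_TYPE | CHARACTER_MAXIMUM_LENGTH | NUMERIC_PRECISION_RADIX | NUMERIC_SCALE | NUMERIC_PRECISION |
+---------------+--------------+------------+-------------+------------------+-------------+-----------+--------------------------+-------------------------+---------------+-------------------+
| DRILL         | dfs.tmp      | TempView   | EXPR$0      | 0                | NO          | VARCHAR   | 3                        | -1                      | -1            | -1                |
| DRILL         | dfs.tmp      | TempView   | EXPR$1      | 1                | NO          | CHAR      | -1                       | -1                      | -1            | 4                 |
+---------------+--------------+------------+-------------+------------------+-------------+-----------+--------------------------+-------------------------+---------------+-------------------+
2 rows selected (0.072 seconds)
0: jdbc:drill:zk=local> 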


Hmm.  Note the 4 in the NUMERIC_PRECISION column:

0: jdbc:drill:zk=local> SELECT DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'TempView';
+-----------+--------------------------+-------------------+
| DATA_TYPE | CHARACTER_MAXIMUM_LENGTH | NUMERIC_PRECISION |
+-----------+--------------------------+-------------------+
| VARCHAR   | 3                        | -1                |
| CHAR      | -1                       | 4                 |
+-----------+--------------------------+-------------------+
2 rows selected (0.065 seconds)
0: jdbc:drill:zk=local> 


  was:
INFORMATION_SCHEMA.COLUMNS.CHARACTER_MAXIMUM_LENGTH does not report the length 
for type CHAR.  For example, for type descriptor CHAR(4), it doesn't return 
4.  Instead, it returns 1:

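0: jdbc:drill:zk=local> USE dfs.tmp;
+------+-------------------------------------+
| ok   | summary                             |
+------+-------------------------------------+
| true | Default schema changed to 'dfs.tmp' |
+------+-------------------------------------+
1 row selected (0.05 seconds)
0: jdbc:drill:zk=local> CREATE OR REPLACE VIEW TempView AS SELECT CAST( NULL AS VARCHAR(3) ), CAST( NULL AS CHAR(4) )  FROM INFORMATION_SCHEMA.CATALOGS LIMIT 1;
+------+------------------------------------------------------------+
| ok   | summary                                                    |
+------+------------------------------------------------------------+
| true | View 'TempView' replaced successfully in 'dfs.tmp' schema |
+------+------------------------------------------------------------+
1 row selected (0.05 seconds)
0: jdbc:drill:zk=local> SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'TempView';
+---------------+--------------+------------+-------------+------------------+-------------+-----------+--------------------------+-------------------------+---------------+-------------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | ORDINAL_POSITION | IS_NULLABLE | DATA_TYPE | CHARACTER_MAXIMUM_LENGTH | NUMERIC_PRECISION_RADIX | NUMERIC_SCALE | NUMERIC_PRECISION |
+---------------+--------------+------------+-------------+------------------+-------------+-----------+--------------------------+-------------------------+---------------+-------------------+
| DRILL         | dfs.tmp      | TempView   | EXPR$0      | 0                | NO          | VARCHAR   | 3                        | -1                      | -1            | -1                |
| DRILL         | dfs.tmp      | TempView   | EXPR$1      | 1                | NO          | CHAR      | -1                       | -1                      | -1            | 4                 |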

[jira] [Updated] (DRILL-2180) Star is not expanded when being used with flatten

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu updated DRILL-2180:
-
Attachment: DRILL-2180.1.patch

Patch Available

 Star is not expanded when being used with flatten
 -

 Key: DRILL-2180
 URL: https://issues.apache.org/jira/browse/DRILL-2180
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Sean Hsuan-Yi Chu
Assignee: Sean Hsuan-Yi Chu
Priority: Critical
 Fix For: 0.9.0

 Attachments: DRILL-2180.1.patch


 For example,
 select *, flatten(j.topping) tt  +
   from dfs_test.`%s` j 
 (using the same data set in DRILL-2012)
 * tt
 null  {id:5001,type:None}
 null  {id:5002,type:Glazed}
 null  {id:5005,type:Sugar}
 null  {id:5007,type:Powdered Sugar}
 null  {id:5006,type:Chocolate with Sprinkles}
 null  {id:5003,type:Chocolate}
 null  {id:5004,type:Maple}
 Note that the first column is messed up. 





[jira] [Commented] (DRILL-1911) Querying same field multiple times with different case would hit memory leak and return incorrect result.

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365986#comment-14365986
 ] 

Sean Hsuan-Yi Chu commented on DRILL-1911:
--

Resolved in commit ae2053d2a078a40033a140f2dfaeef802a5e8254.

 Querying same field multiple times with different case would hit memory leak 
 and return incorrect result. 
 --

 Key: DRILL-1911
 URL: https://issues.apache.org/jira/browse/DRILL-1911
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Reporter: Jinfeng Ni
Assignee: Sean Hsuan-Yi Chu
 Fix For: 0.8.0


 git.commit.id.abbrev=309e1be
 If a query selects the same field twice with different case, Drill throws a 
 memory assertion error. 
  select employee_id, Employee_id from cp.`employee.json` limit 2;
 +-------------+
 | employee_id |
 +-------------+
 | 1           |
 | 2           |
 Query failed: Query failed: Failure while running fragment., Attempted to 
 close accountor with 2 buffer(s) still allocated for QueryId: 
 2b5cc8eb-2817-aadb-e0fa-49272796592a, MajorFragmentId: 0, MinorFragmentId: 0.
  Total 1 allocation(s) of byte size(s): 4096, at stack location:
    org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:212)
    org.apache.drill.exec.vector.UInt1Vector.allocateNewSafe(UInt1Vector.java:137)
    org.apache.drill.exec.vector.NullableBigIntVector.allocateNewSafe(NullableBigIntVector.java:173)
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doAlloc(ProjectRecordBatch.java:229)
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:167)
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93)
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132)
    org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
    org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
    org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67)
    org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:97)
    org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57)
    org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:114)
    org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:744)
 Also, notice that the query result only contains one field; the second field 
 is missing. 
 The plan looks fine.
 Drill Physical : 
 00-00    Screen: rowcount = 463.0, cumulative cost = {1900.3 rows, 996.3 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 103
 00-01      Project(employee_id=[$0], Employee_id=[$1]): rowcount = 463.0, cumulative cost = {1854.0 rows, 950.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 102
 00-02        SelectionVectorRemover: rowcount = 463.0, cumulative cost = {1391.0 rows, 942.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 101
 00-03          Limit(fetch=[2]): rowcount = 463.0, cumulative cost = {928.0 rows, 479.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 100
 00-04            Project(employee_id=[$0], Employee_id=[$0]): rowcount = 463.0, cumulative cost = {926.0 rows, 471.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 99
 00-05              Scan(groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns=[`employee_id`], files=[/employee.json]]]): rowcount = 463.0, cumulative cost = {463.0 rows, 463.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 98





[jira] [Updated] (DRILL-1842) SELECT COUNT DISTINCT with HAVING fails to plan the query

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu updated DRILL-1842:
-
Assignee: Aman Sinha  (was: Sean Hsuan-Yi Chu)

 SELECT COUNT DISTINCT with HAVING fails to plan the query
 -

 Key: DRILL-1842
 URL: https://issues.apache.org/jira/browse/DRILL-1842
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 0.6.0
Reporter: Chris Matta
Assignee: Aman Sinha
 Fix For: 0.9.0

 Attachments: ip-172-16-1-175_drillbit.log


 Tableau is using the following query to get the distinct count of a measure:
 {code:SQL}
 SELECT COUNT(DISTINCT `custview`.`age`) AS `ctd_age_ok` FROM 
 `mfs.views`.`nestedclickview` `nestedclickview` INNER JOIN 
 `mfs.views`.`custview` `custview` ON (`nestedclickview`.`cust_id` = 
 `custview`.`cust_id`) HAVING (COUNT(1) > 0);
 {code}
 It fails on 0.06r2 with a planning error.
 Interestingly, if I remove the HAVING (COUNT(1) > 0) clause at the end, it 
 works:
 {code}
 : jdbc:drill:zk=172.16.1.175:5181,172.16.1.1> SELECT COUNT(DISTINCT 
 `custview`.`age`) AS `ctd_age_ok` FROM `mfs.views`.`nestedclickview` 
 `nestedclickview` INNER JOIN `mfs.views`.`custview` `custview` ON 
 (`nestedclickview`.`cust_id` = `custview`.`cust_id`);
 +------------+
 | ctd_age_ok |
 +------------+
 | 5          |
 +------------+
 1 row selected (4.776 seconds)
 {code}





[jira] [Commented] (DRILL-1842) SELECT COUNT DISTINCT with HAVING fails to plan the query

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366155#comment-14366155
 ] 

Sean Hsuan-Yi Chu commented on DRILL-1842:
--

[~amansinha100], this might be related to your current work, so I am assigning it to you.

 SELECT COUNT DISTINCT with HAVING fails to plan the query
 -

 Key: DRILL-1842
 URL: https://issues.apache.org/jira/browse/DRILL-1842
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 0.6.0
Reporter: Chris Matta
Assignee: Sean Hsuan-Yi Chu
 Fix For: 0.9.0

 Attachments: ip-172-16-1-175_drillbit.log


 Tableau is using the following query to get the distinct count of a measure:
 {code:SQL}
 SELECT COUNT(DISTINCT `custview`.`age`) AS `ctd_age_ok` FROM 
 `mfs.views`.`nestedclickview` `nestedclickview` INNER JOIN 
 `mfs.views`.`custview` `custview` ON (`nestedclickview`.`cust_id` = 
 `custview`.`cust_id`) HAVING (COUNT(1) > 0);
 {code}
 It fails on 0.06r2 with a planning error.
 Interestingly, if I remove the HAVING (COUNT(1) > 0) clause at the end, it 
 works:
 {code}
 : jdbc:drill:zk=172.16.1.175:5181,172.16.1.1> SELECT COUNT(DISTINCT 
 `custview`.`age`) AS `ctd_age_ok` FROM `mfs.views`.`nestedclickview` 
 `nestedclickview` INNER JOIN `mfs.views`.`custview` `custview` ON 
 (`nestedclickview`.`cust_id` = `custview`.`cust_id`);
 +------------+
 | ctd_age_ok |
 +------------+
 | 5          |
 +------------+
 1 row selected (4.776 seconds)
 {code}





[jira] [Resolved] (DRILL-2414) Union-All on SELECT * FROM schema-less data source will throw exception

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu resolved DRILL-2414.
--
Resolution: Fixed

 Union-All on SELECT * FROM schema-less data source will throw exception
 ---

 Key: DRILL-2414
 URL: https://issues.apache.org/jira/browse/DRILL-2414
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Sean Hsuan-Yi Chu
Assignee: Aman Sinha
 Fix For: 0.8.0

 Attachments: DRILL-2414.1.patch


 Union-All on SELECT * (wildcard symbol) is supported only in cases where a 
 schema (e.g., Hive, view) is available. 
 For detailed design documentation, please refer to DRILL-2207.





[jira] [Commented] (DRILL-2311) Create table with same columns of different case results in a java.lang.IllegalStateException

2015-03-17 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365788#comment-14365788
 ] 

Aman Sinha commented on DRILL-2311:
---

+1  on the patch.  Committed to master branch: 
ae2053d2a078a40033a140f2dfaeef802a5e8254

 Create table with same columns of different case results in a 
 java.lang.IllegalStateException
 -

 Key: DRILL-2311
 URL: https://issues.apache.org/jira/browse/DRILL-2311
 Project: Apache Drill
  Issue Type: Bug
  Components: SQL Parser
Affects Versions: 0.8.0
Reporter: Ramana Inukonda Nagaraj
Assignee: Sean Hsuan-Yi Chu
 Fix For: 0.9.0

 Attachments: DRILL-2311.1.patch


 Creating a table in which the same column name appears twice in different 
 case results in a runtime exception. This query should fail at planning or 
 parsing instead.
 CREATE TABLE drill_parquet_mulCaseColumns3 as select cast( BIGINT_col as 
 bigint) BIGINT_col,cast( DECIMAL9_col as decimal) bigint_col FROM 
 dfs.`/user/root/alltypes.json`;
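
A check of the kind requested here could look like the sketch below, which scans the output aliases and reports the first one that collides with an earlier alias when case is ignored. It assumes column names are compared case-insensitively, as the report implies. DuplicateColumnSketch and its method are hypothetical names, not Drill's parser or planner code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical validation helper; not taken from Drill's SQL parser or planner.
final class DuplicateColumnSketch {
    private DuplicateColumnSketch() {}

    /** Returns the first alias that collides, ignoring case, with an earlier one. */
    static Optional<String> firstCaseInsensitiveDuplicate(List<String> aliases) {
        Set<String> seen = new HashSet<>();
        for (String alias : aliases) {
            // Set.add returns false when the lower-cased name was already present.
            if (!seen.add(alias.toLowerCase(Locale.ROOT))) {
                return Optional.of(alias);
            }
        }
        return Optional.empty();
    }
}
```

For the query above, the aliases BIGINT_col and bigint_col would be flagged, so the statement could be rejected before execution starts.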





[jira] [Created] (DRILL-2486) Return format differences between drill & odbc from interval date queries

2015-03-17 Thread Krystal (JIRA)
Krystal created DRILL-2486:
--

 Summary: Return format differences between drill & odbc from 
interval date queries  
 Key: DRILL-2486
 URL: https://issues.apache.org/jira/browse/DRILL-2486
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 0.8.0
Reporter: Krystal
Assignee: Daniel Barclay (Drill)
Priority: Minor


git.commit.id=ae2053d2a078a40033a140f2dfaeef802a5e8254

The format of results from interval date queries is different between drill and 
odbc.  Below are some examples.

From drill:
SELECT interval '10' day from basic limit 1;
+--------+
| EXPR$0 |
+--------+
| P10D   |
+--------+

SELECT interval '12-11' year to month from basic limit 1;
+---------+
| EXPR$0  |
+---------+
| P12Y11M |
+---------+

SELECT interval '1' year from basic limit 1;
+--------+
| EXPR$0 |
+--------+
| P1Y    |
+--------+

SELECT interval '9' month from basic limit 1;
+--------+
| EXPR$0 |
+--------+
| P9M    |
+--------+

From ODBC:
SQL> SELECT interval '10' day from basic limit 1
+----------------+
| EXPR$0         |
+----------------+
| 10 00:00:00.00 |
+----------------+

SQL> SELECT interval '12-11' year to month from basic limit 1
+--------+
| EXPR$0 |
+--------+
| 12-11  |
+--------+

SQL> SELECT interval '1' year from basic limit 1
+--------+
| EXPR$0 |
+--------+
| 1-00   |
+--------+

SQL> SELECT interval '9' month from basic limit 1
+--------+
| EXPR$0 |
+--------+
| 0-09   |
+--------+

We should have consistent output from the 2 sources.  The result from ODBC 
seems easier to read.
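
To make the two formats concrete, the sketch below converts the ISO-8601 year-month strings shown in the drill output (for example P12Y11M) into the years-months form shown in the ODBC output (12-11). It uses java.time.Period from the standard library. IntervalFormatSketch is a hypothetical name, and the sketch covers year-month intervals only: day-time values such as P10D are out of scope. It is not how either client actually formats results.

```java
import java.time.Period;
import java.util.Locale;

// Hypothetical converter for illustration; handles year-month intervals only.
final class IntervalFormatSketch {
    private IntervalFormatSketch() {}

    /** Renders an ISO-8601 year-month period such as "P12Y11M" as "12-11". */
    static String toYearMonthString(String isoPeriod) {
        Period period = Period.parse(isoPeriod);
        // Months are zero-padded to two digits, matching the ODBC samples above.
        return String.format(Locale.ROOT, "%d-%02d", period.getYears(), period.getMonths());
    }
}
```

Applied to the three year-month samples above, this maps P12Y11M, P1Y and P9M to 12-11, 1-00 and 0-09 respectively.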





[jira] [Commented] (DRILL-2478) Validating values assigned to SYSTEM/SESSION configuration parameters

2015-03-17 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366065#comment-14366065
 ] 

Khurram Faraaz commented on DRILL-2478:
---

The system config option store.parquet.block-size accepts arbitrary values, 
including 0 and -1. Inputs must be validated.

{code}
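0: jdbc:drill:> alter system set `store.parquet.block-size`=0;
+------+-----------------------------------+
| ok   | summary                           |
+------+-----------------------------------+
| true | store.parquet.block-size updated. |
+------+-----------------------------------+
1 row selected (0.076 seconds)
0: jdbc:drill:> alter system set `store.parquet.block-size`=-1;
+------+-----------------------------------+
| ok   | summary                           |
+------+-----------------------------------+
| true | store.parquet.block-size updated. |
+------+-----------------------------------+
1 row selected (0.05 seconds)
0: jdbc:drill:> alter system set `store.parquet.block-size`=536870912;
+------+-----------------------------------+
| ok   | summary                           |
+------+-----------------------------------+
| true | store.parquet.block-size updated. |
+------+-----------------------------------+
1 row selected (0.057 seconds)
0: jdbc:drill:> alter system set `store.parquet.block-size`=100;
+------+-----------------------------------+
| ok   | summary                           |
+------+-----------------------------------+
| true | store.parquet.block-size updated. |
+------+-----------------------------------+
1 row selected (0.078 seconds)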

{code}

 Validating values assigned to SYSTEM/SESSION configuration parameters
 -

 Key: DRILL-2478
 URL: https://issues.apache.org/jira/browse/DRILL-2478
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 0.8.0
 Environment: {code}
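 0: jdbc:drill:> select * from sys.version;
 +------------------------------------------+----------------------------------------------------+---------------------------+-------------+---------------------------+
 | commit_id                                | commit_message                                     | commit_time               | build_email | build_time                |
 +------------------------------------------+----------------------------------------------------+---------------------------+-------------+---------------------------+
 | f658a3c513ddf7f2d1b0ad7aa1f3f65049a594fe | DRILL-2209 Insert ProjectOperator with MuxExchange | 09.03.2015 @ 01:49:18 EDT | Unknown     | 09.03.2015 @ 04:50:05 EDT |
 +------------------------------------------+----------------------------------------------------+---------------------------+-------------+---------------------------+
 1 row selected (0.046 seconds)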
 {code}
Reporter: Khurram Faraaz
Assignee: Daniel Barclay (Drill)

 Values assigned to SYSTEM and SESSION configuration parameters must be 
 validated. Currently, any value can be assigned to some of these 
 parameters.
 Here are two examples where assigning an invalid value to store.format does 
 not result in an error.
 {code}
 0: jdbc:drill:> alter session set `store.format`='1';
 +------+-----------------------+
 | ok   | summary               |
 +------+-----------------------+
 | true | store.format updated. |
 +------+-----------------------+
 1 row selected (0.02 seconds)
 {code}
 {code}
 0: jdbc:drill:> alter session set `store.format`='foo';
 +------+-----------------------+
 | ok   | summary               |
 +------+-----------------------+
 | true | store.format updated. |
 +------+-----------------------+
 1 row selected (0.039 seconds)
 {code}
 Some configuration parameters are already validated. In the example below, 
 assigning an invalid value to store.parquet.compression results in an 
 error, which is correct. However, this kind of validation is not performed 
 for every SYSTEM/SESSION parameter. Values assigned to parameters must be 
 validated, and an error must be reported when a user assigns an invalid one.
 {code}
 0: jdbc:drill:> alter session set `store.parquet.compression`='anything';
 Query failed: ExpressionParsingException: Option store.parquet.compression 
 must be one of: [snappy, gzip, none]
 Error: exception while executing query: Failure while executing query. 
 (state=,code=0)
 {code}
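
The kind of validation asked for can be sketched with two small checks: one for options restricted to a fixed set of values, and one for numeric options. The allowed values for store.parquet.compression are taken from the error message above. Treating store.parquet.block-size as "must be greater than 0" is an assumption made for this sketch, and OptionValidationSketch is a hypothetical class, not Drill's option validator API.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical validators for illustration; not Drill's option-validation classes.
final class OptionValidationSketch {
    private OptionValidationSketch() {}

    /** Rejects a string option whose value is not in the allowed set. */
    static void requireOneOf(String option, String value, String... allowed) {
        Set<String> allowedSet = new LinkedHashSet<>(Arrays.asList(allowed));
        if (!allowedSet.contains(value.toLowerCase(Locale.ROOT))) {
            throw new IllegalArgumentException(
                "Option " + option + " must be one of: " + allowedSet);
        }
    }

    /** Rejects a numeric option whose value is not strictly positive (assumed rule). */
    static void requirePositive(String option, long value) {
        if (value <= 0) {
            throw new IllegalArgumentException(
                "Option " + option + " must be greater than 0, got " + value);
        }
    }
}
```

Under these assumptions, the block-size values 0 and -1 from the comment above would be rejected, while 536870912 would pass.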





[jira] [Commented] (DRILL-2180) Star is not expanded when being used with flatten

2015-03-17 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366089#comment-14366089
 ] 

Mehant Baid commented on DRILL-2180:


+1

 Star is not expanded when being used with flatten
 -

 Key: DRILL-2180
 URL: https://issues.apache.org/jira/browse/DRILL-2180
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Sean Hsuan-Yi Chu
Assignee: Mehant Baid
Priority: Critical
 Fix For: 0.9.0

 Attachments: DRILL-2180.1.patch


 For example,
 select *, flatten(j.topping) tt  +
   from dfs_test.`%s` j 
 (using the same data set in DRILL-2012)
 * tt
 null  {id:5001,type:None}
 null  {id:5002,type:Glazed}
 null  {id:5005,type:Sugar}
 null  {id:5007,type:Powdered Sugar}
 null  {id:5006,type:Chocolate with Sprinkles}
 null  {id:5003,type:Chocolate}
 null  {id:5004,type:Maple}
 Note that the first column is messed up. 





[jira] [Updated] (DRILL-2481) Querying individual column from view results in AssertionError

2015-03-17 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-2481:
--
Description: 
Querying an individual column from a view results in an AssertionError.
The data came from a CSV file containing a single row (see below):

1,John Doe,HR,5000,Software Engineer 

{code}

0: jdbc:drill: use dfs.tmp;
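+------+-------------------------------------+
| ok   | summary                             |
+------+-------------------------------------+
| true | Default schema changed to 'dfs.tmp' |
+------+-------------------------------------+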
1 row selected (0.188 seconds)

0: jdbc:drill: create view v1 as select * from `employee.csv` union all select 
* from `employee.csv`;
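+------+----------------------------------------------------+
| ok   | summary                                            |
+------+----------------------------------------------------+
| true | View 'v1' created successfully in 'dfs.tmp' schema |
+------+----------------------------------------------------+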
1 row selected (0.073 seconds)
0: jdbc:drill: create view v2 as select * from `employee.csv` union all select 
* from `employee.csv`;
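+------+----------------------------------------------------+
| ok   | summary                                            |
+------+----------------------------------------------------+
| true | View 'v2' created successfully in 'dfs.tmp' schema |
+------+----------------------------------------------------+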
1 row selected (0.046 seconds)
0: jdbc:drill: select * from v1;
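+----------------------------------------+
| columns                                |
+----------------------------------------+
| [1,John Doe,HR,5000,Software Engineer] |
| [1,John Doe,HR,5000,Software Engineer] |
+----------------------------------------+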
2 rows selected (0.087 seconds)
0: jdbc:drill: select * from v2;
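+----------------------------------------+
| columns                                |
+----------------------------------------+
| [1,John Doe,HR,5000,Software Engineer] |
| [1,John Doe,HR,5000,Software Engineer] |
+----------------------------------------+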
2 rows selected (0.075 seconds)


0: jdbc:drill: describe v1;
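+-------------+-----------+-------------+
| COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
+-------------+-----------+-------------+
| *           | ANY       | NO          |
+-------------+-----------+-------------+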
1 row selected (0.084 seconds)
0: jdbc:drill: describe v2;
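+-------------+-----------+-------------+
| COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
+-------------+-----------+-------------+
| *           | ANY       | NO          |
+-------------+-----------+-------------+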
1 row selected (0.083 seconds)

0: jdbc:drill: select columns[0] from v1;
Query failed: AssertionError: ANY

Error: exception while executing query: Failure while executing query. 
(state=,code=0)

{code}

{code}
Stack trace from drillbit.log

2015-03-17 16:40:43,176 [2af7a6f3-bbcf-3a34-dfae-5b5bb11ff4a9:foreman] INFO  
o.a.d.e.s.schedule.BlockMapBuilder - Get block maps: Executed 1 out of 1 using 
1 threads. Time: 1ms total, 1.060851ms avg, 1ms max.
2015-03-17 16:40:43,178 [2af7a6f3-bbcf-3a34-dfae-5b5bb11ff4a9:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - State change requested.  PENDING -- 
FAILED
org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception 
during fragment initialization: ANY
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:195) 
[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
at 
org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303)
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_75]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_75]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
Caused by: java.lang.AssertionError: ANY
at 
org.eigenbase.reltype.RelDataTypeImpl.getFieldCount(RelDataTypeImpl.java:114) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.relopt.RelOptUtil$2.size(RelOptUtil.java:143) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rex.RexChecker.visitInputRef(RexChecker.java:111) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rex.RexChecker.visitInputRef(RexChecker.java:55) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rex.RexInputRef.accept(RexInputRef.java:103) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rex.RexChecker.visitCall(RexChecker.java:136) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rex.RexChecker.visitCall(RexChecker.java:55) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rex.RexCall.accept(RexCall.java:106) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rex.RexChecker.visitCall(RexChecker.java:136) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rex.RexChecker.visitCall(RexChecker.java:55) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rex.RexCall.accept(RexCall.java:106) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rel.ProjectRelBase.isValid(ProjectRelBase.java:156) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rel.ProjectRelBase.init(ProjectRelBase.java:82) 
~[optiq-core-0.9-drill-r20.jar:na]
at 

[jira] [Resolved] (DRILL-2311) Create table with same columns of different case results in a java.lang.IllegalStateException

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu resolved DRILL-2311.
--
Resolution: Fixed

 Create table with same columns of different case results in a 
 java.lang.IllegalStateException
 -

 Key: DRILL-2311
 URL: https://issues.apache.org/jira/browse/DRILL-2311
 Project: Apache Drill
  Issue Type: Bug
  Components: SQL Parser
Affects Versions: 0.8.0
Reporter: Ramana Inukonda Nagaraj
Assignee: Sean Hsuan-Yi Chu
 Fix For: 0.8.0

 Attachments: DRILL-2311.1.patch


 Creating a table with the same column name in different cases results in a 
 runtime exception. This query should fail at planning or parsing.
 CREATE TABLE drill_parquet_mulCaseColumns3 as select cast( BIGINT_col as 
 bigint) BIGINT_col,cast( DECIMAL9_col as decimal) bigint_col FROM 
 dfs.`/user/root/alltypes.json`;
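
A planning-time check for this situation can be sketched as a case-insensitive scan of the output column names. This is an illustration only, not the code in the attached patch; the function name is hypothetical.

```python
def find_case_insensitive_duplicates(column_names):
    """Return (first_seen, duplicate) pairs that collide when case is ignored."""
    seen = {}        # lower-cased name -> first spelling encountered
    collisions = []
    for name in column_names:
        key = name.lower()
        if key in seen:
            collisions.append((seen[key], name))
        else:
            seen[key] = name
    return collisions
```

For the query above, the select list yields BIGINT_col and bigint_col, which such a check reports as one collision, so the statement could be rejected before execution.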





[jira] [Commented] (DRILL-2414) Union-All on SELECT * FROM schema-less data source will throw exception

2015-03-17 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365778#comment-14365778
 ] 

Aman Sinha commented on DRILL-2414:
---

+1.   Committed to master branch, commit #:   
63bd48eb0a8081e3c24a7e49095bbcfc0f36bf7c

 Union-All on SELECT * FROM schema-less data source will throw exception
 ---

 Key: DRILL-2414
 URL: https://issues.apache.org/jira/browse/DRILL-2414
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Sean Hsuan-Yi Chu
Assignee: Aman Sinha
 Fix For: 0.9.0

 Attachments: DRILL-2414.1.patch


 Union-All on SELECT * (wildcard symbol) is supported only for the cases where 
 schema (i.e., hive, view) is available. 
 For detailed design documentation, please refer to DRILL-2207.





[jira] [Commented] (DRILL-2311) Create table with same columns of different case results in a java.lang.IllegalStateException

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365816#comment-14365816
 ] 

Sean Hsuan-Yi Chu commented on DRILL-2311:
--

Review Board:
https://reviews.apache.org/r/32089/

 Create table with same columns of different case results in a 
 java.lang.IllegalStateException
 -

 Key: DRILL-2311
 URL: https://issues.apache.org/jira/browse/DRILL-2311
 Project: Apache Drill
  Issue Type: Bug
  Components: SQL Parser
Affects Versions: 0.8.0
Reporter: Ramana Inukonda Nagaraj
Assignee: Sean Hsuan-Yi Chu
 Fix For: 0.8.0

 Attachments: DRILL-2311.1.patch


 Creating a table with the same column name in different cases results in a 
 runtime exception. This query should fail at planning or parsing.
 CREATE TABLE drill_parquet_mulCaseColumns3 as select cast( BIGINT_col as 
 bigint) BIGINT_col,cast( DECIMAL9_col as decimal) bigint_col FROM 
 dfs.`/user/root/alltypes.json`;





[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365761#comment-14365761
 ] 

Sean Hsuan-Yi Chu commented on DRILL-2380:
--

The failure was due to Union-All. After the new Union-All implementation was 
checked in, this query ran and gave the same result as Postgres.

 TPC-DS Query 33 and simplified variants return wrong results
 

 Key: DRILL-2380
 URL: https://issues.apache.org/jira/browse/DRILL-2380
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 0.8.0
Reporter: Abhishek Girish
Assignee: Sean Hsuan-Yi Chu
Priority: Critical
 Fix For: 0.8.0


 TPC-DS query 33 returns wrong results. 
 {code:sql}
 WITH ss 
  AS (SELECT i_manufact_id, 
 Sum(ss_ext_sales_price) total_sales 
  FROM   store_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ss_item_sk = i_item_sk 
 AND ss_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ss_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  cs 
  AS (SELECT i_manufact_id, 
 Sum(cs_ext_sales_price) total_sales 
  FROM   catalog_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND cs_item_sk = i_item_sk 
 AND cs_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND cs_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  ws 
  AS (SELECT i_manufact_id, 
 Sum(ws_ext_sales_price) total_sales 
  FROM   web_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ws_item_sk = i_item_sk 
 AND ws_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ws_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id) 
 SELECT i_manufact_id, 
Sum(total_sales) total_sales 
 FROM   (SELECT i_manufact_id, total_sales 
 FROM   ss 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   cs 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   ws) tmp1 
 GROUP  BY i_manufact_id 
 ORDER  BY total_sales
 LIMIT 10;
 Drill Results:
 +---------------+-------------+
 | i_manufact_id | total_sales |
 +---------------+-------------+
 | 440           | 0.12        |
 | 434           | 13.16       |
 | 415           | 14.04       |
 | 449           | 15.63       |
 | 563           | 31.46       |
 | 357           | 49.50       |
 | 624           | 67.94       |
 | 192           | 74.40       |
 | 137           | 83.42       |
 | 240           | 85.26       |
 +---------------+-------------+
 10 rows selected (7.57 seconds)
 Postgres Results:
  i_manufact_id | total_sales 
 ---------------+-------------
            930 |        1.18
            818 |       41.86
            913 |      141.90
            784 |      184.90
            488 |      275.08
            993 |      301.60
            700 |      340.52
            895 |      802.30
            766 |      839.76
            858 |      859.18
 (10 rows)
 {code}
 The following simplified variants also return wrong results:
 {code:sql}
 SELECT sum(x)
 FROM
 (SELECT ss_ext_sales_price x, ss_item_sk
 FROM  store_sales
  GROUP BY ss_item_sk, ss_ext_sales_price
 UNION ALL
 SELECT cs_ext_sales_price x, cs_item_sk
 FROM catalog_sales
 GROUP BY cs_item_sk, cs_ext_sales_price) tmp
 GROUP BY x
 LIMIT 10;
 Drill Results:
 +------------+
 |   EXPR$0   |
 +------------+
 | 14141.40   |
 | 28060.00   |
 | 30912.70   |
 | 43706.88   |
 | 38267.64   |
 | 10173.00   |
 | 37829.25   |
 | 5349.50    |
 | 107515.80  |
 | 4440.84    |
 +------------+
 10 rows selected (14.435 seconds)
 Postgres Results:
     sum
 ----------
  45234.00
   5735.31
   2275.60
   6921.32
   2590.46
   

[jira] [Updated] (DRILL-2311) Create table with same columns of different case results in a java.lang.IllegalStateException

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu updated DRILL-2311:
-
Attachment: DRILL-2311.1.patch

 Create table with same columns of different case results in a 
 java.lang.IllegalStateException
 -

 Key: DRILL-2311
 URL: https://issues.apache.org/jira/browse/DRILL-2311
 Project: Apache Drill
  Issue Type: Bug
  Components: SQL Parser
Affects Versions: 0.8.0
Reporter: Ramana Inukonda Nagaraj
Assignee: Sean Hsuan-Yi Chu
 Fix For: 0.9.0

 Attachments: DRILL-2311.1.patch


 Creating a table with the same column name in different cases results in a 
 runtime exception. This query should fail at planning or parsing.
 CREATE TABLE drill_parquet_mulCaseColumns3 as select cast( BIGINT_col as 
 bigint) BIGINT_col,cast( DECIMAL9_col as decimal) bigint_col FROM 
 dfs.`/user/root/alltypes.json`;





[jira] [Commented] (DRILL-2441) Throw unsupported error message in case of inequality join

2015-03-17 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365784#comment-14365784
 ] 

Aman Sinha commented on DRILL-2441:
---

+1.  Committed to master branch commit #: ae2053d2a

 Throw unsupported error message in case of inequality join
 --

 Key: DRILL-2441
 URL: https://issues.apache.org/jira/browse/DRILL-2441
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Victoria Markman
Assignee: Aman Sinha
 Fix For: 0.9.0

 Attachments: DRILL-2441.1.patch


 Since we don't support inequality joins, this whole class of queries throws a 
 huge, page-long "Can't plan" exception.
 This is a request to throw, in these cases as well, the same nice error 
 message that we throw for a cartesian join.
 {code} 
 select * from t1 left outer join t2  on (t1.a1 = t2.a2 and t1.b2  t2.b2);
 select * from t1 right outer join t2 on (t1.a1 = t2.a2 and t1.b2  t2.b2);
 {code}
 Example of an exception:
 {code}
 0: jdbc:drill:schema=dfs select * from t1 inner join t2 on(t1.b1  t2.b2);
 Query failed: UnsupportedRelOperatorException: This query cannot be planned 
 possibly due to either a cartesian join or an inequality join
 Error: exception while executing query: Failure while executing query. 
 (state=,code=0)
 {code}
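
The requested behaviour amounts to inspecting the join condition before planning and failing fast with one short message. The sketch below is hypothetical and is not Drill's planner code: it models a join condition as a list of (left, operator, right) tuples and assumes that any non-equality comparison makes the join unsupported. The comparison operator used in the archived queries above was lost, so the operators in any call to this function are illustrative.

```python
class UnsupportedJoinError(Exception):
    """Raised instead of a page-long planner failure."""


# Message text taken from the example error output in this report.
MESSAGE = (
    "This query cannot be planned possibly due to either a "
    "cartesian join or an inequality join"
)


def check_join_conditions(conditions):
    """Reject joins with no condition or with any non-equality comparison.

    conditions: list of (left_column, operator, right_column) tuples.
    """
    if not conditions:
        # No condition at all means a cartesian join.
        raise UnsupportedJoinError(MESSAGE)
    if any(op != "=" for _, op, _ in conditions):
        raise UnsupportedJoinError(MESSAGE)
```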





[jira] [Commented] (DRILL-2176) IndexOutOfBoundsException for count(*) on a subquery which does order-by

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365783#comment-14365783
 ] 

Sean Hsuan-Yi Chu commented on DRILL-2176:
--

I cannot reproduce this bug. Was it resolved?

 IndexOutOfBoundsException for count(*) on a subquery which does order-by
 

 Key: DRILL-2176
 URL: https://issues.apache.org/jira/browse/DRILL-2176
 Project: Apache Drill
  Issue Type: Bug
  Components: SQL Parser
Affects Versions: 0.7.0
Reporter: Aman Sinha
Assignee: Aman Sinha
 Fix For: 0.9.0


 The IOBE occurs in creating the collation trait in Calcite.  
 {code}
 0: jdbc:drill:zk=local select count(*) from (select n_nationkey, n_regionkey 
 from cp.`tpch/nation.parquet` order by 1, 2);
 Query failed: IndexOutOfBoundsException: index (1) must be less than size (1)
 {code}
 Full stack trace: 
 {code}
 Caused by: java.lang.IndexOutOfBoundsException: index (1) must be less than 
 size (1)
 at 
 com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305)
  ~[guava-14.0.1.jar:na]
 at 
 com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284)
  ~[guava-14.0.1.jar:na]
 at 
 com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:45)
  ~[guava-14.0.1.jar:na]
 at org.eigenbase.rex.RexBuilder.makeInputRef(RexBuilder.java:764) 
 ~[optiq-core-0.9-drill-r18.jar:na]
 at org.eigenbase.rel.SortRel.init(SortRel.java:94) 
 ~[optiq-core-0.9-drill-r18.jar:na]
 at org.eigenbase.rel.SortRel.init(SortRel.java:59) 
 ~[optiq-core-0.9-drill-r18.jar:na]
 at 
 org.eigenbase.rel.RelCollationTraitDef.convert(RelCollationTraitDef.java:78) 
 ~[optiq-core-0.9-drill-r18.jar:na]
 at 
 org.eigenbase.rel.RelCollationTraitDef.convert(RelCollationTraitDef.java:1) 
 ~[optiq-core-0.9-drill-r18.jar:na]
 at 
 org.eigenbase.relopt.volcano.VolcanoPlanner.changeTraitsUsingConverters(VolcanoPlanner.java:1011)
  ~[optiq-core-0.9-drill-r18.jar:na]
 at 
 org.eigenbase.relopt.volcano.VolcanoPlanner.changeTraitsUsingConverters(VolcanoPlanner.java:1102)
  ~[optiq-core-0.9-drill-r18.jar:na]
 at 
 org.eigenbase.relopt.volcano.AbstractConverter$ExpandConversionRule.onMatch(AbstractConverter.java:108)
  ~[optiq-core-0.9-drill-r18.jar:na]
 {code}
 This might be related to CALCITE-569 (and possibly DRILL-1978) but the stack 
 traces are different, so I am treating this as a separate issue.  
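
The trace shows a single-element list being indexed at position 1 while a sort is constructed. The following is a rough illustration of that failure mode only: the function names are hypothetical, the bounds check merely mimics the message seen in the trace, and why the sort input ends up with one field is not established in this report.

```python
def check_element_index(index, size):
    """Mimic the bounds check whose message appears in the stack trace."""
    if index < 0:
        raise IndexError("index (%d) must not be negative" % index)
    if index >= size:
        raise IndexError(
            "index (%d) must be less than size (%d)" % (index, size)
        )
    return index


def resolve_sort_keys(field_names, sort_ordinals):
    """Map 0-based sort ordinals onto the fields of the sort's input."""
    return [
        field_names[check_element_index(i, len(field_names))]
        for i in sort_ordinals
    ]
```

Resolving ordinals 0 and 1 against a two-field input succeeds, while the same ordinals against a one-field input raise the error quoted above.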





[jira] [Updated] (DRILL-2342) Nullability property of the view created from parquet file is not correct

2015-03-17 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-2342:

Attachment: t1.parquet

Table 't1' parquet file that was used in the query

 Nullability property of the view created from parquet file is not correct
 -

 Key: DRILL-2342
 URL: https://issues.apache.org/jira/browse/DRILL-2342
 Project: Apache Drill
  Issue Type: Bug
  Components: Metadata
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Venki Korukanti
Priority: Critical
 Fix For: 0.9.0

 Attachments: t1.parquet


 Here is my t1 table definition:
 {code}
 message root {
   optional int32 a1;
   optional binary b1 (UTF8);
   optional int32 c1 (DATE);
 }
 {code}
 I created a view on top of it:
 {code}
 0: jdbc:drill:schema=dfs create view v1 as select cast(a1 as int), cast(b1 
 as varchar(10)), cast(c1 as date) from t1;
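 +------+------------------------------------------------------------+
 | ok   | summary                                                    |
 +------+------------------------------------------------------------+
 | true | View 'v1' created successfully in 'dfs.aggregation' schema |
 +------+------------------------------------------------------------+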
 1 row selected (0.096 seconds)
 {code}
 IS_NULLABLE says 'NO', which is incorrect.
 {code}
 0: jdbc:drill:schema=dfs describe v1;
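 +-------------+-----------+-------------+
 | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
 +-------------+-----------+-------------+
 | EXPR$0      | INTEGER   | NO          |
 | EXPR$1      | VARCHAR   | NO          |
 | EXPR$2      | DATE      | NO          |
 +-------------+-----------+-------------+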
 3 rows selected (0.067 seconds)
 {code}
 This is potentially dangerous: if Calcite decided tomorrow to take advantage 
 of this property and added an optimization that drops an "is null" predicate 
 on a non-nullable column, the query "select * from v1 where x is null" would 
 return an incorrect result.
 {code}
 0: jdbc:drill:schema=dfs explain plan for select * from v1 where z is null;
 +++
 |text|json|
 +++
 | 00-00Screen
 00-01  Project(x=[$0], y=[$1], z=[$2])
 00-02SelectionVectorRemover
 00-03  Filter(condition=[IS NULL($2)])
 00-04Project(x=[CAST($2):ANY NOT NULL], y=[CAST($1):ANY NOT 
 NULL], z=[CAST($0):ANY NOT NULL])
 00-05  Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/aggregation/t1]], 
 selectionRoot=/aggregation/t1, numFiles=1, columns=[`a1`, `b1`, `c1`]]])
 {code}
 It seems to me that columns in views should always be reported as nullable.
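
The risk described above can be shown with a toy model. The optimization here is hypothetical (the report only speculates that Calcite might add one), and the rows and function name are made up for illustration.

```python
# Toy data: the column actually contains a null, as an optional
# Parquet field may.
ROWS = [{"z": 1}, {"z": None}, {"z": 3}]


def filter_is_null(rows, column, declared_nullable):
    """Evaluate `column IS NULL`, trusting the declared nullability.

    Hypothetical optimization: if metadata says the column is NOT NULL,
    the predicate can never be true, so return no rows without scanning.
    """
    if not declared_nullable:
        return []
    return [row for row in rows if row[column] is None]
```

With correct metadata the null row is returned; with the incorrect NOT NULL declaration the same query silently returns nothing.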





[jira] [Commented] (DRILL-1833) Views cannot be registered into the INFROMATION_SCHEMA.`TABLES` after wiping out ZooKeeper data

2015-03-17 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365644#comment-14365644
 ] 

Venki Korukanti commented on DRILL-1833:


We currently store Map<ViewName, ViewLocation> in ZK. When listing views (as 
part of SHOW TABLES), we take the view list from the ZK store and, for each 
entry, check whether the view definition exists in the given location. As the 
view list in ZK is empty, we don't list any views in SHOW TABLES. When creating 
a view, we create it by default in the workspace schema location. Also, when 
querying, we refer directly to the FileSystem for the view definition. The info 
in ZK is redundant, as we always trust the information in the FileSystem. We 
can remove the view info persisted in ZK.

One thing I must point out: if in the future we support creating a view with a 
custom location, it won't be visible in SHOW TABLES, as SHOW TABLES only 
searches for .view.drill files in the workspace root directory. This should be 
OK, as a view created with a custom location is considered external to the 
schema. 
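
Listing views straight from the file system, as described above, can be sketched as follows. This is an illustration under assumptions, not the code in the patch: it uses a local directory in place of the Drill file system, and the function name is hypothetical. Only the .view.drill suffix and the "workspace root only" rule come from the comment.

```python
from pathlib import Path

VIEW_SUFFIX = ".view.drill"


def list_views(workspace_root):
    """Return view names found directly under the workspace root.

    Subdirectories are not searched, so a view stored in a custom
    location would not be listed.
    """
    root = Path(workspace_root)
    return sorted(
        entry.name[: -len(VIEW_SUFFIX)]
        for entry in root.iterdir()
        if entry.is_file() and entry.name.endswith(VIEW_SUFFIX)
    )
```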

 Views cannot be registered into the INFROMATION_SCHEMA.`TABLES` after wiping 
 out ZooKeeper data
 ---

 Key: DRILL-1833
 URL: https://issues.apache.org/jira/browse/DRILL-1833
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Information Schema
 Environment: git.commit.id.abbrev=2396670
Reporter: Xiao Meng
Assignee: Venki Korukanti
 Fix For: 0.9.0


 After wiping out the ZooKeeper data, the drillbit cannot automatically 
 register the view into INFORMATION_SCHEMA.`TABLES` even after we query the 
 view.
 For example, for a workspace dfs.tmp, there is a view file 
 `varchar_view.view.drill` under the corresponding directory '/tmp'.
 We can query:
 {code}
 select * from dfs.test.`varchar_view`
 {code}
  
 But this view still won't show up in INFORMATION_SCHEMA.`TABLES`. 
 After I recreate the view based on the contents of `varchar_view.view.drill`, 
 the view shows in the INFORMATION_SCHEMA.`TABLES`.





[jira] [Updated] (DRILL-2414) Union-All on SELECT * FROM schema-less data source will throw exception

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu updated DRILL-2414:
-
Fix Version/s: (was: 0.9.0)
   0.8.0

 Union-All on SELECT * FROM schema-less data source will throw exception
 ---

 Key: DRILL-2414
 URL: https://issues.apache.org/jira/browse/DRILL-2414
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Sean Hsuan-Yi Chu
Assignee: Aman Sinha
 Fix For: 0.8.0

 Attachments: DRILL-2414.1.patch


 Union-All on SELECT * (wildcard symbol) is supported only for the cases where 
 schema (i.e., hive, view) is available. 
 For detailed design documentation, please refer to DRILL-2207.





[jira] [Updated] (DRILL-2311) Create table with same columns of different case results in a java.lang.IllegalStateException

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu updated DRILL-2311:
-
Fix Version/s: (was: 0.9.0)
   0.8.0

 Create table with same columns of different case results in a 
 java.lang.IllegalStateException
 -

 Key: DRILL-2311
 URL: https://issues.apache.org/jira/browse/DRILL-2311
 Project: Apache Drill
  Issue Type: Bug
  Components: SQL Parser
Affects Versions: 0.8.0
Reporter: Ramana Inukonda Nagaraj
Assignee: Sean Hsuan-Yi Chu
 Fix For: 0.8.0

 Attachments: DRILL-2311.1.patch


 Creating a table with the same column name in different cases results in a 
 runtime exception. This query should fail at planning or parsing.
 CREATE TABLE drill_parquet_mulCaseColumns3 as select cast( BIGINT_col as 
 bigint) BIGINT_col,cast( DECIMAL9_col as decimal) bigint_col FROM 
 dfs.`/user/root/alltypes.json`;





[jira] [Resolved] (DRILL-2441) Throw unsupported error message in case of inequality join

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu resolved DRILL-2441.
--
   Resolution: Fixed
Fix Version/s: (was: 0.9.0)
   0.8.0

 Throw unsupported error message in case of inequality join
 --

 Key: DRILL-2441
 URL: https://issues.apache.org/jira/browse/DRILL-2441
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Victoria Markman
Assignee: Aman Sinha
 Fix For: 0.8.0

 Attachments: DRILL-2441.1.patch


 Since we don't support inequality joins, this whole class of queries throws a 
 huge, page-long "Can't plan" exception.
 This is a request to throw, in these cases as well, the same nice error 
 message that we throw for a cartesian join.
 {code} 
 select * from t1 left outer join t2  on (t1.a1 = t2.a2 and t1.b2  t2.b2);
 select * from t1 right outer join t2 on (t1.a1 = t2.a2 and t1.b2  t2.b2);
 {code}
 Example of an exception:
 {code}
 0: jdbc:drill:schema=dfs select * from t1 inner join t2 on(t1.b1  t2.b2);
 Query failed: UnsupportedRelOperatorException: This query cannot be planned 
 possibly due to either a cartesian join or an inequality join
 Error: exception while executing query: Failure while executing query. 
 (state=,code=0)
 {code}





[jira] [Updated] (DRILL-2002) Confusing star behavior in UNION ALL operator

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu updated DRILL-2002:
-
Fix Version/s: (was: 0.9.0)
   0.8.0

 Confusing star behavior in UNION ALL operator
 ---

 Key: DRILL-2002
 URL: https://issues.apache.org/jira/browse/DRILL-2002
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Sean Hsuan-Yi Chu
 Fix For: 0.8.0


 t1.json
 {code}
 { a1: 1 ,b1 : 1}
 { a1: 2 ,b1 : 1}
 { a1: 2 ,b1 : 2}
 { a1: 3 ,b1 : 2}
 { a1: null , b1 : 3}
 {code}
 Star in both legs of UNION ALL works:
 {code}
 0: jdbc:drill:schema=dfs select * from `t1.json` union all select * from 
 `t1.json`;
 +------+----+
 | a1   | b1 |
 +------+----+
 | 1    | 1  |
 | 2    | 1  |
 | 2    | 2  |
 | 3    | 2  |
 | null | 3  |
 | 1    | 1  |
 | 2    | 1  |
 | 2    | 2  |
 | 3    | 2  |
 | null | 3  |
 +------+----+
 10 rows selected (0.126 seconds)
 {code}
 I expected this to work as it would on structured data, but since the planner 
 has no knowledge of the metadata, the error message seems reasonable:
 {code}
 0: jdbc:drill:schema=dfs select a1, b1 from `t1.json` union all select * 
 from `t1.json`;
 Query failed: Query failed: Failure validating SQL. 
 org.eigenbase.util.EigenbaseContextException: At line 1, column 47: Column 
 count mismatch in UNION ALL
 Error: exception while executing query: Failure while executing query. 
 (state=,code=0)
 {code}
 The query below returns a very confusing result. I expected it to error out 
 like the query above:
 {code}
 0: jdbc:drill:schema=dfs select a1 from `t1.json` union all select * from 
 `t1.json`;
 +------+
 | a1   |
 +------+
 | 1    |
 | 2    |
 | 2    |
 | 3    |
 | null |
 | 1    |
 | 2    |
 | 2    |
 | 3    |
 | null |
 +------+
 10 rows selected (0.111 seconds)
 {code}
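
The expectation stated above is that a star on one side of UNION ALL is expanded before the column counts are compared. A minimal sketch of that check follows; it assumes the column list of the source is known at the time of the check, which is exactly what a schema-less source cannot guarantee at planning time. The function names are hypothetical.

```python
def expand_star(select_list, schema_columns):
    """Replace each '*' in a select list with the source's columns."""
    expanded = []
    for item in select_list:
        if item == "*":
            expanded.extend(schema_columns)
        else:
            expanded.append(item)
    return expanded


def check_union_all(left_columns, right_columns):
    """Raise if the two legs of a UNION ALL differ in column count."""
    if len(left_columns) != len(right_columns):
        raise ValueError(
            "Column count mismatch in UNION ALL: %d vs %d"
            % (len(left_columns), len(right_columns))
        )
    return left_columns
```

For t1.json (columns a1 and b1), `select a1 ... union all select * ...` expands to one column against two and would be rejected, matching the behaviour the reporter expected.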





[jira] [Updated] (DRILL-1833) Views cannot be registered into the INFROMATION_SCHEMA.`TABLES` after wiping out ZooKeeper data

2015-03-17 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated DRILL-1833:
---
Attachment: DRILL-1833-1.patch

 Views cannot be registered into the INFROMATION_SCHEMA.`TABLES` after wiping 
 out ZooKeeper data
 ---

 Key: DRILL-1833
 URL: https://issues.apache.org/jira/browse/DRILL-1833
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Information Schema
 Environment: git.commit.id.abbrev=2396670
Reporter: Xiao Meng
Assignee: Venki Korukanti
 Fix For: 0.9.0

 Attachments: DRILL-1833-1.patch


 After wiping out the ZooKeeper data, the drillbit cannot automatically 
 register the view into INFORMATION_SCHEMA.`TABLES` even after we query the 
 view.
 For example, for a workspace dfs.tmp, there is a view file 
 `varchar_view.view.drill` under the corresponding directory '/tmp'.
 We can query:
 {code}
 select * from dfs.test.`varchar_view`
 {code}
  
 But this view still won't show up in INFORMATION_SCHEMA.`TABLES`. 
 After I recreate the view based on the contents of `varchar_view.view.drill`, 
 the view shows in the INFORMATION_SCHEMA.`TABLES`.





[jira] [Updated] (DRILL-2275) need implementations of sys tables for drill memory and threads profiles

2015-03-17 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated DRILL-2275:
---
Assignee: Jacques Nadeau  (was: Sudheesh Katkam)

 need implementations of sys tables for drill memory and threads profiles
 

 Key: DRILL-2275
 URL: https://issues.apache.org/jira/browse/DRILL-2275
 Project: Apache Drill
  Issue Type: Task
  Components: Metadata
Reporter: Zhiyong Liu
Assignee: Jacques Nadeau
Priority: Critical
 Fix For: 0.9.0

 Attachments: DRILL-2275.1.patch.txt, DRILL-2275.2.patch.txt, 
 DRILL-2275.3.patch.txt, DRILL-2275.4.patch.txt


 In order to check drill state information, the following tables are to be 
 implemented:
 1. Memory: a query such as
 select * from sys.drillmemory;
 should return a result set like the following:
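 +-------------+------------------+-----------+---------------------+
 | drillbit    | total_sys_memory | heap_size | direct_alloc_memory |
 +-------------+------------------+-----------+---------------------+
 | node1:port1 | 24596676k        | 15200420k | 1012372k            |
 +-------------+------------------+-----------+---------------------+
 | node2:port2 | 24596676k        | 15200420k | 2012372k            |
 +-------------+------------------+-----------+---------------------+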
 2. Threads:
 For each node in a cluster, we need counts of threads of the drillbits.  A 
 query like this:
 select * from sys.drillbitthreads;
 should return a result set like the following:
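 +-------------+-----------+---------------+--------------+
 | drillbit    | pool_name | total_threads | busy_threads |
 +-------------+-----------+---------------+--------------+
 | node1:port1 | pool1     | 8             | 2            |
 +-------------+-----------+---------------+--------------+
 | node2:port2 | pool2     | 10            | 5            |
 +-------------+-----------+---------------+--------------+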





[jira] [Updated] (DRILL-2413) FileSystemPlugin refactoring: avoid sharing DrillFileSystem across schemas

2015-03-17 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated DRILL-2413:
---
Attachment: DRILL-2413-1.patch

 FileSystemPlugin refactoring: avoid sharing DrillFileSystem across schemas
 --

 Key: DRILL-2413
 URL: https://issues.apache.org/jira/browse/DRILL-2413
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Metadata, Storage - Information Schema
Reporter: Venki Korukanti
Assignee: Venki Korukanti
 Fix For: 0.9.0

 Attachments: DRILL-2413-1.patch


 Currently we create one DrillFileSystem (an extension of the Hadoop FileSystem) 
 instance and share it across all Workspaces created for all queries, 
 FormatPlugins and FormatMatchers. Remove the shared DrillFileSystem; instead, 
 share the DrillFileSystem configuration and create a DrillFileSystem in each 
 Schema (WorkspaceSchema) using the current user's credentials in the Schema. 
 The same DrillFileSystem instance is passed to FormatPlugins and 
 FormatMatchers whenever Schemas need to access the file system.
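
The ownership change described above (share the configuration, not the file system instance) can be sketched like this. All class and method names are hypothetical stand-ins chosen for illustration; they are not Drill's classes and do not reflect the attached patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSystemConfigSketch:
    """Shared, read-only settings (stand-in for the shared configuration)."""
    connection: str


class DrillFileSystemSketch:
    """Stand-in for a file system bound to one user's credentials."""

    def __init__(self, config, user_name):
        self.config = config
        self.user_name = user_name


class WorkspaceSchemaSketch:
    """Each schema builds its own file system from the shared config."""

    def __init__(self, config, user_name):
        self.fs = DrillFileSystemSketch(config, user_name)

    def file_system_for_plugins(self):
        # Format plugins and matchers receive this schema's instance
        # rather than a globally shared one.
        return self.fs
```

Two schemas created for different users then share one configuration object but never share a file system instance.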





[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365707#comment-14365707
 ] 

Sean Hsuan-Yi Chu commented on DRILL-2380:
--

Drill gave the same result as Postgres:

+---------------+-------------+
| i_manufact_id | total_sales |
+---------------+-------------+
| 930           | 1.18        |
| 818           | 41.86       |
| 913           | 141.9       |
| 784           | 184.9       |
| 488           | 275.08      |
| 993           | 301.6       |
| 700           | 340.520004  |
| 895           | 802.3       |
| 766           | 839.76      |
| 858           | 859.18      |
+---------------+-------------+
10 rows selected (21.237 seconds)

 TPC-DS Query 33 and simplified variants return wrong results
 

 Key: DRILL-2380
 URL: https://issues.apache.org/jira/browse/DRILL-2380
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 0.8.0
Reporter: Abhishek Girish
Assignee: Sean Hsuan-Yi Chu
Priority: Critical
 Fix For: 0.9.0


 TPC-DS query 33 returns wrong results. 
 {code:sql}
 WITH ss 
  AS (SELECT i_manufact_id, 
 Sum(ss_ext_sales_price) total_sales 
  FROM   store_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ss_item_sk = i_item_sk 
 AND ss_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ss_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  cs 
  AS (SELECT i_manufact_id, 
 Sum(cs_ext_sales_price) total_sales 
  FROM   catalog_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND cs_item_sk = i_item_sk 
 AND cs_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND cs_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  ws 
  AS (SELECT i_manufact_id, 
 Sum(ws_ext_sales_price) total_sales 
  FROM   web_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ws_item_sk = i_item_sk 
 AND ws_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ws_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id) 
 SELECT i_manufact_id, 
Sum(total_sales) total_sales 
 FROM   (SELECT i_manufact_id, total_sales 
 FROM   ss 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   cs 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   ws) tmp1 
 GROUP  BY i_manufact_id 
 ORDER  BY total_sales
 LIMIT 10;
 Drill Results:
 +----------------+--------------+
 | i_manufact_id  | total_sales  |
 +----------------+--------------+
 | 440            | 0.12         |
 | 434            | 13.16        |
 | 415            | 14.04        |
 | 449            | 15.63        |
 | 563            | 31.46        |
 | 357            | 49.50        |
 | 624            | 67.94        |
 | 192            | 74.40        |
 | 137            | 83.42        |
 | 240            | 85.26        |
 +----------------+--------------+
 10 rows selected (7.57 seconds)
 Postgres Results:
  i_manufact_id | total_sales 
 ---------------+-------------
            930 |        1.18
            818 |       41.86
            913 |      141.90
            784 |      184.90
            488 |      275.08
            993 |      301.60
            700 |      340.52
            895 |      802.30
            766 |      839.76
            858 |      859.18
 (10 rows)
 {code}
 The following simplified variants also return wrong results:
 {code:sql}
 SELECT sum(x)
 FROM
 (SELECT ss_ext_sales_price x, ss_item_sk
 FROM  store_sales
  GROUP BY ss_item_sk, ss_ext_sales_price
 UNION ALL
 SELECT cs_ext_sales_price x, cs_item_sk
 FROM catalog_sales
 GROUP BY cs_item_sk, cs_ext_sales_price) tmp
 

[jira] [Updated] (DRILL-2309) 'null' is counted with subquery

2015-03-17 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-2309:
---
Attachment: DRILL-2309.patch

[~amansinha100] can you please review.

 'null' is counted with subquery
 ---

 Key: DRILL-2309
 URL: https://issues.apache.org/jira/browse/DRILL-2309
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 0.8.0
Reporter: Chun Chang
Assignee: Mehant Baid
Priority: Critical
 Fix For: 0.9.0

 Attachments: DRILL-2309.patch


 #Thu Feb 19 18:40:10 EST 2015
 git.commit.id.abbrev=1ceddff
 The following query returns the correct count involving columns that contain 
 null values.
 {code}
 0: jdbc:drill:schema=dfs.drillTestDirComplexJ select tt.gbyi, count(tt.nul) 
 from (select t.id, t.gbyi, t.fl, t.nul from `complex.json` t) tt group by 
 tt.gbyi order by tt.gbyi;
 +++
 |gbyi|   EXPR$1   |
 +++
 | 0  | 33580  |
 | 1  | 33317  |
 | 2  | 33438  |
 | 3  | 33535  |
 | 4  | 33369  |
 | 5  | 32990  |
 | 6  | 33661  |
 | 7  | 33130  |
 | 8  | 33362  |
 | 9  | 33364  |
 | 10 | 33229  |
 | 11 | 33567  |
 | 12 | 33379  |
 | 13 | 33045  |
 | 14 | 33305  |
 +++
 {code}
 But if you add more aggregation to the query, the returned count is wrong 
 (pay attention to the last column). 
 {code}
 0: jdbc:drill:schema=dfs.drillTestDirComplexJ select tt.gbyi, sum(tt.id), 
 avg(tt.fl), count(tt.nul) from (select t.id, t.gbyi, t.fl, t.nul from 
 `complex.json` t) tt group by tt.gbyi order by tt.gbyi;
 +++++
 |gbyi|   EXPR$1   |   EXPR$2   |   EXPR$3   |
 +++++
 | 0  | 33445554017 | 499613.0956877819 | 66943  |
 | 1  | 33209358334 | 500760.0252919893 | 66318  |
 | 2  | 33369118041 | 498091.82200273 | 66994  |
 | 3  | 33254533860 | 498696.5063226428 | 66683  |
 | 4  | 33393965595 | 501125.64656145993 | 66638  |
 | 5  | 33216885506 | 499961.32710397616 | 66439  |
 | 6  | 33380205950 | 498875.3923256599 | 66911  |
 | 7  | 33405849390 | 501093.43067788356 | 6  |
 | 8  | 33136951190 | 498458.1044031481 | 66479  |
 | 9  | 33319291474 | 499967.5392457864 | 66643  |
 | 10 | 937 | 499190.47462408233 | 66787  |
 | 11 | 33571590550 | 502095.86682194035 | 66863  |
 | 12 | 33437342090 | 501708.8141502653 | 66647  |
 | 13 | 33071800925 | 498896.453904129 | 66290  |
 | 14 | 33448664191 | 501487.4206955959 | 66699  |
 +++++
 {code}
 Plan for the query that returned the wrong result:
 {code}
 0: jdbc:drill:schema=dfs.drillTestDirComplexJ explain plan for select 
 tt.gbyi, sum(tt.id), avg(tt.fl), count(tt.nul) from (select t.id, t.gbyi, 
 t.fl, t.nul from `complex.json` t) tt group by tt.gbyi order by tt.gbyi;
 +++
 |text|json|
 +++
 | 00-00Screen
 00-01  Project(gbyi=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
 00-02SingleMergeExchange(sort0=[0 ASC])
 01-01  SelectionVectorRemover
 01-02Sort(sort0=[$0], dir0=[ASC])
 01-03  Project(gbyi=[$0], EXPR$1=[CASE(=($2, 0), null, $1)], 
 EXPR$2=[CAST(/(CastHigh(CASE(=($4, 0), null, $3)), $4)):ANY], EXPR$3=[$5])
 01-04HashAgg(group=[{0}], agg#0=[$SUM0($1)], 
 agg#1=[$SUM0($2)], agg#2=[$SUM0($3)], agg#3=[$SUM0($4)], EXPR$3=[$SUM0($5)])
 01-05  HashToRandomExchange(dist0=[[$0]])
 02-01HashAgg(group=[{0}], agg#0=[$SUM0($1)], 
 agg#1=[COUNT($1)], agg#2=[$SUM0($2)], agg#3=[COUNT($2)], EXPR$3=[COUNT()])
 02-02  Project(gbyi=[$3], id=[$2], fl=[$1], nul=[$0])
 02-03Scan(groupscan=[EasyGroupScan 
 [selectionRoot=/drill/testdata/complex_type/json/complex.json, numFiles=1, 
 columns=[`gbyi`, `id`, `fl`, `nul`], 
 files=[maprfs:/drill/testdata/complex_type/json/complex.json]]])
 {code}
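
 One possible reading of the plan above: the first-phase HashAgg computes 
 EXPR$3 as COUNT() with no column argument, while the other counts are 
 COUNT($1) and COUNT($2). If that is what happens, every row is counted, 
 nulls included. The sketch below is plain Python with made-up data, not 
 Drill code; it only illustrates the difference between the two semantics.

```python
def count_column(values):
    """SQL COUNT(col): counts only the non-null values."""
    return sum(1 for v in values if v is not None)


def count_rows(values):
    """SQL COUNT(*): counts every row, null or not."""
    return len(values)


# Made-up column in which half of the values are null.
nul_column = ["x", None, "y", None]

expected = count_column(nul_column)             # what count(tt.nul) should return
observed_if_count_star = count_rows(nul_column) # what a row count would return
```

 With half of the values null, the row count is twice the column count, 
 which is roughly the ratio between the two result sets in the report.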





[jira] [Updated] (DRILL-2309) 'null' is counted with subquery

2015-03-17 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-2309:
---
Attachment: (was: DRILL-2309.patch)

 'null' is counted with subquery
 ---

 Key: DRILL-2309
 URL: https://issues.apache.org/jira/browse/DRILL-2309
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 0.8.0
Reporter: Chun Chang
Assignee: Mehant Baid
Priority: Critical
 Fix For: 0.9.0


 #Thu Feb 19 18:40:10 EST 2015
 git.commit.id.abbrev=1ceddff
 The following query returns the correct count involving columns that contain 
 null values.
 {code}
 0: jdbc:drill:schema=dfs.drillTestDirComplexJ select tt.gbyi, count(tt.nul) 
 from (select t.id, t.gbyi, t.fl, t.nul from `complex.json` t) tt group by 
 tt.gbyi order by tt.gbyi;
 +++
 |gbyi|   EXPR$1   |
 +++
 | 0  | 33580  |
 | 1  | 33317  |
 | 2  | 33438  |
 | 3  | 33535  |
 | 4  | 33369  |
 | 5  | 32990  |
 | 6  | 33661  |
 | 7  | 33130  |
 | 8  | 33362  |
 | 9  | 33364  |
 | 10 | 33229  |
 | 11 | 33567  |
 | 12 | 33379  |
 | 13 | 33045  |
 | 14 | 33305  |
 +++
 {code}
 But if you add more aggregation to the query, the returned count is wrong 
 (pay attention to the last column). 
 {code}
 0: jdbc:drill:schema=dfs.drillTestDirComplexJ select tt.gbyi, sum(tt.id), 
 avg(tt.fl), count(tt.nul) from (select t.id, t.gbyi, t.fl, t.nul from 
 `complex.json` t) tt group by tt.gbyi order by tt.gbyi;
 +++++
 |gbyi|   EXPR$1   |   EXPR$2   |   EXPR$3   |
 +++++
 | 0  | 33445554017 | 499613.0956877819 | 66943  |
 | 1  | 33209358334 | 500760.0252919893 | 66318  |
 | 2  | 33369118041 | 498091.82200273 | 66994  |
 | 3  | 33254533860 | 498696.5063226428 | 66683  |
 | 4  | 33393965595 | 501125.64656145993 | 66638  |
 | 5  | 33216885506 | 499961.32710397616 | 66439  |
 | 6  | 33380205950 | 498875.3923256599 | 66911  |
 | 7  | 33405849390 | 501093.43067788356 | 6  |
 | 8  | 33136951190 | 498458.1044031481 | 66479  |
 | 9  | 33319291474 | 499967.5392457864 | 66643  |
 | 10 | 937 | 499190.47462408233 | 66787  |
 | 11 | 33571590550 | 502095.86682194035 | 66863  |
 | 12 | 33437342090 | 501708.8141502653 | 66647  |
 | 13 | 33071800925 | 498896.453904129 | 66290  |
 | 14 | 33448664191 | 501487.4206955959 | 66699  |
 +++++
 {code}
 Plan for the query that returned the wrong result:
 {code}
 0: jdbc:drill:schema=dfs.drillTestDirComplexJ explain plan for select 
 tt.gbyi, sum(tt.id), avg(tt.fl), count(tt.nul) from (select t.id, t.gbyi, 
 t.fl, t.nul from `complex.json` t) tt group by tt.gbyi order by tt.gbyi;
 +++
 |text|json|
 +++
 | 00-00Screen
 00-01  Project(gbyi=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
 00-02SingleMergeExchange(sort0=[0 ASC])
 01-01  SelectionVectorRemover
 01-02Sort(sort0=[$0], dir0=[ASC])
 01-03  Project(gbyi=[$0], EXPR$1=[CASE(=($2, 0), null, $1)], 
 EXPR$2=[CAST(/(CastHigh(CASE(=($4, 0), null, $3)), $4)):ANY], EXPR$3=[$5])
 01-04HashAgg(group=[{0}], agg#0=[$SUM0($1)], 
 agg#1=[$SUM0($2)], agg#2=[$SUM0($3)], agg#3=[$SUM0($4)], EXPR$3=[$SUM0($5)])
 01-05  HashToRandomExchange(dist0=[[$0]])
 02-01HashAgg(group=[{0}], agg#0=[$SUM0($1)], 
 agg#1=[COUNT($1)], agg#2=[$SUM0($2)], agg#3=[COUNT($2)], EXPR$3=[COUNT()])
 02-02  Project(gbyi=[$3], id=[$2], fl=[$1], nul=[$0])
 02-03Scan(groupscan=[EasyGroupScan 
 [selectionRoot=/drill/testdata/complex_type/json/complex.json, numFiles=1, 
 columns=[`gbyi`, `id`, `fl`, `nul`], 
 files=[maprfs:/drill/testdata/complex_type/json/complex.json]]])
 {code}





[jira] [Updated] (DRILL-2491) Fix use of injectable QueryDateTimeInfo in localtimestamp()

2015-03-17 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-2491:
---
Attachment: DRILL-2491.patch

 Fix use of injectable QueryDateTimeInfo in localtimestamp()
 ---

 Key: DRILL-2491
 URL: https://issues.apache.org/jira/browse/DRILL-2491
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 0.8.0

 Attachments: DRILL-2491.patch


 After the recent changes to remove RecordBatch from the setup() method of 
 UDFs, we introduced a new injectable QueryDateTimeInfo to store the query's 
 start timestamp and timezone information. However, it seems that in one of the 
 UDFs (localtimestamp) this injectable was not used correctly.





[jira] [Updated] (DRILL-2488) Wrong result on join between two subqueries with aggregation

2015-03-17 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-2488:
--
Attachment: 0001-DRILL-2488-Return-DEFAULT-as-supported-encoding-for-.patch

It turns out to be an issue with the supported encoding for MergeJoin. The Merge 
Join execution operator currently does not process incoming batches with SV2 or 
SV4, so if there is a Limit or Sort below, we need to insert a 
SelectionVectorRemover below the MJ.  
Uploaded a patch with a simple fix. [~vkorukanti] could you please review? I 
haven't run all tests yet; they are still in progress.
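
To illustrate why an operator that ignores a selection vector can drop or 
misread rows, here is a minimal sketch in plain Python. It is a simplified 
model, not Drill code: the batch contents and the apply_sv2 helper are made up, 
and an SV2 is modelled as a list of row indexes into the batch.

```python
def apply_sv2(batch, sv2):
    """Materialize the rows a selection vector points at, in order.

    This models what a SelectionVectorRemover-style step would do.
    """
    return [batch[i] for i in sv2]


batch = ["a", "b", "c", "e", "f", "g"]  # physical rows in the batch (made-up)
sv2 = [1, 2, 3, 4, 5]                   # rows kept by an "offset 1" style limit

# An operator that reads physical rows directly sees the wrong rows.
ignoring_sv = batch[:len(sv2)]

# An operator fed by a remover step sees the rows that were selected.
honouring_sv = apply_sv2(batch, sv2)
```

In this model the operator that ignores the vector sees "a" and never sees 
"g", which is the kind of discrepancy a remover placed below the join avoids.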

 Wrong result on join between two subqueries with aggregation
 

 Key: DRILL-2488
 URL: https://issues.apache.org/jira/browse/DRILL-2488
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Aman Sinha
Priority: Critical
 Fix For: 0.9.0

 Attachments: 
 0001-DRILL-2488-Return-DEFAULT-as-supported-encoding-for-.patch, t1.parquet


 {code}
 0: jdbc:drill:schema=dfs select * from t1;
 ++++
 | a1 | b1 | c1 |
 ++++
 | 1  | a  | 2015-01-01 |
 | 2  | b  | 2015-01-02 |
 | 3  | c  | 2015-01-03 |
 | 4  | null   | 2015-01-04 |
 | 5  | e  | 2015-01-05 |
 | 6  | f  | 2015-01-06 |
 | 7  | g  | 2015-01-07 |
 | null   | h  | 2015-01-08 |
 | 9  | i  | null   |
 | 10 | j  | 2015-01-10 |
 ++++
 10 rows selected (0.15 seconds)
 {code}
 This result is incorrect; one row is missing:
 {code}
 0: jdbc:drill:schema=dfs select * from
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq1(x1, y1)
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  inner join
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq2(x1, y1)
 . . . . . . . . . . . .  on
 . . . . . . . . . . . .  sq1.x1 = sq2.x1 and
 . . . . . . . . . . . .  sq2.y1 = sq2.y1
 . . . . . . . . . . . .  ;
 +++++
 | x1 | y1 |x10 |y10 |
 +++++
 | b  | 1  | b  | 1  |
 | c  | 1  | c  | 1  |
 | e  | 1  | e  | 1  |
 | f  | 1  | f  | 1  |
 +++++
 4 rows selected (0.28 seconds)
 {code}
 Explain plan for the wrong result:
 {code}
 00-01  Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-03  MergeJoin(condition=[=($0, $2)], joinType=[inner])
 00-05Limit(offset=[1], fetch=[5])
 00-07  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
 00-09Sort(sort0=[$0], dir0=[ASC])
 00-11  StreamAgg(group=[{0, 1}])
 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
 00-15  Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
 selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
 `a1`]]])
 00-04Project(b10=[$0], EXPR$10=[$1])
 00-06  SelectionVectorRemover
 00-08Sort(sort0=[$0], dir0=[ASC])
 00-10  Filter(condition=[=($1, $1)])
 00-12Limit(offset=[1], fetch=[5])
 00-14  

[jira] [Commented] (DRILL-2491) Fix use of injectable QueryDateTimeInfo in localtimestamp()

2015-03-17 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366432#comment-14366432
 ] 

Jason Altekruse commented on DRILL-2491:


+1

 Fix use of injectable QueryDateTimeInfo in localtimestamp()
 ---

 Key: DRILL-2491
 URL: https://issues.apache.org/jira/browse/DRILL-2491
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 0.8.0

 Attachments: DRILL-2491.patch


 After the recent changes to remove RecordBatch from the setup() method of 
 UDFs, we introduced a new injectable QueryDateTimeInfo to store the query's 
 start timestamp and timezone information. However, it seems that in one of the 
 UDFs (localtimestamp) this injectable was not used correctly.





[jira] [Commented] (DRILL-2488) Wrong result on join between two subqueries with aggregation

2015-03-17 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366531#comment-14366531
 ] 

Venki Korukanti commented on DRILL-2488:


Looks good, +1. 

 Wrong result on join between two subqueries with aggregation
 

 Key: DRILL-2488
 URL: https://issues.apache.org/jira/browse/DRILL-2488
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Aman Sinha
Priority: Critical
 Fix For: 0.9.0

 Attachments: 
 0001-DRILL-2488-Return-DEFAULT-as-supported-encoding-for-.patch, t1.parquet


 {code}
 0: jdbc:drill:schema=dfs select * from t1;
 ++++
 | a1 | b1 | c1 |
 ++++
 | 1  | a  | 2015-01-01 |
 | 2  | b  | 2015-01-02 |
 | 3  | c  | 2015-01-03 |
 | 4  | null   | 2015-01-04 |
 | 5  | e  | 2015-01-05 |
 | 6  | f  | 2015-01-06 |
 | 7  | g  | 2015-01-07 |
 | null   | h  | 2015-01-08 |
 | 9  | i  | null   |
 | 10 | j  | 2015-01-10 |
 ++++
 10 rows selected (0.15 seconds)
 {code}
 This result is incorrect; one row is missing:
 {code}
 0: jdbc:drill:schema=dfs select * from
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq1(x1, y1)
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  inner join
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq2(x1, y1)
 . . . . . . . . . . . .  on
 . . . . . . . . . . . .  sq1.x1 = sq2.x1 and
 . . . . . . . . . . . .  sq2.y1 = sq2.y1
 . . . . . . . . . . . .  ;
 +++++
 | x1 | y1 |x10 |y10 |
 +++++
 | b  | 1  | b  | 1  |
 | c  | 1  | c  | 1  |
 | e  | 1  | e  | 1  |
 | f  | 1  | f  | 1  |
 +++++
 4 rows selected (0.28 seconds)
 {code}
 Explain plan for the wrong result:
 {code}
 00-01  Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-03  MergeJoin(condition=[=($0, $2)], joinType=[inner])
 00-05Limit(offset=[1], fetch=[5])
 00-07  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
 00-09Sort(sort0=[$0], dir0=[ASC])
 00-11  StreamAgg(group=[{0, 1}])
 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
 00-15  Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
 selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
 `a1`]]])
 00-04Project(b10=[$0], EXPR$10=[$1])
 00-06  SelectionVectorRemover
 00-08Sort(sort0=[$0], dir0=[ASC])
 00-10  Filter(condition=[=($1, $1)])
 00-12Limit(offset=[1], fetch=[5])
 00-14  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
 00-16Sort(sort0=[$0], dir0=[ASC])
 00-17  StreamAgg(group=[{0, 1}])
 00-18Sort(sort0=[$0], sort1=[$1], dir0=[ASC], 
 dir1=[ASC])
 00-19  Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 

[jira] [Updated] (DRILL-2309) Selecting count(), avg() of nullable columns causes wrong results

2015-03-17 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-2309:
---
Summary: Selecting count(), avg() of nullable columns causes wrong results  
(was: 'null' is counted with subquery)

 Selecting count(), avg() of nullable columns causes wrong results
 -

 Key: DRILL-2309
 URL: https://issues.apache.org/jira/browse/DRILL-2309
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 0.8.0
Reporter: Chun Chang
Assignee: Mehant Baid
Priority: Critical
 Fix For: 0.9.0

 Attachments: DRILL-2309.patch


 #Thu Feb 19 18:40:10 EST 2015
 git.commit.id.abbrev=1ceddff
 The following query returns the correct count involving columns that contain 
 null values.
 {code}
 0: jdbc:drill:schema=dfs.drillTestDirComplexJ select tt.gbyi, count(tt.nul) 
 from (select t.id, t.gbyi, t.fl, t.nul from `complex.json` t) tt group by 
 tt.gbyi order by tt.gbyi;
 +++
 |gbyi|   EXPR$1   |
 +++
 | 0  | 33580  |
 | 1  | 33317  |
 | 2  | 33438  |
 | 3  | 33535  |
 | 4  | 33369  |
 | 5  | 32990  |
 | 6  | 33661  |
 | 7  | 33130  |
 | 8  | 33362  |
 | 9  | 33364  |
 | 10 | 33229  |
 | 11 | 33567  |
 | 12 | 33379  |
 | 13 | 33045  |
 | 14 | 33305  |
 +++
 {code}
 But if you add more aggregation to the query, the returned count is wrong 
 (pay attention to the last column). 
 {code}
 0: jdbc:drill:schema=dfs.drillTestDirComplexJ select tt.gbyi, sum(tt.id), 
 avg(tt.fl), count(tt.nul) from (select t.id, t.gbyi, t.fl, t.nul from 
 `complex.json` t) tt group by tt.gbyi order by tt.gbyi;
 +++++
 |gbyi|   EXPR$1   |   EXPR$2   |   EXPR$3   |
 +++++
 | 0  | 33445554017 | 499613.0956877819 | 66943  |
 | 1  | 33209358334 | 500760.0252919893 | 66318  |
 | 2  | 33369118041 | 498091.82200273 | 66994  |
 | 3  | 33254533860 | 498696.5063226428 | 66683  |
 | 4  | 33393965595 | 501125.64656145993 | 66638  |
 | 5  | 33216885506 | 499961.32710397616 | 66439  |
 | 6  | 33380205950 | 498875.3923256599 | 66911  |
 | 7  | 33405849390 | 501093.43067788356 | 6  |
 | 8  | 33136951190 | 498458.1044031481 | 66479  |
 | 9  | 33319291474 | 499967.5392457864 | 66643  |
 | 10 | 937 | 499190.47462408233 | 66787  |
 | 11 | 33571590550 | 502095.86682194035 | 66863  |
 | 12 | 33437342090 | 501708.8141502653 | 66647  |
 | 13 | 33071800925 | 498896.453904129 | 66290  |
 | 14 | 33448664191 | 501487.4206955959 | 66699  |
 +++++
 {code}
 Plan for the query that returned the wrong result:
 {code}
 0: jdbc:drill:schema=dfs.drillTestDirComplexJ explain plan for select 
 tt.gbyi, sum(tt.id), avg(tt.fl), count(tt.nul) from (select t.id, t.gbyi, 
 t.fl, t.nul from `complex.json` t) tt group by tt.gbyi order by tt.gbyi;
 +++
 |text|json|
 +++
 | 00-00Screen
 00-01  Project(gbyi=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
 00-02SingleMergeExchange(sort0=[0 ASC])
 01-01  SelectionVectorRemover
 01-02Sort(sort0=[$0], dir0=[ASC])
 01-03  Project(gbyi=[$0], EXPR$1=[CASE(=($2, 0), null, $1)], 
 EXPR$2=[CAST(/(CastHigh(CASE(=($4, 0), null, $3)), $4)):ANY], EXPR$3=[$5])
 01-04HashAgg(group=[{0}], agg#0=[$SUM0($1)], 
 agg#1=[$SUM0($2)], agg#2=[$SUM0($3)], agg#3=[$SUM0($4)], EXPR$3=[$SUM0($5)])
 01-05  HashToRandomExchange(dist0=[[$0]])
 02-01HashAgg(group=[{0}], agg#0=[$SUM0($1)], 
 agg#1=[COUNT($1)], agg#2=[$SUM0($2)], agg#3=[COUNT($2)], EXPR$3=[COUNT()])
 02-02  Project(gbyi=[$3], id=[$2], fl=[$1], nul=[$0])
 02-03Scan(groupscan=[EasyGroupScan 
 [selectionRoot=/drill/testdata/complex_type/json/complex.json, numFiles=1, 
 columns=[`gbyi`, `id`, `fl`, `nul`], 
 files=[maprfs:/drill/testdata/complex_type/json/complex.json]]])
 {code}





[jira] [Updated] (DRILL-2309) 'null' is counted with subquery

2015-03-17 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-2309:
---
Attachment: DRILL-2309.patch

Minor update to the patch.

 'null' is counted with subquery
 ---

 Key: DRILL-2309
 URL: https://issues.apache.org/jira/browse/DRILL-2309
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 0.8.0
Reporter: Chun Chang
Assignee: Mehant Baid
Priority: Critical
 Fix For: 0.9.0

 Attachments: DRILL-2309.patch


 #Thu Feb 19 18:40:10 EST 2015
 git.commit.id.abbrev=1ceddff
 The following query returns the correct count involving columns that contain 
 null values.
 {code}
 0: jdbc:drill:schema=dfs.drillTestDirComplexJ select tt.gbyi, count(tt.nul) 
 from (select t.id, t.gbyi, t.fl, t.nul from `complex.json` t) tt group by 
 tt.gbyi order by tt.gbyi;
 +++
 |gbyi|   EXPR$1   |
 +++
 | 0  | 33580  |
 | 1  | 33317  |
 | 2  | 33438  |
 | 3  | 33535  |
 | 4  | 33369  |
 | 5  | 32990  |
 | 6  | 33661  |
 | 7  | 33130  |
 | 8  | 33362  |
 | 9  | 33364  |
 | 10 | 33229  |
 | 11 | 33567  |
 | 12 | 33379  |
 | 13 | 33045  |
 | 14 | 33305  |
 +++
 {code}
 But if you add more aggregation to the query, the returned count is wrong 
 (pay attention to the last column). 
 {code}
 0: jdbc:drill:schema=dfs.drillTestDirComplexJ select tt.gbyi, sum(tt.id), 
 avg(tt.fl), count(tt.nul) from (select t.id, t.gbyi, t.fl, t.nul from 
 `complex.json` t) tt group by tt.gbyi order by tt.gbyi;
 +++++
 |gbyi|   EXPR$1   |   EXPR$2   |   EXPR$3   |
 +++++
 | 0  | 33445554017 | 499613.0956877819 | 66943  |
 | 1  | 33209358334 | 500760.0252919893 | 66318  |
 | 2  | 33369118041 | 498091.82200273 | 66994  |
 | 3  | 33254533860 | 498696.5063226428 | 66683  |
 | 4  | 33393965595 | 501125.64656145993 | 66638  |
 | 5  | 33216885506 | 499961.32710397616 | 66439  |
 | 6  | 33380205950 | 498875.3923256599 | 66911  |
 | 7  | 33405849390 | 501093.43067788356 | 6  |
 | 8  | 33136951190 | 498458.1044031481 | 66479  |
 | 9  | 33319291474 | 499967.5392457864 | 66643  |
 | 10 | 937 | 499190.47462408233 | 66787  |
 | 11 | 33571590550 | 502095.86682194035 | 66863  |
 | 12 | 33437342090 | 501708.8141502653 | 66647  |
 | 13 | 33071800925 | 498896.453904129 | 66290  |
 | 14 | 33448664191 | 501487.4206955959 | 66699  |
 +++++
 {code}
 Plan for the query that returned the wrong result:
 {code}
 0: jdbc:drill:schema=dfs.drillTestDirComplexJ explain plan for select 
 tt.gbyi, sum(tt.id), avg(tt.fl), count(tt.nul) from (select t.id, t.gbyi, 
 t.fl, t.nul from `complex.json` t) tt group by tt.gbyi order by tt.gbyi;
 +++
 |text|json|
 +++
 | 00-00Screen
 00-01  Project(gbyi=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
 00-02SingleMergeExchange(sort0=[0 ASC])
 01-01  SelectionVectorRemover
 01-02Sort(sort0=[$0], dir0=[ASC])
 01-03  Project(gbyi=[$0], EXPR$1=[CASE(=($2, 0), null, $1)], 
 EXPR$2=[CAST(/(CastHigh(CASE(=($4, 0), null, $3)), $4)):ANY], EXPR$3=[$5])
 01-04HashAgg(group=[{0}], agg#0=[$SUM0($1)], 
 agg#1=[$SUM0($2)], agg#2=[$SUM0($3)], agg#3=[$SUM0($4)], EXPR$3=[$SUM0($5)])
 01-05  HashToRandomExchange(dist0=[[$0]])
 02-01HashAgg(group=[{0}], agg#0=[$SUM0($1)], 
 agg#1=[COUNT($1)], agg#2=[$SUM0($2)], agg#3=[COUNT($2)], EXPR$3=[COUNT()])
 02-02  Project(gbyi=[$3], id=[$2], fl=[$1], nul=[$0])
 02-03Scan(groupscan=[EasyGroupScan 
 [selectionRoot=/drill/testdata/complex_type/json/complex.json, numFiles=1, 
 columns=[`gbyi`, `id`, `fl`, `nul`], 
 files=[maprfs:/drill/testdata/complex_type/json/complex.json]]])
 {code}





[jira] [Commented] (DRILL-2483) Make buffer that rows are read into during execution configurable for testing purposes

2015-03-17 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366224#comment-14366224
 ] 

Victoria Markman commented on DRILL-2483:
-

Hakim (thank you) pointed me to a discussion on the dev mailing list that 
happened two months ago: 
https://www.mail-archive.com/dev%40drill.apache.org/msg00551.html

 Make buffer that rows are read into during execution configurable for testing 
 purposes
 --

 Key: DRILL-2483
 URL: https://issues.apache.org/jira/browse/DRILL-2483
 Project: Apache Drill
  Issue Type: Wish
Reporter: Victoria Markman

 We recently found a bug where, if a table had multiple duplicate rows and 
 the duplicate rows spanned multiple buffers, merge join returned a wrong 
 result. The test case had a table with 10,000 rows.
 The same problem could be reproduced on a much smaller data set if the buffer 
 size were configurable.
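
 A minimal sketch of the idea in plain Python, not Drill code: the helper 
 names and the sample keys are made up, and a buffer is modelled as a 
 fixed-size slice of rows. With a small configurable size, a handful of rows 
 is enough to make duplicate keys straddle a buffer boundary.

```python
def split_into_batches(rows, batch_size):
    """Cut the row list into consecutive buffers of at most batch_size rows."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def keys_spanning_batches(batches):
    """Return the keys whose duplicates appear in more than one buffer."""
    seen = {}
    for index, batch in enumerate(batches):
        for key in batch:
            seen.setdefault(key, set()).add(index)
    return sorted(key for key, indexes in seen.items() if len(indexes) > 1)


rows = [1, 1, 1, 2, 2, 3]  # made-up join keys with duplicates

small = split_into_batches(rows, 2)      # tiny buffers for testing
large = split_into_batches(rows, 10000)  # everything fits in one buffer
```

 With a size of 2 the keys 1 and 2 both cross a boundary, while a large 
 buffer never exercises that code path on such a small table.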





[jira] [Updated] (DRILL-2488) Wrong result on join between two subqueries with aggregation

2015-03-17 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-2488:

Attachment: t1.parquet

 Wrong result on join between two subqueries with aggregation
 

 Key: DRILL-2488
 URL: https://issues.apache.org/jira/browse/DRILL-2488
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Chris Westin
Priority: Critical
 Fix For: 0.9.0

 Attachments: t1.parquet


 {code}
 0: jdbc:drill:schema=dfs select * from t1;
 ++++
 | a1 | b1 | c1 |
 ++++
 | 1  | a  | 2015-01-01 |
 | 2  | b  | 2015-01-02 |
 | 3  | c  | 2015-01-03 |
 | 4  | null   | 2015-01-04 |
 | 5  | e  | 2015-01-05 |
 | 6  | f  | 2015-01-06 |
 | 7  | g  | 2015-01-07 |
 | null   | h  | 2015-01-08 |
 | 9  | i  | null   |
 | 10 | j  | 2015-01-10 |
 ++++
 10 rows selected (0.15 seconds)
 {code}
 This result is incorrect; one row is missing:
 {code}
 0: jdbc:drill:schema=dfs select * from
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq1(x1, y1)
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  inner join
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq2(x1, y1)
 . . . . . . . . . . . .  on
 . . . . . . . . . . . .  sq1.x1 = sq2.x1 and
 . . . . . . . . . . . .  sq2.y1 = sq2.y1
 . . . . . . . . . . . .  ;
 +++++
 | x1 | y1 |x10 |y10 |
 +++++
 | b  | 1  | b  | 1  |
 | c  | 1  | c  | 1  |
 | e  | 1  | e  | 1  |
 | f  | 1  | f  | 1  |
 +++++
 4 rows selected (0.28 seconds)
 {code}
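 For reference, the expected result can be worked out by hand from the ten 
 rows of t1. The sketch below does this in plain Python; it is not Drill 
 code, and it assumes that nulls sort last in ascending order, which is an 
 assumption made here to match the rows that were returned.

```python
# (a1, b1) pairs taken from the t1 listing; c1 is not used by the query.
t1 = [(1, "a"), (2, "b"), (3, "c"), (4, None), (5, "e"),
      (6, "f"), (7, "g"), (None, "h"), (9, "i"), (10, "j")]


def subquery(rows, limit=5, offset=1):
    """select b1, count(distinct a1) ... group by b1 order by b1 limit/offset."""
    groups = {}
    for a1, b1 in rows:
        distinct = groups.setdefault(b1, set())
        if a1 is not None:  # count(distinct a1) ignores nulls
            distinct.add(a1)
    # Assumption: a null group key sorts after all non-null keys.
    ordered = sorted(groups, key=lambda k: (k is None, k or ""))
    return [(k, len(groups[k])) for k in ordered][offset:offset + limit]


sq1 = subquery(t1)
sq2 = subquery(t1)

# Inner join on sq1.x1 = sq2.x1 and sq2.y1 = sq2.y1 (null never equals null).
joined = [(x1, y1, x2, y2)
          for x1, y1 in sq1
          for x2, y2 in sq2
          if x1 is not None and x1 == x2 and y2 is not None]
```

 Under that assumption the join yields five rows (b, c, e, f, g), so the 
 row for g is the one missing from the output above.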
 Explain plan for the wrong result:
 {code}
 00-01  Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-03  MergeJoin(condition=[=($0, $2)], joinType=[inner])
 00-05Limit(offset=[1], fetch=[5])
 00-07  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
 00-09Sort(sort0=[$0], dir0=[ASC])
 00-11  StreamAgg(group=[{0, 1}])
 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
 00-15  Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
 selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
 `a1`]]])
 00-04Project(b10=[$0], EXPR$10=[$1])
 00-06  SelectionVectorRemover
 00-08Sort(sort0=[$0], dir0=[ASC])
 00-10  Filter(condition=[=($1, $1)])
 00-12Limit(offset=[1], fetch=[5])
 00-14  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
 00-16Sort(sort0=[$0], dir0=[ASC])
 00-17  StreamAgg(group=[{0, 1}])
 00-18Sort(sort0=[$0], sort1=[$1], dir0=[ASC], 
 dir1=[ASC])
 00-19  Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
 selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
 `a1`]]])
 {code}
 If you turn off 

[jira] [Updated] (DRILL-2438) Query on views with Avg on integer column returns wrong result

2015-03-17 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated DRILL-2438:
---
Assignee: Mehant Baid  (was: Venki Korukanti)

 Query on views with Avg on integer column returns wrong result
 --

 Key: DRILL-2438
 URL: https://issues.apache.org/jira/browse/DRILL-2438
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 0.8.0
Reporter: Abhishek Girish
Assignee: Mehant Baid
Priority: Critical
 Fix For: 0.9.0


 Git.Commit.ID: b3bdc27 (Mar 10)
 Average on an integer column returns an (inaccurate) integer value, instead 
 of an (accurate) decimal value. 
 *The following query returns wrong results:*
 {code:sql}
  SELECT i_item_id, avg(i_manufact_id) agg1
 . . . . . . . . . . . . . . . . .  FROM item
 . . . . . . . . . . . . . . . . .  GROUP  BY i_item_id 
 . . . . . . . . . . . . . . . . .  ORDER  BY i_item_id
 . . . . . . . . . . . . . . . . .  LIMIT 5; 
 +++
 | i_item_id  |agg1|
 +++
 | AAAB | 152|
 | AAAC | 187|
 | AAAE | 251|
 | AABA | 199|
 | AABB | 636|
 +++
 5 rows selected (0.324 seconds)
 {code}
 *Postgres results:*
 {code:sql}
 # SELECT i_item_id, avg(i_manufact_id) agg1
 tpcds1_new-# FROM item
 tpcds1_new-# GROUP  BY i_item_id 
 tpcds1_new-# ORDER  BY i_item_id
 tpcds1_new-# LIMIT 5; 
 i_item_id | agg1 
 --+--
  AAAB | 152.
  AAAC | 373.
  AAAE | 251.
  AABA | 198.6667
  AABB | 636.
 (5 rows)
 {code}
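
The difference between an integer and a decimal average can be sketched in a few lines of Java. This is only an illustration of the general effect, not Drill's aggregation code; the class name `AvgSketch` and the input values are made up, and the report does not establish whether Drill truncates or rounds.

```java
import java.util.stream.IntStream;

final class AvgSketch {
    // Integer average: the division drops the fractional part.
    static long truncatedAvg(int[] values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Decimal average: IntStream.average() divides in floating point.
    static double decimalAvg(int[] values) {
        return IntStream.of(values).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        int[] ids = {198, 199, 199}; // illustrative values, not taken from the item table
        System.out.println(truncatedAvg(ids));
        System.out.println(decimalAvg(ids));
    }
}
```

An integer-typed result cannot carry the fractional part that the Postgres output shows, whichever way it is rounded.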





[jira] [Commented] (DRILL-2488) Wrong result on join between two subqueries with aggregation

2015-03-17 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366331#comment-14366331
 ] 

Victoria Markman commented on DRILL-2488:
-

{code}
#Fri Mar 13 17:54:51 EDT 2015
git.commit.id.abbrev=7b4c887
{code}

 Wrong result on join between two subqueries with aggregation
 

 Key: DRILL-2488
 URL: https://issues.apache.org/jira/browse/DRILL-2488
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Chris Westin
Priority: Critical
 Fix For: 0.9.0

 Attachments: t1.parquet


 {code}
 0: jdbc:drill:schema=dfs select * from t1;
 ++++
 | a1 | b1 | c1 |
 ++++
 | 1  | a  | 2015-01-01 |
 | 2  | b  | 2015-01-02 |
 | 3  | c  | 2015-01-03 |
 | 4  | null   | 2015-01-04 |
 | 5  | e  | 2015-01-05 |
 | 6  | f  | 2015-01-06 |
 | 7  | g  | 2015-01-07 |
 | null   | h  | 2015-01-08 |
 | 9  | i  | null   |
 | 10 | j  | 2015-01-10 |
 ++++
 10 rows selected (0.15 seconds)
 {code}
 This result is incorrect: one row is missing.
 {code}
 0: jdbc:drill:schema=dfs select * from
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq1(x1, y1)
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  inner join
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq2(x1, y1)
 . . . . . . . . . . . .  on
 . . . . . . . . . . . .  sq1.x1 = sq2.x1 and
 . . . . . . . . . . . .  sq2.y1 = sq2.y1
 . . . . . . . . . . . .  ;
 +++++
 | x1 | y1 |x10 |y10 |
 +++++
 | b  | 1  | b  | 1  |
 | c  | 1  | c  | 1  |
 | e  | 1  | e  | 1  |
 | f  | 1  | f  | 1  |
 +++++
 4 rows selected (0.28 seconds)
 {code}
 Explain plan for the wrong result:
 {code}
 00-01  Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-03  MergeJoin(condition=[=($0, $2)], joinType=[inner])
 00-05Limit(offset=[1], fetch=[5])
 00-07  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
 00-09Sort(sort0=[$0], dir0=[ASC])
 00-11  StreamAgg(group=[{0, 1}])
 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
 00-15  Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
 selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
 `a1`]]])
 00-04Project(b10=[$0], EXPR$10=[$1])
 00-06  SelectionVectorRemover
 00-08Sort(sort0=[$0], dir0=[ASC])
 00-10  Filter(condition=[=($1, $1)])
 00-12Limit(offset=[1], fetch=[5])
 00-14  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
 00-16Sort(sort0=[$0], dir0=[ASC])
 00-17  StreamAgg(group=[{0, 1}])
 00-18Sort(sort0=[$0], sort1=[$1], dir0=[ASC], 
 dir1=[ASC])
 00-19  Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
 

[jira] [Created] (DRILL-2491) Fix use of injectable QueryDateTimeInfo in localtimestamp()

2015-03-17 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-2491:
--

 Summary: Fix use of injectable QueryDateTimeInfo in 
localtimestamp()
 Key: DRILL-2491
 URL: https://issues.apache.org/jira/browse/DRILL-2491
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 0.8.0


After the recent changes to remove RecordBatch from the setup() method of UDFs, 
we introduced a new injectable, QueryDateTimeInfo, to store the query's start 
timestamp and timezone information. However, it seems that one of the UDFs 
(localtimestamp) does not use this injectable correctly. 





[jira] [Resolved] (DRILL-2143) Remove RecordBatch from setup method of DrillFunc interface

2015-03-17 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-2143.

Resolution: Fixed

Resolved in bff7b9ef5a9f345908aca160a97b98f6ab187708 and 
1c5decc17cf38cbf4a4119d7ca19653cb19e1b53

 Remove RecordBatch from setup method of DrillFunc interface
 ---

 Key: DRILL-2143
 URL: https://issues.apache.org/jira/browse/DRILL-2143
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Reporter: Jason Altekruse
Assignee: Jason Altekruse
 Fix For: 0.8.0

 Attachments: DRILL-2143-part1-feb-27.patch, 
 DRILL-2143-part1-feb-6.patch, DRILL-2143-part1-mar-3.patch, 
 DRILL-2143-part2-15-mar-15.patch, DRILL-2143-part2-feb-27.patch, 
 DRILL-2143-part2-feb-6.patch, DRILL-2143-part2-mar-3.patch, 
 DRILL-2143-remove-record-batch-from-udfs.patch


 Drill UDFs are currently exposed to too much system state because they receive a 
 reference to a RecordBatch in their setup method. This is not necessary, as 
 all of the operator functionality triggered by schema changes is handled outside 
 of UDFs (the UDFs themselves are required to define a specific type 
 they take as input, except in the case of complex types such as maps and lists). 
 The only remaining artifact of this interface is the date/time 
 functions that ask for the query start time or current timezone. This can be 
 provided to functions using a new injectable type, as DrillBufs are provided 
 to functions currently. For more info read here: 
 http://mail-archives.apache.org/mod_mbox/drill-dev/201501.mbox/%3ccampyv7ac_-9u4irz+5fxoenzbojctovjronn0qri4bqzf53...@mail.gmail.com%3E
  





[jira] [Commented] (DRILL-2486) Return format differences between drill & odbc from interval date queries

2015-03-17 Thread Daniel Barclay (Drill) (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366197#comment-14366197
 ] 

Daniel Barclay (Drill) commented on DRILL-2486:
---

 The result from ODBC seems easier to read.

Except that the units aren't explicit in the ODBC output.

Note that the SQLLine output above follows the standard format for durations from 
ISO 8601 and 
[XML Schema Part 2: Datatypes § 3.2.6 
duration|http://www.w3.org/TR/xmlschema-2/#duration].
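
As a rough illustration of how the two notations relate, the sketch below parses ISO 8601 year-month durations with the standard `java.time.Period` class and prints them in the `years-months` style that the ODBC output uses. The class and method names (`IntervalFormatSketch`, `toYearMonth`) are hypothetical, and this is not how either Drill client actually formats intervals.

```java
import java.time.Period;
import java.util.Locale;

final class IntervalFormatSketch {
    // Render an ISO 8601 year-month duration such as "P12Y11M" as "12-11".
    static String toYearMonth(String iso) {
        Period p = Period.parse(iso);
        return String.format(Locale.ROOT, "%d-%02d", p.getYears(), p.getMonths());
    }

    public static void main(String[] args) {
        for (String iso : new String[] {"P12Y11M", "P1Y", "P9M"}) {
            System.out.println(iso + " -> " + toYearMonth(iso));
        }
    }
}
```

The ISO form names its units (Y, M, D), while the `years-months` form relies on position, which is the trade-off discussed in this comment.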

 Return format differences between drill & odbc from interval date queries  
 ---

 Key: DRILL-2486
 URL: https://issues.apache.org/jira/browse/DRILL-2486
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 0.8.0
Reporter: Krystal
Assignee: Daniel Barclay (Drill)
Priority: Minor

 git.commit.id=ae2053d2a078a40033a140f2dfaeef802a5e8254
 The format of results from interval date queries is different between drill 
 and odbc.  Below are some examples.
 From drill:
 SELECT interval '10' day from basic limit 1;
 ++
 |   EXPR$0   |
 ++
 | P10D   |
 ++
 SELECT interval '12-11' year to month from basic limit 1;
 ++
 |   EXPR$0   |
 ++
 | P12Y11M|
 SELECT interval '1' year from basic limit 1;
 ++
 |   EXPR$0   |
 ++
 | P1Y|
 ++
 SELECT interval '9' month from basic limit 1;
 ++
 |   EXPR$0   |
 ++
 | P9M|
 ++
  From ODBC:
 SQL SELECT interval '10' day from basic limit 1
 +---+
 | EXPR$0|
 +---+
 | 10 00:00:00.00|
 +---+
 SQL SELECT interval '12-11' year to month from basic limit 1
 +--+
 | EXPR$0   |
 +--+
 | 12-11|
 +--+
 SQL SELECT interval '1' year from basic limit 1
 +--+
 | EXPR$0   |
 +--+
 | 1-00 |
 +--+
 SQL SELECT interval '9' month from basic limit 1
 +--+
 | EXPR$0   |
 +--+
 | 0-09 |
 +--+
 We should have consistent output from the 2 sources.  The result from ODBC 
 seems easier to read.





[jira] [Created] (DRILL-2487) Schema is ignored when using : between schema and zk on sqlline connection string

2015-03-17 Thread Krystal (JIRA)
Krystal created DRILL-2487:
--

 Summary: Schema is ignored when using : between schema and zk on 
sqlline connection string 
 Key: DRILL-2487
 URL: https://issues.apache.org/jira/browse/DRILL-2487
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - CLI
Affects Versions: 0.8.0
Reporter: Krystal
Assignee: Daniel Barclay (Drill)


git.commit.id=ae2053d2a078a40033a140f2dfaeef802a5e8254

Invoking sqlline using a : between the schema and zk causes sqlline not to 
connect to the specified schema.  For example:

root@qa-node113:~# /opt/drill/bin/sqlline -u 
'jdbc:drill:schema=hive:zk=10.10.100.113:5181'
touch: cannot touch `/var/log/drill/sqlline.log': No such file or directory
Drill log directory /var/log/drill does not exist or is not writable, 
defaulting to /opt/drill/log
sqlline version 1.1.6
0: jdbc:drill:schema=hive:zk=10.10.100.113:51 show tables;
Query failed: RelConversionException: No schema selected. Select a schema using 
'USE schema' command

If I put a ; between schema and zk, then sqlline connects to the specified 
schema:

root@qa-node113:~# /opt/drill/bin/sqlline -u 
'jdbc:drill:schema=hive;zk=10.10.100.113:5181'
touch: cannot touch `/var/log/drill/sqlline.log': No such file or directory
Drill log directory /var/log/drill does not exist or is not writable, 
defaulting to /opt/drill/log
sqlline version 1.1.6
0: jdbc:drill:schema=hive show tables;
+--++
| TABLE_SCHEMA | TABLE_NAME |
+--++
| hive.default | t2 |
| hive.default | episodes_partitioned |
| hive.default | store  |
| hive.default | store_sales |
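
One way to see why the separator matters is to model the URL as a list of `key=value` properties separated by `;`. The sketch below is an assumed model for illustration only; `DrillUrlSketch` and `parse` are hypothetical names and this is not Drill's actual URL parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DrillUrlSketch {
    static final String PREFIX = "jdbc:drill:";

    // Split everything after the prefix on ';', then each pair on its first '='.
    static Map<String, String> parse(String url) {
        if (!url.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a jdbc:drill: URL: " + url);
        }
        Map<String, String> props = new LinkedHashMap<>();
        for (String pair : url.substring(PREFIX.length()).split(";")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq < 0) {
                props.put(pair, "");
            } else {
                props.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(parse("jdbc:drill:schema=hive;zk=10.10.100.113:5181"));
        System.out.println(parse("jdbc:drill:schema=hive:zk=10.10.100.113:5181"));
    }
}
```

Under this model the `:` form yields a single `schema` property whose value is `hive:zk=10.10.100.113:5181` and no `zk` property at all, which would be consistent with the "No schema selected" error above.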
 





[jira] [Updated] (DRILL-2488) Wrong result on join between two subqueries with aggregation

2015-03-17 Thread Chris Westin (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Westin updated DRILL-2488:

Fix Version/s: 0.9.0

 Wrong result on join between two subqueries with aggregation
 

 Key: DRILL-2488
 URL: https://issues.apache.org/jira/browse/DRILL-2488
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Chris Westin
Priority: Critical
 Fix For: 0.9.0


 {code}
 0: jdbc:drill:schema=dfs select * from t1;
 ++++
 | a1 | b1 | c1 |
 ++++
 | 1  | a  | 2015-01-01 |
 | 2  | b  | 2015-01-02 |
 | 3  | c  | 2015-01-03 |
 | 4  | null   | 2015-01-04 |
 | 5  | e  | 2015-01-05 |
 | 6  | f  | 2015-01-06 |
 | 7  | g  | 2015-01-07 |
 | null   | h  | 2015-01-08 |
 | 9  | i  | null   |
 | 10 | j  | 2015-01-10 |
 ++++
 10 rows selected (0.15 seconds)
 {code}
 This result is incorrect: one row is missing.
 {code}
 0: jdbc:drill:schema=dfs select * from
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq1(x1, y1)
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  inner join
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq2(x1, y1)
 . . . . . . . . . . . .  on
 . . . . . . . . . . . .  sq1.x1 = sq2.x1 and
 . . . . . . . . . . . .  sq2.y1 = sq2.y1
 . . . . . . . . . . . .  ;
 +++++
 | x1 | y1 |x10 |y10 |
 +++++
 | b  | 1  | b  | 1  |
 | c  | 1  | c  | 1  |
 | e  | 1  | e  | 1  |
 | f  | 1  | f  | 1  |
 +++++
 4 rows selected (0.28 seconds)
 {code}
 Explain plan for the wrong result:
 {code}
 00-01  Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-03  MergeJoin(condition=[=($0, $2)], joinType=[inner])
 00-05Limit(offset=[1], fetch=[5])
 00-07  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
 00-09Sort(sort0=[$0], dir0=[ASC])
 00-11  StreamAgg(group=[{0, 1}])
 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
 00-15  Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
 selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
 `a1`]]])
 00-04Project(b10=[$0], EXPR$10=[$1])
 00-06  SelectionVectorRemover
 00-08Sort(sort0=[$0], dir0=[ASC])
 00-10  Filter(condition=[=($1, $1)])
 00-12Limit(offset=[1], fetch=[5])
 00-14  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
 00-16Sort(sort0=[$0], dir0=[ASC])
 00-17  StreamAgg(group=[{0, 1}])
 00-18Sort(sort0=[$0], sort1=[$1], dir0=[ASC], 
 dir1=[ASC])
 00-19  Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
 selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
 `a1`]]])
 {code}
 If you turn off merge join, query returns correct result:
 

[jira] [Commented] (DRILL-2416) Zookeeper in sqlline connection string does not override the entry from drill-override.conf

2015-03-17 Thread Krystal (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366283#comment-14366283
 ] 

Krystal commented on DRILL-2416:


The zk= scenario does default to the connection from drill-override.conf

 Zookeeper in sqlline connection string does not override the entry from 
 drill-override.conf 
 

 Key: DRILL-2416
 URL: https://issues.apache.org/jira/browse/DRILL-2416
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - CLI
Affects Versions: 0.8.0
Reporter: Krystal
Assignee: Daniel Barclay (Drill)

 git.commit.id=f658a3c513ddf7f2d1b0ad7aa1f3f65049a594fe
 On the sqlline jdbc connection string, I changed the zookeeper ip to point to 
 another cluster; however, sqlline kept connecting to the drillbits specified 
 in drill-override.conf.  After I updated the drill-override.conf with the other 
 zookeeper information, I was able to connect successfully to the 
 drillbits on a remote cluster.





[jira] [Created] (DRILL-2489) Accessing Connection, Statement, PreparedStatement after they are closed should throw a SQLException

2015-03-17 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-2489:


 Summary: Accessing Connection, Statement, PreparedStatement after 
they are closed should throw a SQLException
 Key: DRILL-2489
 URL: https://issues.apache.org/jira/browse/DRILL-2489
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Reporter: Rahul Challapalli
Assignee: Daniel Barclay (Drill)


git.commit.id.abbrev=7b4c887


According to the JDBC spec, we should throw a SQLException when methods are called on 
a closed Connection, Statement, or PreparedStatement. Drill currently does not 
do this. 

I can raise multiple JIRAs if the developer wishes to work on them 
independently.
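
The requested behavior amounts to a guard at the top of each method. A minimal sketch follows, using a hypothetical stand-in class (`ClosableHandleSketch`) rather than Drill's real Connection or Statement implementations; only `java.sql.SQLException` is a real JDBC type here.

```java
import java.sql.SQLException;

final class ClosableHandleSketch implements AutoCloseable {
    private boolean closed;

    // Guard shared by every method that must not run on a closed handle.
    private void checkOpen() throws SQLException {
        if (closed) {
            throw new SQLException("Handle is already closed");
        }
    }

    // Stand-in for any JDBC method such as executeQuery or createStatement.
    String execute(String sql) throws SQLException {
        checkOpen();
        return "executed: " + sql;
    }

    // isClosed() and close() stay callable after closing.
    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }

    public static void main(String[] args) {
        ClosableHandleSketch handle = new ClosableHandleSketch();
        handle.close();
        try {
            handle.execute("select 1");
        } catch (SQLException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```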





[jira] [Created] (DRILL-2490) convert confluence sql commands pages

2015-03-17 Thread Kristine Hahn (JIRA)
Kristine Hahn created DRILL-2490:


 Summary: convert confluence sql commands pages
 Key: DRILL-2490
 URL: https://issues.apache.org/jira/browse/DRILL-2490
 Project: Apache Drill
  Issue Type: Task
Reporter: Kristine Hahn








[jira] [Assigned] (DRILL-2488) Wrong result on join between two subqueries with aggregation

2015-03-17 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha reassigned DRILL-2488:
-

Assignee: Aman Sinha  (was: Chris Westin)

 Wrong result on join between two subqueries with aggregation
 

 Key: DRILL-2488
 URL: https://issues.apache.org/jira/browse/DRILL-2488
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Aman Sinha
Priority: Critical
 Fix For: 0.9.0

 Attachments: t1.parquet


 {code}
 0: jdbc:drill:schema=dfs select * from t1;
 ++++
 | a1 | b1 | c1 |
 ++++
 | 1  | a  | 2015-01-01 |
 | 2  | b  | 2015-01-02 |
 | 3  | c  | 2015-01-03 |
 | 4  | null   | 2015-01-04 |
 | 5  | e  | 2015-01-05 |
 | 6  | f  | 2015-01-06 |
 | 7  | g  | 2015-01-07 |
 | null   | h  | 2015-01-08 |
 | 9  | i  | null   |
 | 10 | j  | 2015-01-10 |
 ++++
 10 rows selected (0.15 seconds)
 {code}
 This result is incorrect: one row is missing.
 {code}
 0: jdbc:drill:schema=dfs select * from
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq1(x1, y1)
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  inner join
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq2(x1, y1)
 . . . . . . . . . . . .  on
 . . . . . . . . . . . .  sq1.x1 = sq2.x1 and
 . . . . . . . . . . . .  sq2.y1 = sq2.y1
 . . . . . . . . . . . .  ;
 +++++
 | x1 | y1 |x10 |y10 |
 +++++
 | b  | 1  | b  | 1  |
 | c  | 1  | c  | 1  |
 | e  | 1  | e  | 1  |
 | f  | 1  | f  | 1  |
 +++++
 4 rows selected (0.28 seconds)
 {code}
 Explain plan for the wrong result:
 {code}
 00-01  Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-03  MergeJoin(condition=[=($0, $2)], joinType=[inner])
 00-05Limit(offset=[1], fetch=[5])
 00-07  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
 00-09Sort(sort0=[$0], dir0=[ASC])
 00-11  StreamAgg(group=[{0, 1}])
 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
 00-15  Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
 selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
 `a1`]]])
 00-04Project(b10=[$0], EXPR$10=[$1])
 00-06  SelectionVectorRemover
 00-08Sort(sort0=[$0], dir0=[ASC])
 00-10  Filter(condition=[=($1, $1)])
 00-12Limit(offset=[1], fetch=[5])
 00-14  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
 00-16Sort(sort0=[$0], dir0=[ASC])
 00-17  StreamAgg(group=[{0, 1}])
 00-18Sort(sort0=[$0], sort1=[$1], dir0=[ASC], 
 dir1=[ASC])
 00-19  Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
 selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
 `a1`]]])
 {code}
 If 

[jira] [Resolved] (DRILL-2406) Fix expression interpreter to allow executing expressions at planning time

2015-03-17 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-2406.

Resolution: Fixed

Resolved in 0aa8b19d624d173da51de36aa164f3435d3366a4 and 
3f93454f014196a4da198ce012b605b70081fde0

 Fix expression interpreter to allow executing expressions at planning time
 --

 Key: DRILL-2406
 URL: https://issues.apache.org/jira/browse/DRILL-2406
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Jason Altekruse
Assignee: Jason Altekruse
Priority: Critical
 Fix For: 0.8.0

 Attachments: DRILL-2406-part1-15-mar-15.patch, 
 DRILL-2406-part1-planning-time-expression-evaulutation.patch, 
 DRILL-2406-part2-15-mar-15.patch, 
 DRILL-2406-part2-planning-time-expression-evaulutation.diff, 
 DRILL-2406-part2-v2-planning-time-expression-evaulutation.patch, 
 DRILL-2406-part2-v3-planning-time-expression-evaulutation.diff


 The expression interpreter currently available in Drill cannot be used at 
 planning time, as it does not have a means to connect to the direct memory 
 allocator stored at the DrillbitContext level. To implement new rules based 
 on evaluating expressions on constants or small datasets, such as partition 
 information, this limitation must be addressed.





[jira] [Comment Edited] (DRILL-2463) Implement JDBC mapping of SQL NULL for ResultSet.getXxx() methods

2015-03-17 Thread Daniel Barclay (Drill) (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365904#comment-14365904
 ] 

Daniel Barclay (Drill) edited comment on DRILL-2463 at 3/17/15 10:33 PM:
-

Retracted, since byte code error symptom is not from this change.  (It exists 
on the master branch.)  [Was:  

Current attempts to fix the lower layer now seem to result in a problem with 
scalar replacement manipulation of byte code.

To avoid mixing in below-JDBC cleanup/soft-bug changes with JDBC hard-bug* 
changes and delaying the JDBC bug-fix changes until lower-level problems and 
indirect requirements are understood and solved (to avoid the fate of 
DRILL-1735--having otherwise-independent changes delayed by other things), I 
think we should implement all the checks in AvaticaDrillSqlAccessor now, filing 
a Jira report and putting TODO notes in AvaticaDrillSqlAccessor to later 
implement the non-primitive-type checks in the lower layer (and then remove the 
then-redundant non-primitive-type checks from AvaticaDrillSqlAccessor).

(*Calling ResultSet.getBoolean(...) when the value is SQL NULL throws an 
exception (and ResultSet.wasNull() can't be used without first calling a 
getXxx(...) method for the column).)
]


was (Author: dsbos):
Current attempts to fix the lower layer now seem to result in a problem with 
scalar replacement manipulation of byte code.

To avoid mixing in below-JDBC cleanup/soft-bug changes with JDBC hard-bug* 
changes and delaying the JDBC bug-fix changes until lower-level problems and 
indirect requirements are understood and solved (to avoid the fate of 
DRILL-1735--having otherwise-independent changes delayed by other things), I 
think we should implement all the checks in AvaticaDrillSqlAccessor now, filing 
a Jira report and putting TODO notes in AvaticaDrillSqlAccessor to later 
implement the non-primitive-type checks in the lower layer (and then remove the 
then-redundant non-primitive-type checks from AvaticaDrillSqlAccessor).

(*Calling ResultSet.getBoolean(...) when the value is SQL NULL throws an 
exception (and ResultSet.wasNull() can't be used without first calling a 
getXxx(...) method for the column).)

 Implement JDBC mapping of SQL NULL for ResultSet.getXxx() methods
 -

 Key: DRILL-2463
 URL: https://issues.apache.org/jira/browse/DRILL-2463
 Project: Apache Drill
  Issue Type: Bug
Reporter: Daniel Barclay (Drill)
Assignee: Daniel Barclay (Drill)

 Fix AvaticaDrillSqlAccessor to implement mapping of SQL NULL to dummy 
 primitive values (e.g., returning 0 for ResultSet.getInt(...)).
 Fix SqlAccessors template to implement mapping of SQL NULL to null pointers 
 (e.g., returning null from ResultSet.getString(...)).
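
The JDBC convention described here can be sketched as follows. `NullMappingSketch` is a hypothetical accessor written for illustration, not AvaticaDrillSqlAccessor or the SqlAccessors template; the column value is passed in directly so that the example stays self-contained.

```java
final class NullMappingSketch {
    private boolean lastWasNull;

    // Primitive getter: SQL NULL maps to the dummy value 0.
    int getInt(Integer columnValue) {
        lastWasNull = (columnValue == null);
        return lastWasNull ? 0 : columnValue;
    }

    // Object getter: SQL NULL maps to a Java null reference.
    String getString(Object columnValue) {
        lastWasNull = (columnValue == null);
        return lastWasNull ? null : columnValue.toString();
    }

    // Reports whether the most recently read value was SQL NULL.
    boolean wasNull() {
        return lastWasNull;
    }

    public static void main(String[] args) {
        NullMappingSketch accessor = new NullMappingSketch();
        System.out.println(accessor.getInt(null) + " wasNull=" + accessor.wasNull());
        System.out.println(accessor.getString(null) + " wasNull=" + accessor.wasNull());
        System.out.println(accessor.getInt(7) + " wasNull=" + accessor.wasNull());
    }
}
```

A caller distinguishes a real 0 from SQL NULL by calling the getter first and then checking `wasNull()`, which is why the getter must not throw on NULL.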





[jira] [Created] (DRILL-2488) Wrong result on join between two subqueries with aggregation

2015-03-17 Thread Victoria Markman (JIRA)
Victoria Markman created DRILL-2488:
---

 Summary: Wrong result on join between two subqueries with 
aggregation
 Key: DRILL-2488
 URL: https://issues.apache.org/jira/browse/DRILL-2488
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Chris Westin
Priority: Critical


{code}
0: jdbc:drill:schema=dfs select * from t1;
++++
| a1 | b1 | c1 |
++++
| 1  | a  | 2015-01-01 |
| 2  | b  | 2015-01-02 |
| 3  | c  | 2015-01-03 |
| 4  | null   | 2015-01-04 |
| 5  | e  | 2015-01-05 |
| 6  | f  | 2015-01-06 |
| 7  | g  | 2015-01-07 |
| null   | h  | 2015-01-08 |
| 9  | i  | null   |
| 10 | j  | 2015-01-10 |
++++
10 rows selected (0.15 seconds)
{code}

This result is incorrect: one row is missing.
{code}
0: jdbc:drill:schema=dfs select * from
. . . . . . . . . . . .  (
. . . . . . . . . . . .  select
. . . . . . . . . . . .  b1,
. . . . . . . . . . . .  count(distinct a1)
. . . . . . . . . . . .  from
. . . . . . . . . . . .  t1
. . . . . . . . . . . .  group by
. . . . . . . . . . . .  b1
. . . . . . . . . . . .  order by
. . . . . . . . . . . .  b1 limit 5 offset 1
. . . . . . . . . . . .  ) as sq1(x1, y1)
. . . . . . . . . . . . 
. . . . . . . . . . . .  inner join
. . . . . . . . . . . . 
. . . . . . . . . . . .  (
. . . . . . . . . . . .  select
. . . . . . . . . . . .  b1,
. . . . . . . . . . . .  count(distinct a1)
. . . . . . . . . . . .  from
. . . . . . . . . . . .  t1
. . . . . . . . . . . .  group by
. . . . . . . . . . . .  b1
. . . . . . . . . . . .  order by
. . . . . . . . . . . .  b1 limit 5 offset 1
. . . . . . . . . . . .  ) as sq2(x1, y1)
. . . . . . . . . . . .  on
. . . . . . . . . . . .  sq1.x1 = sq2.x1 and
. . . . . . . . . . . .  sq2.y1 = sq2.y1
. . . . . . . . . . . .  ;
+++++
| x1 | y1 |x10 |y10 |
+++++
| b  | 1  | b  | 1  |
| c  | 1  | c  | 1  |
| e  | 1  | e  | 1  |
| f  | 1  | f  | 1  |
+++++
4 rows selected (0.28 seconds)
{code}
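
For reference, the expected row count can be checked by evaluating the query by hand over the ten rows of t1. The sketch below does that in plain Java. It assumes that NULL group keys sort last under ORDER BY b1 ASC (an assumption, though it matches the observed rows starting at b), and `ExpectedJoinSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class ExpectedJoinSketch {
    // (a1, b1) pairs copied from the t1 listing; null stands for SQL NULL.
    static final Object[][] T1 = {
        {1, "a"}, {2, "b"}, {3, "c"}, {4, null}, {5, "e"},
        {6, "f"}, {7, "g"}, {null, "h"}, {9, "i"}, {10, "j"}
    };

    // select b1, count(distinct a1) from t1 group by b1 order by b1 limit 5 offset 1
    static List<String[]> subquery() {
        // Assumption: NULL group keys sort last.
        Map<String, Set<Integer>> groups =
            new TreeMap<>(Comparator.nullsLast(Comparator.<String>naturalOrder()));
        for (Object[] row : T1) {
            String b1 = (String) row[1];
            Set<Integer> distinct = groups.get(b1);
            if (distinct == null) {
                distinct = new HashSet<>();
                groups.put(b1, distinct);
            }
            if (row[0] != null) {
                distinct.add((Integer) row[0]); // COUNT(DISTINCT ...) skips NULLs
            }
        }
        List<String[]> rows = new ArrayList<>();
        int index = 0;
        for (Map.Entry<String, Set<Integer>> e : groups.entrySet()) {
            if (index >= 1 && index < 6) { // offset 1, limit 5
                rows.add(new String[] {e.getKey(), String.valueOf(e.getValue().size())});
            }
            index++;
        }
        return rows;
    }

    // Inner join of the subquery with itself on x1; a NULL key never matches.
    // The second predicate (sq2.y1 = sq2.y1) holds for every row because counts are never NULL.
    static List<String> join() {
        List<String> out = new ArrayList<>();
        for (String[] l : subquery()) {
            for (String[] r : subquery()) {
                if (l[0] != null && l[0].equals(r[0])) {
                    out.add(l[0] + "|" + l[1] + "|" + r[0] + "|" + r[1]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        join().forEach(System.out::println);
    }
}
```

Under that assumption the subquery keeps the groups b, c, e, f and g, so the join should return five rows; the output above stops at f.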

Explain plan for the wrong result:
{code}
00-01  Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
00-03  MergeJoin(condition=[=($0, $2)], joinType=[inner])
00-05Limit(offset=[1], fetch=[5])
00-07  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
00-09Sort(sort0=[$0], dir0=[ASC])
00-11  StreamAgg(group=[{0, 1}])
00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-15  Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
`a1`]]])
00-04Project(b10=[$0], EXPR$10=[$1])
00-06  SelectionVectorRemover
00-08Sort(sort0=[$0], dir0=[ASC])
00-10  Filter(condition=[=($1, $1)])
00-12Limit(offset=[1], fetch=[5])
00-14  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
00-16Sort(sort0=[$0], dir0=[ASC])
00-17  StreamAgg(group=[{0, 1}])
00-18Sort(sort0=[$0], sort1=[$1], dir0=[ASC], 
dir1=[ASC])
00-19  Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
`a1`]]])
{code}

If you turn off merge join, query returns correct result:
{code}
0: jdbc:drill:schema=dfs select * from
. . . . . . . . . . . .  (
. . . . . . . . . . . .  select
. . . . . . . . . . . .  b1,
. . . . . . . . . . . .  count(distinct a1)
. . . . . . . . . . . .  from
. . . . . . . . . . . .  t1
. . . . . . . . . . . .  

[jira] [Commented] (DRILL-2416) Zookeeper in sqlline connection string does not override the entry from drill-override.conf

2015-03-17 Thread Krystal (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366279#comment-14366279
 ] 

Krystal commented on DRILL-2416:


I opened DRILL-2487 for the : vs ; issue.

For this issue, here is my drill-override.conf content:
drill.exec: {
  cluster-id: krystal-drillbits,
  zk.connect: 10.10.100.113:5181,10.10.100.114:5181,10.10.100.115:5181
}

From sqlline, connecting to the hive schema using the same zk info:

root@qa-node113:~# /opt/drill/bin/sqlline -u 
'jdbc:drill:schema=hive;zk=10.10.100.113:5181'
touch: cannot touch `/var/log/drill/sqlline.log': No such file or directory
Drill log directory /var/log/drill does not exist or is not writable, 
defaulting to /opt/drill/log
sqlline version 1.1.6
0: jdbc:drill:schema=hive show tables;
+--++
| TABLE_SCHEMA | TABLE_NAME |
+--++
| hive.default | t2 |
| hive.default | episodes_partitioned |
| hive.default | store  |
| hive.default | store_sales |
| hive.default | promotion  |
| hive.default | voter  |
| hive.default | orc_create_people_staging |
| hive.default | m7_students |

Leaving the drill-override content the same, I updated the zookeeper connection 
to point to a remote drillbit:

root@qa-node113:~# /opt/drill/bin/sqlline -u 
'jdbc:drill:schema=hive;zk=10.10.100.56:5181'
touch: cannot touch `/var/log/drill/sqlline.log': No such file or directory
Drill log directory /var/log/drill does not exist or is not writable, 
defaulting to /opt/drill/log
No DrillbitEndpoint can be found
sqlline version 1.1.6

After I updated the drill-override.conf on the client node to contain the info 
of the remote drillbit, I was able to connect successfully:
drill.exec: {
  cluster-id: qa-node56-drillbits,
  zk.connect: 10.10.100.56:5181
}

root@qa-node113:~# /opt/drill/bin/sqlline -u 
'jdbc:drill:schema=hive;zk=10.10.100.56:5181'
touch: cannot touch `/var/log/drill/sqlline.log': No such file or directory
Drill log directory /var/log/drill does not exist or is not writable, 
defaulting to /opt/drill/log
sqlline version 1.1.6
0: jdbc:drill:schema=hive show tables;
+--++
| TABLE_SCHEMA | TABLE_NAME |
+--++
| hive.default | bit_table  |
| hive.default | stinyint_table |
| hive.default | string_table |
| hive.default | real_table |
| hive.default | interval_table |
| hive.default | binary_table |
| hive.default | emp|
| hive.default | bigint_table |



 Zookeeper in sqlline connection string does not override the entry from 
 drill-override.conf 
 

 Key: DRILL-2416
 URL: https://issues.apache.org/jira/browse/DRILL-2416
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - CLI
Affects Versions: 0.8.0
Reporter: Krystal
Assignee: Daniel Barclay (Drill)

 git.commit.id=f658a3c513ddf7f2d1b0ad7aa1f3f65049a594fe
 On the sqlline jdbc connection string, I changed the zookeeper ip to point to 
 another cluster; however, sqlline kept connecting to the drillbits specified 
 in drill-override.conf.  After I updated the drill-override.conf with the other 
 zookeeper information, I was able to connect successfully to the 
 drillbits on a remote cluster.





[jira] [Updated] (DRILL-2488) Wrong result on join between two subqueries with aggregation

2015-03-17 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-2488:

Description: 
{code}
0: jdbc:drill:schema=dfs select * from t1;
+------------+------------+------------+
| a1         | b1         | c1         |
+------------+------------+------------+
| 1          | a          | 2015-01-01 |
| 2          | b          | 2015-01-02 |
| 3          | c          | 2015-01-03 |
| 4          | null       | 2015-01-04 |
| 5          | e          | 2015-01-05 |
| 6          | f          | 2015-01-06 |
| 7          | g          | 2015-01-07 |
| null       | h          | 2015-01-08 |
| 9          | i          | null       |
| 10         | j          | 2015-01-10 |
+------------+------------+------------+
10 rows selected (0.15 seconds)
{code}

This result is incorrect: one row is missing.
{code}
0: jdbc:drill:schema=dfs select * from
. . . . . . . . . . . .  (
. . . . . . . . . . . .  select
. . . . . . . . . . . .  b1,
. . . . . . . . . . . .  count(distinct a1)
. . . . . . . . . . . .  from
. . . . . . . . . . . .  t1
. . . . . . . . . . . .  group by
. . . . . . . . . . . .  b1
. . . . . . . . . . . .  order by
. . . . . . . . . . . .  b1 limit 5 offset 1
. . . . . . . . . . . .  ) as sq1(x1, y1)
. . . . . . . . . . . . 
. . . . . . . . . . . .  inner join
. . . . . . . . . . . . 
. . . . . . . . . . . .  (
. . . . . . . . . . . .  select
. . . . . . . . . . . .  b1,
. . . . . . . . . . . .  count(distinct a1)
. . . . . . . . . . . .  from
. . . . . . . . . . . .  t1
. . . . . . . . . . . .  group by
. . . . . . . . . . . .  b1
. . . . . . . . . . . .  order by
. . . . . . . . . . . .  b1 limit 5 offset 1
. . . . . . . . . . . .  ) as sq2(x1, y1)
. . . . . . . . . . . .  on
. . . . . . . . . . . .  sq1.x1 = sq2.x1 and
. . . . . . . . . . . .  sq2.y1 = sq2.y1
. . . . . . . . . . . .  ;
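+------------+------------+------------+------------+
| x1         | y1         | x10        | y10        |
+------------+------------+------------+------------+
| b          | 1          | b          | 1          |
| c          | 1          | c          | 1          |
| e          | 1          | e          | 1          |
| f          | 1          | f          | 1          |
+------------+------------+------------+------------+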
4 rows selected (0.28 seconds)
{code}

Explain plan for the wrong result:
{code}
00-01  Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
00-03  MergeJoin(condition=[=($0, $2)], joinType=[inner])
00-05Limit(offset=[1], fetch=[5])
00-07  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
00-09Sort(sort0=[$0], dir0=[ASC])
00-11  StreamAgg(group=[{0, 1}])
00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-15  Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
`a1`]]])
00-04Project(b10=[$0], EXPR$10=[$1])
00-06  SelectionVectorRemover
00-08Sort(sort0=[$0], dir0=[ASC])
00-10  Filter(condition=[=($1, $1)])
00-12Limit(offset=[1], fetch=[5])
00-14  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
00-16Sort(sort0=[$0], dir0=[ASC])
00-17  StreamAgg(group=[{0, 1}])
00-18Sort(sort0=[$0], sort1=[$1], dir0=[ASC], 
dir1=[ASC])
00-19  Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
`a1`]]])
{code}

If you turn off merge join, the query returns the correct result:
{code}
0: jdbc:drill:schema=dfs select * from
. . . . . . . . . . . .  (
. . . . . . . . . . . .  select
. . . . . . . . . . . .  b1,
. . . . . . . . . . . .  count(distinct a1)
. . . . . . . . . . . .  from
. . . . . . . . . . . .  t1
. . . . . . . . . . . .  group by
. . . . . . . . . . . .  b1
. . . . . . . . . . . .  order by
. . . . . . . . . . . .  b1 limit 5 offset 1
. . . . . . . . . . . .   ) as sq1(x1, y1)
. . . . . . . . . . . . 
. . . . . . . . . . . .  inner join
. . . . . . . . . . . .  (
. . . . . . . . . . . .  

[jira] [Commented] (DRILL-2438) Query on views with Avg on integer column returns wrong result

2015-03-17 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366315#comment-14366315
 ] 

Venki Korukanti commented on DRILL-2438:


This doesn't look related to views not storing nullability. For some reason an 
extra cast is inserted to cast the result to integer.

{code}
00-01  Project(i_item_id=[$0], agg1=[$1])
00-02SelectionVectorRemover
00-03  Limit(fetch=[5])
00-04SelectionVectorRemover
00-05  TopN(limit=[5])
00-06Project(i_item_id=[$0], agg1=[CAST(/(CastHigh(CASE(=($2, 
0), null, $1)), $2)):INTEGER])
00-07  HashAgg(group=[{0}], agg#0=[$SUM0($1)], 
agg#1=[COUNT($1)])
00-08Project(i_item_id=[CASE(=(ITEM($0, 1), ''), null, 
CAST(ITEM($0, 1)):VARCHAR(200) CHARACTER SET ISO-8859-1 COLLATE 
ISO-8859-1$en_US$primary)], i_manufact_id=[CASE(=(ITEM($0, 13), ''), null, 
CAST(ITEM($0, 13)):INTEGER)])
00-09  Scan(groupscan=[EasyGroupScan 
[selectionRoot=/Users/hadoop/data/scale1/item.dat, numFiles=1, 
columns=[`columns`[1], `columns`[13]], 
files=[file:/Users/hadoop/data/scale1/item.dat]]])
{code}
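
As an illustration of why an extra integer cast on top of the division matters, here is a minimal Java sketch. The input values are hypothetical, not taken from the item table, and rounding is only an assumption about how the cast behaves; truncation would lose the fraction just the same.

```java
// Hypothetical illustration, not Drill code: effect of an integer cast on an average.
final class AvgCastSketch {
    // Average computed in floating point, the kind of value Postgres reports.
    static double avg(int[] values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    // The same average with a narrowing cast to an integer type applied on top.
    // Rounding is assumed here; the point is only that the fraction is lost.
    static long avgCastToInteger(int[] values) {
        return Math.round(avg(values));
    }

    public static void main(String[] args) {
        int[] ids = {100, 200, 296}; // made-up i_manufact_id values for one i_item_id
        System.out.println(avg(ids));
        System.out.println(avgCastToInteger(ids));
    }
}
```

The first value keeps its fractional part and the second does not, which is the kind of difference visible between the Postgres and Drill columns in the description.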

 Query on views with Avg on integer column returns wrong result
 --

 Key: DRILL-2438
 URL: https://issues.apache.org/jira/browse/DRILL-2438
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 0.8.0
Reporter: Abhishek Girish
Assignee: Mehant Baid
Priority: Critical
 Fix For: 0.9.0


 Git.Commit.ID: b3bdc27 (Mar 10)
 Average on an integer column returns an (inaccurate) integer value, instead 
 of an (accurate) decimal value. 
 *The following query returns wrong results:*
 {code:sql}
  SELECT i_item_id, avg(i_manufact_id) agg1
 . . . . . . . . . . . . . . . . .  FROM item
 . . . . . . . . . . . . . . . . .  GROUP  BY i_item_id 
 . . . . . . . . . . . . . . . . .  ORDER  BY i_item_id
 . . . . . . . . . . . . . . . . .  LIMIT 5; 
 +------------+------------+
 | i_item_id  | agg1       |
 +------------+------------+
 | AAAB       | 152        |
 | AAAC       | 187        |
 | AAAE       | 251        |
 | AABA       | 199        |
 | AABB       | 636        |
 +------------+------------+
 5 rows selected (0.324 seconds)
 {code}
 *Postgres results:*
 {code:sql}
 # SELECT i_item_id, avg(i_manufact_id) agg1
 tpcds1_new-# FROM item
 tpcds1_new-# GROUP  BY i_item_id 
 tpcds1_new-# ORDER  BY i_item_id
 tpcds1_new-# LIMIT 5; 
 i_item_id | agg1 
 --+--
  AAAB | 152.
  AAAC | 373.
  AAAE | 251.
  AABA | 198.6667
  AABB | 636.
 (5 rows)
 {code}





[jira] [Commented] (DRILL-2488) Wrong result on join between two subqueries with aggregation

2015-03-17 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366370#comment-14366370
 ] 

Aman Sinha commented on DRILL-2488:
---

Adding a simplified query that uses neither COUNT(DISTINCT) nor GROUP BY and 
still manifests the problem: 
 {code}
select * from
  ( select b1 from dfs.`/Users/asinha/data/t1.parquet` order by b1 limit 5 
offset 3) as sq1(x1)
 inner join 
  ( select b1 from dfs.`/Users/asinha/data/t1.parquet` order by b1 limit 5 
offset 3) as sq2(x1)
  on sq1.x1 = sq2.x1
 ;
{code}
With HashJoin plan, this produces 5 rows (correct).  
With MergeJoin plan, this produces 2 rows (wrong).   However, I don't think 
this is an issue with MergeJoin; it seems to be related to OFFSET.  Removing 
the offset produces correct results and changing it produces different wrong 
results.  I will investigate some more. 
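
For reference, the expected row count of the simplified query can be checked with a small in-memory model. This is a sketch only: it uses the b1 values listed in the issue description, assumes nulls sort last under ORDER BY, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory model of the simplified query, not Drill code.
final class OffsetJoinSketch {
    // ORDER BY b1 LIMIT fetch OFFSET offset, with nulls sorted last (an assumption).
    static List<String> window(List<String> b1, int offset, int fetch) {
        List<String> sorted = new ArrayList<>(b1);
        sorted.sort(Comparator.nullsLast(Comparator.<String>naturalOrder()));
        int from = Math.min(offset, sorted.size());
        int to = Math.min(from + fetch, sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }

    // Inner equi-join; a null key never matches, as in SQL.
    static List<String[]> innerJoin(List<String> left, List<String> right) {
        List<String[]> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                if (l != null && l.equals(r)) {
                    out.add(new String[] {l, r});
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // b1 column of t1 as listed in the issue description.
        List<String> b1 = Arrays.asList("a", "b", "c", null, "e", "f", "g", "h", "i", "j");
        List<String> sq = window(b1, 3, 5);
        System.out.println(sq + " -> " + innerJoin(sq, sq).size() + " rows");
    }
}
```

Under these assumptions both subqueries yield the same five non-null values, so the self-join should return 5 rows, which agrees with the HashJoin result.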

 Wrong result on join between two subqueries with aggregation
 

 Key: DRILL-2488
 URL: https://issues.apache.org/jira/browse/DRILL-2488
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Chris Westin
Priority: Critical
 Fix For: 0.9.0

 Attachments: t1.parquet


 {code}
 0: jdbc:drill:schema=dfs select * from t1;
 +------------+------------+------------+
 | a1         | b1         | c1         |
 +------------+------------+------------+
 | 1          | a          | 2015-01-01 |
 | 2          | b          | 2015-01-02 |
 | 3          | c          | 2015-01-03 |
 | 4          | null       | 2015-01-04 |
 | 5          | e          | 2015-01-05 |
 | 6          | f          | 2015-01-06 |
 | 7          | g          | 2015-01-07 |
 | null       | h          | 2015-01-08 |
 | 9          | i          | null       |
 | 10         | j          | 2015-01-10 |
 +------------+------------+------------+
 10 rows selected (0.15 seconds)
 {code}
 This result is incorrect: one row is missing.
 {code}
 0: jdbc:drill:schema=dfs select * from
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq1(x1, y1)
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  inner join
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq2(x1, y1)
 . . . . . . . . . . . .  on
 . . . . . . . . . . . .  sq1.x1 = sq2.x1 and
 . . . . . . . . . . . .  sq2.y1 = sq2.y1
 . . . . . . . . . . . .  ;
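 +------------+------------+------------+------------+
 | x1         | y1         | x10        | y10        |
 +------------+------------+------------+------------+
 | b          | 1          | b          | 1          |
 | c          | 1          | c          | 1          |
 | e          | 1          | e          | 1          |
 | f          | 1          | f          | 1          |
 +------------+------------+------------+------------+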
 4 rows selected (0.28 seconds)
 {code}
 Explain plan for the wrong result:
 {code}
 00-01  Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-03  MergeJoin(condition=[=($0, $2)], joinType=[inner])
 00-05Limit(offset=[1], fetch=[5])
 00-07  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
 00-09Sort(sort0=[$0], dir0=[ASC])
 00-11  StreamAgg(group=[{0, 1}])
 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
 00-15  Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
 selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
 `a1`]]])
 00-04Project(b10=[$0], EXPR$10=[$1])
 00-06   

[jira] [Comment Edited] (DRILL-1833) Views cannot be registered into the INFORMATION_SCHEMA.`TABLES` after wiping out ZooKeeper data

2015-03-17 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365644#comment-14365644
 ] 

Venki Korukanti edited comment on DRILL-1833 at 3/17/15 5:50 PM:
-

We currently store Map<ViewName, ViewLocation> in ZK. When listing views (as 
part of SHOW TABLES), we take the view list from the ZK store and, for each entry, 
check whether the view definition exists in the given location. As the view list is 
empty in ZK, we don't list any views in SHOW TABLES. When creating a view, we 
create it by default in the workspace schema location. When querying, we also refer 
directly to the FileSystem for the view definition. The info in ZK is redundant, as we 
always trust the information in the FileSystem. We can stop storing persistent view 
info in ZK.

One thing I must point out: if in the future we support creating a view with a custom 
view location, it won't be visible in SHOW TABLES, as SHOW TABLES only 
searches for .view.drill files in the workspace root directory. This should be OK, 
as a view created with a custom location is considered external to the schema. 
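
A minimal sketch of listing views straight from the file system, as described above. It uses the local java.nio API as a stand-in for Drill's FileSystem, and the class, method names and the `data.parquet` file are hypothetical; only the `.view.drill` suffix and the `varchar_view` name come from this issue.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for the file-system based listing, not Drill code.
final class ViewListingSketch {
    static final String VIEW_SUFFIX = ".view.drill";

    // Returns the view names found directly under the workspace root (non-recursive).
    static List<String> listViews(Path workspaceRoot) {
        try (Stream<Path> entries = Files.list(workspaceRoot)) {
            return entries
                .filter(Files::isRegularFile)
                .map(p -> p.getFileName().toString())
                .filter(name -> name.endsWith(VIEW_SUFFIX))
                .map(name -> name.substring(0, name.length() - VIEW_SUFFIX.length()))
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a throwaway workspace with one view file and one data file, then lists it.
    static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("workspace");
            Files.createFile(root.resolve("varchar_view.view.drill"));
            Files.createFile(root.resolve("data.parquet"));
            return listViews(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the listing is non-recursive, a view file placed outside the workspace root would not show up, which matches the caveat about custom view locations.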


was (Author: vkorukanti):
We currently store Map<ViewName, ViewLocation> in ZK. When listing views (as 
part of SHOW TABLES), we take view list from ZK store and for each entry we 
check if the view definition exists in given location. As the view list is 
empty in ZK, we don't list any views in SHOW TABLES. When creating view, we 
create it by default in workspace schema location. Also when querying we refer 
directly to FileSystem for view definition. Info in ZK is redundant as we are 
always trusting the information in FileSystem. We can remove store view 
persistent info in ZK.

One thing I must point is: In future if we support create view with custom view 
location, then it won't be visible in SHOW TABLES as SHOW TABLES only searches 
for .view.drill files in workspace root directory. This should be ok as the 
view created with custom location is considered external to schema. 

 Views cannot be registered into the INFORMATION_SCHEMA.`TABLES` after wiping 
 out ZooKeeper data
 ---

 Key: DRILL-1833
 URL: https://issues.apache.org/jira/browse/DRILL-1833
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Information Schema
 Environment: git.commit.id.abbrev=2396670
Reporter: Xiao Meng
Assignee: Venki Korukanti
 Fix For: 0.9.0

 Attachments: DRILL-1833-1.patch


 After wiping out the ZooKeeper data, the drillbit cannot automatically 
 register the view into INFORMATION_SCHEMA.`TABLES` even after we query the 
 view.
 For example, for a workspace dfs.tmp, there is a view file 
 `varchar_view.view.drill` under the corresponding directory '/tmp'.
 We can query:
 {code}
 select * from dfs.test.`varchar_view`
 {code}
  
 But this view still won't show up in INFORMATION_SCHEMA.`TABLES`. 
 After I recreate the view based on the contents of `varchar_view.view.drill`, 
 the view shows in the INFORMATION_SCHEMA.`TABLES`.





[jira] [Commented] (DRILL-1833) Views cannot be registered into the INFORMATION_SCHEMA.`TABLES` after wiping out ZooKeeper data

2015-03-17 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365665#comment-14365665
 ] 

Venki Korukanti commented on DRILL-1833:


RB Link: https://reviews.apache.org/r/32165/

 Views cannot be registered into the INFORMATION_SCHEMA.`TABLES` after wiping 
 out ZooKeeper data
 ---

 Key: DRILL-1833
 URL: https://issues.apache.org/jira/browse/DRILL-1833
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Information Schema
 Environment: git.commit.id.abbrev=2396670
Reporter: Xiao Meng
Assignee: Venki Korukanti
 Fix For: 0.9.0

 Attachments: DRILL-1833-1.patch


 After wiping out the ZooKeeper data, the drillbit cannot automatically 
 register the view into INFORMATION_SCHEMA.`TABLES` even after we query the 
 view.
 For example, for a workspace dfs.tmp, there is a view file 
 `varchar_view.view.drill` under the corresponding directory '/tmp'.
 We can query:
 {code}
 select * from dfs.test.`varchar_view`
 {code}
  
 But this view still won't show up in INFORMATION_SCHEMA.`TABLES`. 
 After I recreate the view based on the contents of `varchar_view.view.drill`, 
 the view shows in the INFORMATION_SCHEMA.`TABLES`.





[jira] [Resolved] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu resolved DRILL-2380.
--
   Resolution: Fixed
Fix Version/s: (was: 0.9.0)
   0.8.0

 TPC-DS Query 33 and simplified variants return wrong results
 

 Key: DRILL-2380
 URL: https://issues.apache.org/jira/browse/DRILL-2380
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 0.8.0
Reporter: Abhishek Girish
Assignee: Sean Hsuan-Yi Chu
Priority: Critical
 Fix For: 0.8.0


 TPC-DS query 33 returns wrong results. 
 {code:sql}
 WITH ss 
  AS (SELECT i_manufact_id, 
 Sum(ss_ext_sales_price) total_sales 
  FROM   store_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ss_item_sk = i_item_sk 
 AND ss_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ss_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  cs 
  AS (SELECT i_manufact_id, 
 Sum(cs_ext_sales_price) total_sales 
  FROM   catalog_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND cs_item_sk = i_item_sk 
 AND cs_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND cs_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id), 
  ws 
  AS (SELECT i_manufact_id, 
 Sum(ws_ext_sales_price) total_sales 
  FROM   web_sales, 
 date_dim, 
 customer_address, 
 item 
  WHERE  i_manufact_id IN (SELECT i_manufact_id 
   FROM   item 
   WHERE  i_category IN ( 'Books' )) 
 AND ws_item_sk = i_item_sk 
 AND ws_sold_date_sk = d_date_sk 
 AND d_year = 1999 
 AND d_moy = 3 
 AND ws_bill_addr_sk = ca_address_sk 
 AND ca_gmt_offset = -5 
  GROUP  BY i_manufact_id) 
 SELECT i_manufact_id, 
Sum(total_sales) total_sales 
 FROM   (SELECT i_manufact_id, total_sales 
 FROM   ss 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   cs 
 UNION ALL 
 SELECT i_manufact_id, total_sales
 FROM   ws) tmp1 
 GROUP  BY i_manufact_id 
 ORDER  BY total_sales
 LIMIT 10;
 Drill Results:
 +---------------+-------------+
 | i_manufact_id | total_sales |
 +---------------+-------------+
 | 440           | 0.12        |
 | 434           | 13.16       |
 | 415           | 14.04       |
 | 449           | 15.63       |
 | 563           | 31.46       |
 | 357           | 49.50       |
 | 624           | 67.94       |
 | 192           | 74.40       |
 | 137           | 83.42       |
 | 240           | 85.26       |
 +---------------+-------------+
 10 rows selected (7.57 seconds)
 Postgres Results:
  i_manufact_id | total_sales 
 ---+-
930 |1.18
818 |   41.86
913 |  141.90
784 |  184.90
488 |  275.08
993 |  301.60
700 |  340.52
895 |  802.30
766 |  839.76
858 |  859.18
 (10 rows)
 {code}
 The following simplified variants also return wrong results:
 {code:sql}
 SELECT sum(x)
 FROM
 (SELECT ss_ext_sales_price x, ss_item_sk
 FROM  store_sales
  GROUP BY ss_item_sk, ss_ext_sales_price
 UNION ALL
 SELECT cs_ext_sales_price x, cs_item_sk
 FROM catalog_sales
 GROUP BY cs_item_sk, cs_ext_sales_price) tmp
 GROUP BY x
 LIMIT 10;
 Drill Results:
 +------------+
 | EXPR$0     |
 +------------+
 | 14141.40   |
 | 28060.00   |
 | 30912.70   |
 | 43706.88   |
 | 38267.64   |
 | 10173.00   |
 | 37829.25   |
 | 5349.50    |
 | 107515.80  |
 | 4440.84    |
 +------------+
 10 rows selected (14.435 seconds)
 Postgres Results:
sum
 --   
  45234.00
   5735.31
   2275.60
   6921.32
   2590.46
   6615.09
  14080.77
  24819.76
  25127.20
 (10 rows)
 SELECT sum(x)
 FROM
 (SELECT 

[jira] [Created] (DRILL-2483) Make buffer that rows are read into during execution configurable for testing purposes

2015-03-17 Thread Victoria Markman (JIRA)
Victoria Markman created DRILL-2483:
---

 Summary: Make buffer that rows are read into during execution 
configurable for testing purposes
 Key: DRILL-2483
 URL: https://issues.apache.org/jira/browse/DRILL-2483
 Project: Apache Drill
  Issue Type: Wish
Reporter: Victoria Markman


We recently found a bug where merge join returned a wrong result if a table had 
multiple duplicate rows and those duplicates spanned multiple buffers. The test 
case had a table with 10,000 rows.

The same problem could be reproduced on a much smaller data set if the buffer size 
were configurable.
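
To make the expected behaviour concrete, here is a minimal in-memory merge join that handles runs of duplicate keys. It is a sketch under simplifying assumptions (integer keys, whole inputs in arrays, hypothetical names), not Drill's implementation; a batched implementation has to return the same rows no matter where a run of duplicates is split across buffers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference model for the expected result, not Drill's merge join.
final class MergeJoinSketch {
    // Inner merge join of two ascending-sorted key arrays.
    // Returns the joined key once per matching (left, right) pair.
    static List<Integer> mergeJoin(int[] left, int[] right) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.length && j < right.length) {
            if (left[i] < right[j]) {
                i++;
            } else if (left[i] > right[j]) {
                j++;
            } else {
                int key = left[i];
                // Find the end of the run of equal keys on each side.
                int iEnd = i;
                while (iEnd < left.length && left[iEnd] == key) {
                    iEnd++;
                }
                int jEnd = j;
                while (jEnd < right.length && right[jEnd] == key) {
                    jEnd++;
                }
                // Emit the cross product of the two runs.
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(key);
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] left = {1, 2, 2, 2, 3};
        int[] right = {2, 2, 4};
        System.out.println(mergeJoin(left, right).size() + " joined rows");
    }
}
```

A configurable buffer size would let a test force the three duplicate keys on the left side into separate buffers and compare the output against a reference like this one.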






[jira] [Comment Edited] (DRILL-2482) JDBC : calling getObject when the actual column type is 'NVARCHAR' results in NoClassDefFoundError

2015-03-17 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365965#comment-14365965
 ] 

Rahul Challapalli edited comment on DRILL-2482 at 3/17/15 8:14 PM:
---

It is of type NVARCHAR. I was just being casual in the description when I used varchar.

I did not find the org.apache.hadoop.io.Text class file in the jar. Also there 
is no 'hadoop' folder under 'org/apache' itself.
BTW, I am working with the jar file that you provided and not off master


was (Author: rkins):
It is of type NVARCHAR.

I did not find the org.apache.hadoop.io.Text class file in the jar. Also there 
is no 'hadoop' folder under 'org/apache' itself.
BTW, I am working with the jar file that you provided and not off master

 JDBC : calling getObject when the actual column type is 'NVARCHAR' results in 
 NoClassDefFoundError
 --

 Key: DRILL-2482
 URL: https://issues.apache.org/jira/browse/DRILL-2482
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Reporter: Rahul Challapalli
Assignee: Daniel Barclay (Drill)

 git.commit.id.abbrev=7b4c887
 I tried to call getObject(i) on a column which is of type varchar, drill 
 failed with the below error :
 {code}
 Exception in thread main java.lang.NoClassDefFoundError: 
 org/apache/hadoop/io/Text
   at 
 org.apache.drill.exec.vector.VarCharVector$Accessor.getObject(VarCharVector.java:407)
   at 
 org.apache.drill.exec.vector.NullableVarCharVector$Accessor.getObject(NullableVarCharVector.java:386)
   at 
 org.apache.drill.exec.vector.accessor.NullableVarCharAccessor.getObject(NullableVarCharAccessor.java:98)
   at 
 org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:137)
   at 
 org.apache.drill.jdbc.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:136)
   at 
 net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
   at Dummy.testComplexQuery(Dummy.java:94)
   at Dummy.main(Dummy.java:30)
 Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Text
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
   ... 8 more
 {code}
 When the underlying type is a primitive, the getObject call succeeds





[jira] [Created] (DRILL-2484) Document CASE expression

2015-03-17 Thread Victoria Markman (JIRA)
Victoria Markman created DRILL-2484:
---

 Summary: Document CASE expression
 Key: DRILL-2484
 URL: https://issues.apache.org/jira/browse/DRILL-2484
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Reporter: Victoria Markman
Assignee: Bridget Bevens


The CASE expression is not documented. We support searched CASE:

CASE
WHEN boolean-expression THEN
  statements
  [ WHEN boolean-expression THEN
  statements
... ]
  [ ELSE
  statements ]
END CASE;

See postgres page for example: 
http://www.postgresql.org/docs/9.1/static/plpgsql-control-structures.html





[jira] [Commented] (DRILL-2180) Star is not expanded when being used with flatten

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365974#comment-14365974
 ] 

Sean Hsuan-Yi Chu commented on DRILL-2180:
--

[~mehant], can you review it? 

 Star is not expanded when being used with flatten
 -

 Key: DRILL-2180
 URL: https://issues.apache.org/jira/browse/DRILL-2180
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Sean Hsuan-Yi Chu
Assignee: Mehant Baid
Priority: Critical
 Fix For: 0.9.0

 Attachments: DRILL-2180.1.patch


 For example,
 select *, flatten(j.topping) tt  +
   from dfs_test.`%s` j 
 (using the same data set in DRILL-2012)
 * tt
 null  {id:5001,type:None}
 null  {id:5002,type:Glazed}
 null  {id:5005,type:Sugar}
 null  {id:5007,type:Powdered Sugar}
 null  {id:5006,type:Chocolate with Sprinkles}
 null  {id:5003,type:Chocolate}
 null  {id:5004,type:Maple}
 Note that the first column is messed up. 





[jira] [Created] (DRILL-2485) Configuration parameters need to be named consistently

2015-03-17 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-2485:
-

 Summary: Configuration parameters need to be named consistently
 Key: DRILL-2485
 URL: https://issues.apache.org/jira/browse/DRILL-2485
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 0.8.0
Reporter: Khurram Faraaz
Assignee: Jinfeng Ni
Priority: Minor


All existing configuration parameters need to be named consistently. The two 
configuration parameters below are not named using the same format as the other 
config options.

drill.exec.functions.cast_empty_string_to_null - accepts either true/false.
drill.exec.storage.file.partition.column.label - accepts a string input






[jira] [Resolved] (DRILL-1943) Handle aliases and column names that differ in case only

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu resolved DRILL-1943.
--
Resolution: Fixed

Resolved in Commit#: ae2053d2a078a40033a140f2dfaeef802a5e8254

 Handle aliases and column names that differ in case only
 

 Key: DRILL-1943
 URL: https://issues.apache.org/jira/browse/DRILL-1943
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Parth Chandra
Assignee: Sean Hsuan-Yi Chu
 Fix For: 0.9.0


 1) Consider the query 
   select a, a from foo.
 For this query we return the columns a and a0.
 For the query 
   select a, A from foo
 we return only one column and also leak memory. (see DRILL-1911).
 The same behaviour exists if the query uses aliases. This is not correct. 
 Aliases are explicitly specified names to remove ambiguity in column names 
 and should be unique (ignoring case).
 A query like :
   select A as a1, B as A1 from foo 
 should give a syntax error.
 This should be the behaviour in subqueries, view creation and CTAS queries as 
 well.
 2) If a subquery (or view) has column names that differ only in case, 
 the use of the subquery or view should result in an error if the top-level 
 query references the ambiguous column.
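
 The uniqueness rule described in 1) can be sketched as a case-insensitive duplicate check over the output names of a select list. This is an illustration only; the class and method names are hypothetical and it does not reflect how the fix was actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical validation helper, not the code from the resolving commit.
final class AliasCheckSketch {
    // Returns the output names that collide with an earlier name when case is ignored.
    static List<String> caseInsensitiveDuplicates(List<String> outputNames) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String name : outputNames) {
            // Set.add returns false when the lower-cased name was already present.
            if (!seen.add(name.toLowerCase(Locale.ROOT))) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Output names of: select A as a1, B as A1 from foo
        System.out.println(caseInsensitiveDuplicates(Arrays.asList("a1", "A1")));
    }
}
```

 A non-empty result would correspond to the case where the description asks for an error instead of silently dropping a column.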





[jira] [Commented] (DRILL-2482) JDBC : calling getObject when the actual column type is 'NVARCHAR' results in NoClassDefFoundError

2015-03-17 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365965#comment-14365965
 ] 

Rahul Challapalli commented on DRILL-2482:
--

It is of type NVARCHAR.

I did not find the org.apache.hadoop.io.Text class file in the jar. Also there 
is no 'hadoop' folder under 'org/apache' itself.
BTW, I am working with the jar file that you provided and not off master

 JDBC : calling getObject when the actual column type is 'NVARCHAR' results in 
 NoClassDefFoundError
 --

 Key: DRILL-2482
 URL: https://issues.apache.org/jira/browse/DRILL-2482
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Reporter: Rahul Challapalli
Assignee: Daniel Barclay (Drill)

 git.commit.id.abbrev=7b4c887
 I tried to call getObject(i) on a column which is of type varchar, drill 
 failed with the below error :
 {code}
 Exception in thread main java.lang.NoClassDefFoundError: 
 org/apache/hadoop/io/Text
   at 
 org.apache.drill.exec.vector.VarCharVector$Accessor.getObject(VarCharVector.java:407)
   at 
 org.apache.drill.exec.vector.NullableVarCharVector$Accessor.getObject(NullableVarCharVector.java:386)
   at 
 org.apache.drill.exec.vector.accessor.NullableVarCharAccessor.getObject(NullableVarCharAccessor.java:98)
   at 
 org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:137)
   at 
 org.apache.drill.jdbc.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:136)
   at 
 net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
   at Dummy.testComplexQuery(Dummy.java:94)
   at Dummy.main(Dummy.java:30)
 Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Text
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
   ... 8 more
 {code}
 When the underlying type is a primitive, the getObject call succeeds





[jira] [Commented] (DRILL-2482) JDBC : calling getObject when the actual column type is 'NVARCHAR' results in NoClassDefFoundError

2015-03-17 Thread Daniel Barclay (Drill) (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365919#comment-14365919
 ] 

Daniel Barclay (Drill) commented on DRILL-2482:
---

That's weird--I've seen org.apache.hadoop.io.Text objects in tracing proxy 
output (so the class was found and loaded in that case).

Can you check whether that class--or one with a name ending like that but 
starting with a different, probably added, package--exists in the Drill Jar 
file you're using?


Also, is the type NVARCHAR (per your title) or VARCHAR (per your description)?
(Or:  Where does it seem to be NVARCHAR() and where does it seem to be VARCHAR? 
(I've got a pending patch for an NVARCHAR that should be VARCHAR.))
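
One way to run that check is a small program that lists the jar entries whose names end in `hadoop/io/Text.class`, which also catches a copy relocated under a different leading package. This is a sketch using only the standard java.util.jar API; the class name is hypothetical and the jar path is supplied by the caller.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical diagnostic helper, not part of Drill.
final class JarClassFinder {
    // Keeps the entry names that end with the given suffix.
    static List<String> matching(List<String> entryNames, String suffix) {
        List<String> hits = new ArrayList<>();
        for (String name : entryNames) {
            if (name.endsWith(suffix)) {
                hits.add(name);
            }
        }
        return hits;
    }

    // Reads all entry names from the jar at the given path.
    static List<String> entryNames(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarClassFinder.java <path-to-jar>");
            return;
        }
        for (String hit : matching(entryNames(args[0]), "hadoop/io/Text.class")) {
            System.out.println(hit);
        }
    }
}
```

If nothing is printed, the class is missing from the jar under any package prefix, which would be consistent with the ClassNotFoundException in the stack trace.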



 JDBC : calling getObject when the actual column type is 'NVARCHAR' results in 
 NoClassDefFoundError
 --

 Key: DRILL-2482
 URL: https://issues.apache.org/jira/browse/DRILL-2482
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Reporter: Rahul Challapalli
Assignee: Daniel Barclay (Drill)

 git.commit.id.abbrev=7b4c887
 I tried to call getObject(i) on a column which is of type varchar, drill 
 failed with the below error :
 {code}
 Exception in thread main java.lang.NoClassDefFoundError: 
 org/apache/hadoop/io/Text
   at 
 org.apache.drill.exec.vector.VarCharVector$Accessor.getObject(VarCharVector.java:407)
   at 
 org.apache.drill.exec.vector.NullableVarCharVector$Accessor.getObject(NullableVarCharVector.java:386)
   at 
 org.apache.drill.exec.vector.accessor.NullableVarCharAccessor.getObject(NullableVarCharAccessor.java:98)
   at 
 org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:137)
   at 
 org.apache.drill.jdbc.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:136)
   at 
 net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
   at Dummy.testComplexQuery(Dummy.java:94)
   at Dummy.main(Dummy.java:30)
 Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Text
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
   ... 8 more
 {code}
 When the underlying type is a primitive, the getObject call succeeds





[jira] [Resolved] (DRILL-1911) Querying same field multiple times with different case would hit memory leak and return incorrect result.

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu resolved DRILL-1911.
--
Resolution: Fixed

 Querying same field multiple times with different case would hit memory leak 
 and return incorrect result. 
 --

 Key: DRILL-1911
 URL: https://issues.apache.org/jira/browse/DRILL-1911
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Reporter: Jinfeng Ni
Assignee: Sean Hsuan-Yi Chu
 Fix For: 0.8.0


 git.commit.id.abbrev=309e1be
 If a query selects the same field twice with different case, Drill throws a 
 memory assertion error. 
  select employee_id, Employee_id from cp.`employee.json` limit 2;
 +-------------+
 | employee_id |
 +-------------+
 | 1           |
 | 2           |
 Query failed: Query failed: Failure while running fragment., Attempted to 
 close accountor with 2 buffer(s) still allocatedfor QueryId: 
 2b5cc8eb-2817-aadb-e0fa-49272796592a, MajorFragmentId: 0, MinorFragmentId: 0.
  Total 1 allocation(s) of byte size(s): 4096, at stack location:
   
 org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:212)
   
 org.apache.drill.exec.vector.UInt1Vector.allocateNewSafe(UInt1Vector.java:137)
   
 org.apache.drill.exec.vector.NullableBigIntVector.allocateNewSafe(NullableBigIntVector.java:173)
   
 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doAlloc(ProjectRecordBatch.java:229)
   
 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:167)
   
 org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93)
   
 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132)
   
 org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
   
 org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
   
 org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67)
   
 org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:97)
   
 org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57)
   
 org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:114)
   
 org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254)
   
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   java.lang.Thread.run(Thread.java:744)
 Also, notice that the query result only contains one field; the second field 
 is missing. 
 The plan looks fine.
 Drill Physical : 
 00-00Screen: rowcount = 463.0, cumulative cost = {1900.3 rows, 996.3 cpu, 
 0.0 io, 0.0 network, 0.0 memory}, id = 103
 00-01  Project(employee_id=[$0], Employee_id=[$1]): rowcount = 463.0, 
 cumulative cost = {1854.0 rows, 950.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, 
 id = 102
 00-02SelectionVectorRemover: rowcount = 463.0, cumulative cost = 
 {1391.0 rows, 942.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 101
 00-03  Limit(fetch=[2]): rowcount = 463.0, cumulative cost = {928.0 
 rows, 479.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 100
 00-04Project(employee_id=[$0], Employee_id=[$0]): rowcount = 
 463.0, cumulative cost = {926.0 rows, 471.0 cpu, 0.0 io, 0.0 network, 0.0 
 memory}, id = 99
 00-05  Scan(groupscan=[EasyGroupScan 
 [selectionRoot=/employee.json, numFiles=1, columns=[`employee_id`], 
 files=[/employee.json]]]): rowcount = 463.0, cumulative cost = {463.0 rows, 
 463.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 98





[jira] [Updated] (DRILL-1943) Handle aliases and column names that differ in case only

2015-03-17 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu updated DRILL-1943:
-
Fix Version/s: (was: 0.9.0)
   0.8.0

 Handle aliases and column names that differ in case only
 

 Key: DRILL-1943
 URL: https://issues.apache.org/jira/browse/DRILL-1943
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Parth Chandra
Assignee: Sean Hsuan-Yi Chu
 Fix For: 0.8.0


 1) Consider the query 
   select a, a from foo.
 For this query we return the columns a and a0.
 For the query 
   select a, A from foo
 we return only one column and also leak memory. (see DRILL-1911).
 The same behaviour exists if the query uses aliases. This is not correct. 
 Aliases are explicitly specified names to remove ambiguity in column names 
 and should be unique (ignoring case).
 A query like :
   select A as a1, B as A1 from foo 
 should give a syntax error.
 This should be the behaviour in subqueries, view creation and CTAS queries as 
 well.
 2) If a subquery (or view) has column names that are different only in case, 
 the use of the subquery or view should result in an error if the top level 
 query references the ambiguous column.
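
The rule proposed above, that output names should be unique when case is ignored, can be sketched as follows. This is a minimal illustration and not Drill's planner code. The class ColumnNameCheck and the method findCaseInsensitiveClashes are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Drill: reports output column names
// that collide with an earlier name once case is ignored.
final class ColumnNameCheck {

    // Returns, in order of appearance, every name that matches an earlier
    // name when both are compared case-insensitively.
    static List<String> findCaseInsensitiveClashes(List<String> names) {
        Set<String> seen = new HashSet<>();
        List<String> clashes = new ArrayList<>();
        for (String name : names) {
            String key = name.toLowerCase(Locale.ROOT);
            if (!seen.add(key)) {
                clashes.add(name);
            }
        }
        return clashes;
    }

    public static void main(String[] args) {
        // Aliases from: select A as a1, B as A1 from foo
        System.out.println(findCaseInsensitiveClashes(Arrays.asList("a1", "A1")));
        // Names from: select a, b from foo
        System.out.println(findCaseInsensitiveClashes(Arrays.asList("a", "b")));
    }
}
```

The sketch also reports exact duplicates such as `a, a`. A caller that wants to keep the renaming behaviour described in point 1 (a and a0) would need to treat that case separately.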





[jira] [Commented] (DRILL-2473) Set query timezone at session level

2015-03-17 Thread Andries Engelbrecht (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365261#comment-14365261
 ] 

Andries Engelbrecht commented on DRILL-2473:


We need to decide whether this should apply at the connection level or the 
session level, as an application may establish a single connection but serve 
multiple users in different timezones.

 Set query timezone at session level
 ---

 Key: DRILL-2473
 URL: https://issues.apache.org/jira/browse/DRILL-2473
 Project: Apache Drill
  Issue Type: Improvement
  Components: Query Planning & Optimization
Affects Versions: Future
Reporter: Andries Engelbrecht
Assignee: Jinfeng Ni

 Ability to set the user timezone for queries at session level to allow 
 different users querying the same data from different timezones to localize 
 the results to the desired timezone.
 Allowance for DST where applicable should be incorporated.
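
As an illustration of the requested behaviour, the sketch below renders one stored instant in the zone chosen for each session, using the standard java.time API. Region-based zone IDs carry their own DST rules, so no separate DST handling is needed. The class SessionTimeZoneSketch and the method localize are hypothetical, and nothing here reflects an existing Drill option.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical illustration of per-session localization; not Drill code.
final class SessionTimeZoneSketch {

    // Renders a stored instant in the zone selected for a session.
    // DST is applied by the zone rules behind the region ID.
    static ZonedDateTime localize(Instant stored, String sessionZone) {
        return stored.atZone(ZoneId.of(sessionZone));
    }

    public static void main(String[] args) {
        Instant stored = Instant.parse("2015-03-17T16:40:43Z");
        // Two sessions reading the same value with different zone settings.
        System.out.println(localize(stored, "America/Los_Angeles"));
        System.out.println(localize(stored, "Europe/Amsterdam"));
    }
}
```

Keeping the stored value as an instant and converting only at presentation time means the same data can serve users in different zones, which is the scenario the issue describes.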





[jira] [Resolved] (DRILL-2429) Update Supported Date/Time Data Type Formats doc

2015-03-17 Thread Kristine Hahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kristine Hahn resolved DRILL-2429.
--
Resolution: Fixed

committed  https://reviews.apache.org/r/32138/

 Update Supported Date/Time Data Type Formats doc
 

 Key: DRILL-2429
 URL: https://issues.apache.org/jira/browse/DRILL-2429
 Project: Apache Drill
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.7.0
Reporter: Kristine Hahn
Assignee: Kristine Hahn
 Fix For: 0.8.0


 Test/revise/update Supported Date/Time Data Type Formats. Fold in review 
 comments of other sections.





[jira] [Updated] (DRILL-2275) need implementations of sys tables for drill memory and threads profiles

2015-03-17 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated DRILL-2275:
---
Attachment: DRILL-2275.4.patch.txt

 need implementations of sys tables for drill memory and threads profiles
 

 Key: DRILL-2275
 URL: https://issues.apache.org/jira/browse/DRILL-2275
 Project: Apache Drill
  Issue Type: Task
  Components: Metadata
Reporter: Zhiyong Liu
Assignee: Sudheesh Katkam
Priority: Critical
 Fix For: 0.9.0

 Attachments: DRILL-2275.1.patch.txt, DRILL-2275.2.patch.txt, 
 DRILL-2275.3.patch.txt, DRILL-2275.4.patch.txt


 In order to check drill state information, the following tables are to be 
 implemented:
 1. Memory: a query such as
 select * from sys.drillmemory;
 should return a result set like the following:
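 +-------------+------------------+-----------+---------------------+
 | drillbit    | total_sys_memory | heap_size | direct_alloc_memory |
 +-------------+------------------+-----------+---------------------+
 | node1:port1 | 24596676k        | 15200420k | 1012372k            |
 +-------------+------------------+-----------+---------------------+
 | node2:port2 | 24596676k        | 15200420k | 2012372k            |
 +-------------+------------------+-----------+---------------------+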
 2. Threads:
 For each node in a cluster, we need counts of threads of the drillbits.  A 
 query like this:
 select * from sys.drillbitthreads;
 should return a result set like the following:
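 +-------------+-----------+---------------+--------------+
 | drillbit    | pool_name | total_threads | busy_threads |
 +-------------+-----------+---------------+--------------+
 | node1:port1 | pool1     | 8             | 2            |
 +-------------+-----------+---------------+--------------+
 | node2:port2 | pool2     | 10            | 5            |
 +-------------+-----------+---------------+--------------+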
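
For orientation, the JVM already exposes some of these figures through the standard java.lang.management API. The sketch below collects a single row of JVM-level values in bytes and thread counts. It is an assumption-laden illustration: the class DrillbitStatsSketch is hypothetical, the column names are borrowed from the tables above, and memory tracked by Drill's own allocator or per-pool busy-thread counts are not covered.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: gathers JVM-level figures only. It does not read
// Drill's own allocator accounting or its thread pools.
final class DrillbitStatsSketch {

    // Returns one row of values keyed by the column names used above.
    static Map<String, Long> snapshot() {
        Map<String, Long> row = new LinkedHashMap<>();

        // Heap currently committed to the JVM, in bytes.
        row.put("heap_size",
                ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getCommitted());

        // Memory used by the JVM's "direct" buffer pool, in bytes.
        long direct = 0L;
        for (BufferPoolMXBean pool
                : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                direct = pool.getMemoryUsed();
            }
        }
        row.put("direct_alloc_memory", direct);

        // Live threads in this JVM, across all pools.
        row.put("total_threads",
                (long) ManagementFactory.getThreadMXBean().getThreadCount());
        return row;
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

A real sys table would need one such row per drillbit, gathered on each node rather than only on the node that plans the query.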





[jira] [Resolved] (DRILL-2397) Enhance SQL Ref Data Types docs

2015-03-17 Thread Kristine Hahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kristine Hahn resolved DRILL-2397.
--
Resolution: Fixed

committed: df1b7e5a9397b02a880230e6dc51a07f2b1ff997

 Enhance SQL Ref Data Types docs
 ---

 Key: DRILL-2397
 URL: https://issues.apache.org/jira/browse/DRILL-2397
 Project: Apache Drill
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.7.0
Reporter: Kristine Hahn
Assignee: Kristine Hahn
 Fix For: 1.0.0








[jira] [Created] (DRILL-2481) Querying individual column from view results in AssertionError

2015-03-17 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-2481:
-

 Summary: Querying individual column from view results in 
AssertionError
 Key: DRILL-2481
 URL: https://issues.apache.org/jira/browse/DRILL-2481
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 0.8.0
Reporter: Khurram Faraaz
Assignee: Jinfeng Ni


Querying an individual column from a view results in an AssertionError.
The data came from a CSV file containing a single row (see below):

1,John Doe,HR,5000,Software Engineer 

{code}

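0: jdbc:drill:> use dfs.tmp;
+------+-------------------------------------+
| ok   | summary                             |
+------+-------------------------------------+
| true | Default schema changed to 'dfs.tmp' |
+------+-------------------------------------+
1 row selected (0.188 seconds)

0: jdbc:drill:> create view v1 as select * from `employee.csv` union all select * from `employee.csv`;
+------+----------------------------------------------------+
| ok   | summary                                            |
+------+----------------------------------------------------+
| true | View 'v1' created successfully in 'dfs.tmp' schema |
+------+----------------------------------------------------+
1 row selected (0.073 seconds)
0: jdbc:drill:> create view v2 as select * from `employee.csv` union all select * from `employee.csv`;
+------+----------------------------------------------------+
| ok   | summary                                            |
+------+----------------------------------------------------+
| true | View 'v2' created successfully in 'dfs.tmp' schema |
+------+----------------------------------------------------+
1 row selected (0.046 seconds)
0: jdbc:drill:> select * from v1;
+----------------------------------------+
| columns                                |
+----------------------------------------+
| [1,John Doe,HR,5000,Software Engineer] |
| [1,John Doe,HR,5000,Software Engineer] |
+----------------------------------------+
2 rows selected (0.087 seconds)
0: jdbc:drill:> select * from v2;
+----------------------------------------+
| columns                                |
+----------------------------------------+
| [1,John Doe,HR,5000,Software Engineer] |
| [1,John Doe,HR,5000,Software Engineer] |
+----------------------------------------+
2 rows selected (0.075 seconds)


0: jdbc:drill:> describe v1;
+-------------+-----------+-------------+
| COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
+-------------+-----------+-------------+
| *           | ANY       | NO          |
+-------------+-----------+-------------+
1 row selected (0.084 seconds)
0: jdbc:drill:> describe v2;
+-------------+-----------+-------------+
| COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
+-------------+-----------+-------------+
| *           | ANY       | NO          |
+-------------+-----------+-------------+
1 row selected (0.083 seconds)

0: jdbc:drill:> select columns[0] from v1;
Query failed: AssertionError: ANY

Error: exception while executing query: Failure while executing query. (state=,code=0)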

{code}

{code}
Stack trace from drillbit.log

2015-03-17 16:40:43,176 [2af7a6f3-bbcf-3a34-dfae-5b5bb11ff4a9:foreman] INFO  
o.a.d.e.s.schedule.BlockMapBuilder - Get block maps: Executed 1 out of 1 using 
1 threads. Time: 1ms total, 1.060851ms avg, 1ms max.
2015-03-17 16:40:43,178 [2af7a6f3-bbcf-3a34-dfae-5b5bb11ff4a9:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - State change requested.  PENDING --> 
FAILED
org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception 
during fragment initialization: ANY
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:195) 
[drill-java-exec-0.8.0-SNAPSHOT-rebuffered.jar:0.8.0-SNAPSHOT]
at 
org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303)
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffered.jar:0.8.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_75]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_75]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
Caused by: java.lang.AssertionError: ANY
at 
org.eigenbase.reltype.RelDataTypeImpl.getFieldCount(RelDataTypeImpl.java:114) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.relopt.RelOptUtil$2.size(RelOptUtil.java:143) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rex.RexChecker.visitInputRef(RexChecker.java:111) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rex.RexChecker.visitInputRef(RexChecker.java:55) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rex.RexInputRef.accept(RexInputRef.java:103) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rex.RexChecker.visitCall(RexChecker.java:136) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rex.RexChecker.visitCall(RexChecker.java:55) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rex.RexCall.accept(RexCall.java:106) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rex.RexChecker.visitCall(RexChecker.java:136) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rex.RexChecker.visitCall(RexChecker.java:55) 
~[optiq-core-0.9-drill-r20.jar:na]
at org.eigenbase.rex.RexCall.accept(RexCall.java:106) 
~[optiq-core-0.9-drill-r20.jar:na]
at 

[jira] [Updated] (DRILL-2358) Ensure DrillScanRel differentiates skip-all, scan-all & scan-some in a backward compatible fashion

2015-03-17 Thread Hanifi Gunes (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanifi Gunes updated DRILL-2358:

Assignee: Aman Sinha  (was: Rahul Challapalli)

 Ensure DrillScanRel differentiates skip-all, scan-all & scan-some in a 
 backward compatible fashion
 --

 Key: DRILL-2358
 URL: https://issues.apache.org/jira/browse/DRILL-2358
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Query Planning & Optimization
Reporter: Hanifi Gunes
Assignee: Aman Sinha
 Fix For: 1.0.0


 This subtask proposes changing DrillScanRel so that it understands the list 
 of skipped columns, if any, and relays it to the readers.
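
The three cases named in the title can be pictured with a small classifier. The sketch below is purely illustrative. The enum ScanMode, the class ScanModeSketch, and the conventions for mapping a column list to a mode are all assumptions made for this example, not DrillScanRel's actual design.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; none of these names come from Drill's code.
final class ScanModeSketch {

    enum ScanMode { SKIP_ALL, SCAN_ALL, SCAN_SOME }

    // Conventions assumed for this sketch only:
    //   null list         -> SCAN_ALL (caller never specified columns)
    //   list containing * -> SCAN_ALL
    //   empty list        -> SKIP_ALL
    //   anything else     -> SCAN_SOME
    static ScanMode classify(List<String> columns) {
        if (columns == null || columns.contains("*")) {
            return ScanMode.SCAN_ALL;
        }
        if (columns.isEmpty()) {
            return ScanMode.SKIP_ALL;
        }
        return ScanMode.SCAN_SOME;
    }

    public static void main(String[] args) {
        System.out.println(classify(null));
        System.out.println(classify(Collections.<String>emptyList()));
        System.out.println(classify(Arrays.asList("employee_id")));
    }
}
```

Treating a missing list as scan-all is one way to stay compatible with callers written before the distinction existed. Whether that matches the intended design is not stated in the issue.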


