[jira] [Created] (DRILL-4102) Only one row found in a JSON document that contains multiple items.

2015-11-17 Thread aditya menon (JIRA)
aditya menon created DRILL-4102:
---

 Summary: Only one row found in a JSON document that contains 
multiple items.
 Key: DRILL-4102
 URL: https://issues.apache.org/jira/browse/DRILL-4102
 Project: Apache Drill
  Issue Type: Bug
 Environment: OS X, Drill embedded, v1.1.0 installed via HomeBrew
Reporter: aditya menon


I tried to analyse a JSON file that had the following (sample) structure:

```
{
"Key1": {
  "htmltags": ""
},
"Key2": {
  "htmltags": ""
},
"Key3": {
  "htmltags": ""
}
}
```

(Apologies for the obfuscation, I am unable to publish the original dataset. 
But the structure is exactly the same. Note especially how the keys and other 
data points *differ* in some places, and remain identical in others.)

When I run `SELECT * FROM DataFile.json`, what I get is a single row listed 
under three columns: `""` [i.e., only the entry `Key1.htmltags`].

Ideally, I should see three rows, each with entries from Key1..Key3, listed 
under the correct respective column.
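
If the whole file is one top-level object, Drill's JSON reader sees exactly one record, which would explain the single row. Below is a minimal Python sketch of that shape plus one possible workaround; the sample values are placeholders, since the original data is obfuscated:

```python
import json

# Hypothetical stand-in mirroring the reported structure: one top-level JSON
# object whose values are the actual records. Drill's JSON reader treats each
# top-level object as one record, so a file shaped like this yields one row.
doc = {
    "Key1": {"htmltags": "<a href='x'>"},
    "Key2": {"htmltags": "<b>"},
    "Key3": {"htmltags": "<i>"},
}

# One possible workaround: rewrite the file as line-delimited JSON, one record
# per line, carrying the original key along as an ordinary column.
lines = [json.dumps({"key": k, **v}) for k, v in doc.items()]
ndjson = "\n".join(lines)
```

Each line of `ndjson` then reads as its own row in Drill.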



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4103) Add additional metadata to Parquet files generated by Drill

2015-11-17 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-4103:
-

 Summary: Add additional metadata to Parquet files generated by 
Drill
 Key: DRILL-4103
 URL: https://issues.apache.org/jira/browse/DRILL-4103
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Parquet
Reporter: Jacques Nadeau
Assignee: Julien Le Dem
 Fix For: 1.3.0


For future compatibility efforts, it would be good for us to automatically add 
metadata to Drill generated Parquet files. At a minimum, we should add 
information about the fact that Drill generated the files and the version of 
Drill that generated the files.
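
A minimal sketch of the kind of footer metadata this proposes; Parquet footers carry an application-defined string-to-string map, and the key names below are illustrative assumptions, not Drill's actual choices:

```python
# Illustrative key/value metadata a writer could embed in a Parquet footer.
# The key names ("writer.name", "drill.version") are assumptions for this
# sketch; the issue only asks that the producer and its version be recorded.
def drill_footer_metadata(version: str) -> dict:
    return {
        "writer.name": "apache-drill",  # which engine generated the file
        "drill.version": version,       # which release of that engine
    }

meta = drill_footer_metadata("1.3.0")
```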





[jira] [Created] (DRILL-4104) mvn test standalone does not work due to unpack failure

2015-11-17 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-4104:
-

 Summary: mvn test standalone does not work due to unpack failure
 Key: DRILL-4104
 URL: https://issues.apache.org/jira/browse/DRILL-4104
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Reporter: Jacques Nadeau
Assignee: Jacques Nadeau
 Fix For: 1.4.0


After building with:
{code}
mvn clean install -DskipTests
{code}

and then running:
{code}
mvn test
{code}

I get the following error:

{code}
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Drill Root POM .. SUCCESS [ 4.593 s]
[INFO] ...
[INFO] exec/Java Execution Engine . FAILURE [ 2.531 s]
[INFO] exec/JDBC Driver using dependencies  SKIPPED
[INFO] ...
[INFO] contrib/sqlline  SKIPPED
[INFO] BUILD FAILURE
[INFO] Total time: 36.305 s
[INFO] Finished at: 2015-11-13T21:05:08+00:00
[INFO] Final Memory: 118M/1445M
[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-dependency-plugin:2.8:unpack
(unpack-vector-types) on project drill-java-exec: Artifact has not been
packaged yet. When used on reactor artifact, unpack should be executed
after packaging: see MDEP-98. -> [Help 1]
{code}

I've seen this on both Mac with Maven 3.3.3 and CentOS with Maven 3.3.1.
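
For context, the error's shape matches MDEP-98. A hedged sketch of the kind of plugin execution involved (the coordinates come from the error above; the exact pom content is an assumption):

{code}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <version>2.8</version>
  <executions>
    <execution>
      <id>unpack-vector-types</id>
      <goals><goal>unpack</goal></goals>
      <!-- Per MDEP-98, unpack on a reactor artifact only works after that
           artifact has been packaged. A bare `mvn test` never reaches the
           package phase, so the goal fails; running a lifecycle that does
           (e.g. `mvn install`), or resolving the artifact from the local
           repository after a prior install, avoids it. -->
    </execution>
  </executions>
</plugin>
{code}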





[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009078#comment-15009078
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r45090426
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillDirectScanRel.java
 ---
@@ -0,0 +1,111 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.Iterators;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.AbstractRelNode;
+import org.apache.calcite.rel.RelWriter;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.drill.common.logical.data.LogicalOperator;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.planner.physical.DrillScanPrel;
+import org.apache.drill.exec.planner.physical.PhysicalPlanCreator;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.visitor.PrelVisitor;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.store.direct.DirectGroupScan;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * Logical and physical RelNode representing a {@link DirectGroupScan}. 
This is not backed by a {@link DrillTable},
+ * unlike {@link DrillScanRel}.
+ */
+public class DrillDirectScanRel extends AbstractRelNode implements 
DrillScanPrel, DrillRel {
--- End diff --

There is one challenge with the Values execution in Drill. We use data to 
encode types (and generate the vectors). It seems like the ideal would be 
expressing a values operation that has no data. Maybe we should just support a 
local limit in the values operator? That would allow us to bypass adding the 
limit(0) and sv remover for the simple case. Generally, we should probably 
support leaf-node limit pushdown anyway. I see the new patch takes a different 
approach from the one above. One of the things that seemed to be an issue above 
is that the Limit operator was not properly terminating its parents in the fast 
schema case of a limit 0. @sudheeshkatkam and @jinfengni, do you agree that is 
an issue? If it is, we should get a JIRA opened for it.
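
The "local limit in the values operator" idea can be sketched as follows. This is hypothetical Python, not Drill's operator model, but it shows how limit(0) could surface the schema (the fast-schema case) without separate Limit and SV-remover operators:

```python
# Hypothetical leaf "values" operator with a built-in (local) limit.
class ValuesOperator:
    def __init__(self, schema, rows, limit=None):
        self.schema = schema  # column names are known without reading data
        self.rows = rows
        self.limit = limit    # None means no local limit

    def execute(self):
        # With limit=0 the schema above is still available to callers,
        # but no data rows are emitted.
        if self.limit is None:
            return list(self.rows)
        return list(self.rows)[: self.limit]

op = ValuesOperator(["a", "b"], [(1, 2), (3, 4)], limit=0)
assert op.schema == ["a", "b"]  # schema still visible
assert op.execute() == []       # zero rows, no extra operators needed
```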


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;





[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009085#comment-15009085
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r45090762
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillDirectScanRel.java
 ---
@@ -0,0 +1,111 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.Iterators;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.AbstractRelNode;
+import org.apache.calcite.rel.RelWriter;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.drill.common.logical.data.LogicalOperator;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.planner.physical.DrillScanPrel;
+import org.apache.drill.exec.planner.physical.PhysicalPlanCreator;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.visitor.PrelVisitor;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.store.direct.DirectGroupScan;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * Logical and physical RelNode representing a {@link DirectGroupScan}. 
This is not backed by a {@link DrillTable},
+ * unlike {@link DrillScanRel}.
+ */
+public class DrillDirectScanRel extends AbstractRelNode implements 
DrillScanPrel, DrillRel {
--- End diff --

One other note on the Calcite rule: it seems like we should just modify the 
Calcite mainline rule to avoid applying the zero records values operator 
optimization in the case that a column is not yet bound to a type (is an ANY 
column). That way we can stop maintaining our version of the rule. @jaltekruse, 
thoughts?


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;





[jira] [Commented] (DRILL-3854) IOB Exception : CONVERT_FROM (sal, int_be)

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009066#comment-15009066
 ] 

ASF GitHub Bot commented on DRILL-3854:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/262#discussion_r45089585
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/conv/DateEpochBEConvertFrom.java
 ---
@@ -41,6 +41,7 @@ public void eval() {
 
 in.buffer.readerIndex(in.start);
 long epochMillis = Long.reverseBytes(in.buffer.readLong());
+in.buffer.readerIndex(0);
--- End diff --

Can you explain why this is necessary? It seems like the issue you are 
trying to resolve is related to whatever comes after this function rather than 
an issue with this function. There should not be guarantees about the reader 
index of a DrillBuf (in fact, we should remove this from being exposed in 
functions). See how this function positions the reader before doing anything 
else. Is there some other operation which doesn't position the reader?
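
The contract being described, that a function must position the reader itself rather than rely on where a previous consumer left it, can be illustrated with a small sketch (plain Python, not Drill's DrillBuf):

```python
import struct

# Illustrative buffer with an explicit reader index. A reader that seeks
# before reading never depends on where a previous consumer left the index.
class Buf:
    def __init__(self, data: bytes):
        self.data = data
        self.reader_index = 0

    def read_long_at(self, start: int) -> int:
        # Position the reader first, as DateEpochBEConvertFrom does, so the
        # result is independent of any prior reader_index value.
        self.reader_index = start
        val = struct.unpack_from(">q", self.data, self.reader_index)[0]
        self.reader_index += 8
        return val

buf = Buf(struct.pack(">qq", 7, 9))
buf.reader_index = 99            # simulate a consumer leaving the index dirty
assert buf.read_long_at(8) == 9  # still correct: it seeks before reading
```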


> IOB Exception : CONVERT_FROM (sal, int_be)
> --
>
> Key: DRILL-3854
> URL: https://issues.apache.org/jira/browse/DRILL-3854
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
> Fix For: 1.4.0
>
> Attachments: log, run_time_code.txt
>
>
> CONVERT_FROM function results in IOB Exception
> Drill master commit id : b9afcf8f
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select salary from Emp;
> +-+
> | salary  |
> +-+
> | 8   |
> | 9   |
> | 20  |
> | 95000   |
> | 85000   |
> | 9   |
> | 10  |
> | 87000   |
> | 8   |
> | 10  |
> | 99000   |
> +-+
> 11 rows selected (0.535 seconds)
> # create table using above Emp table
> create table tbl_int_be as select convert_to(salary, 'int_be') sal from Emp;
> 0: jdbc:drill:schema=dfs.tmp> alter session set `planner.slice_target`=1;
> +---++
> |  ok   |summary |
> +---++
> | true  | planner.slice_target updated.  |
> +---++
> 1 row selected (0.19 seconds)
> # Below query results in IOB on server.
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(sal, 'int_be') from 
> tbl_int_be order by sal;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: DrillBuf(ridx: 0, widx: 158, 
> cap: 158/158, unwrapped: SlicedByteBuf(ridx: 0, widx: 158, cap: 158/158, 
> unwrapped: UnsafeDirectLittleEndian(PooledUnsafeDirectByteBuf(ridx: 0, widx: 
> 0, cap: 417/417.slice(158, 44)
> Fragment 2:0
> [Error Id: 4ee1361d-9877-45eb-bde6-57d5add9fe5e on centos-04.qa.lab:31010] 
> (state=,code=0)
> # Apply convert_from function and project original column results in IOB on 
> client. (because Error Id is missing)
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(sal, 'int_be'), sal from 
> tbl_int_be;
> Error: Unexpected RuntimeException: java.lang.IndexOutOfBoundsException: 
> DrillBuf(ridx: 0, widx: 114, cap: 114/114, unwrapped: DrillBuf(ridx: 321, 
> widx: 321, cap: 321/321, unwrapped: 
> UnsafeDirectLittleEndian(PooledUnsafeDirectByteBuf(ridx: 0, widx: 0, cap: 
> 321/321.slice(55, 103) (state=,code=0)
> {code}





[jira] [Created] (DRILL-4105) Constant Folding and coalesce function with 3 parameters gives cannot plan exception

2015-11-17 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-4105:


 Summary: Constant Folding and coalesce function with 3 parameters 
gives cannot plan exception
 Key: DRILL-4105
 URL: https://issues.apache.org/jira/browse/DRILL-4105
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.3.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=447f8ba

The below query results in a cannot plan exception
{code}
select * from cp.`tpch/lineitem.parquet` d1, cp.`tpch/lineitem.parquet` d2 
where d1.l_comment = coalesce(d2.l_comment, 'asdf', d2.l_comment);
Error: UNSUPPORTED_OPERATION ERROR: This query cannot be planned possibly due 
to either a cartesian join or an inequality join


[Error Id: d0b75ac0-6d39-4fb0-bc48-13e52fa4abf2 on qa-node190.qa.lab:31010] 
(state=,code=0)
{code}

However, the query executes fine if I disable constant folding.
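
For reference, coalesce returns the first non-null argument, so coalesce(d2.l_comment, 'asdf', d2.l_comment) only yields 'asdf' when l_comment is NULL; for non-null rows the predicate is still an equality on d2.l_comment. A sketch of the semantics (plain Python, for illustration):

```python
# SQL-style coalesce: return the first argument that is not NULL (None).
def coalesce(*args):
    for a in args:
        if a is not None:
            return a
    return None

assert coalesce(None, "asdf", None) == "asdf"  # NULL column -> constant
assert coalesce("x", "asdf", "x") == "x"       # non-NULL column wins
```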






[jira] [Commented] (DRILL-3854) IOB Exception : CONVERT_FROM (sal, int_be)

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009109#comment-15009109
 ] 

ASF GitHub Bot commented on DRILL-3854:
---

Github user hsuanyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/262#discussion_r45092551
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/conv/DateEpochBEConvertFrom.java
 ---
@@ -41,6 +41,7 @@ public void eval() {
 
 in.buffer.readerIndex(in.start);
 long epochMillis = Long.reverseBytes(in.buffer.readLong());
+in.buffer.readerIndex(0);
--- End diff --

Imagine this case: 
A downstream operator asks Project to produce two columns, 
convert_to(col, ...) and col. 

Due to the current implementation of convert_to(), after the 
convert_to(col, ...) evaluation is done, the readerIndex in col's DrillBuf will 
be pointing at the end. 

Then, when the receiver uses RecordBatchLoader to read this DrillBuf, an IOOB 
will be thrown.





> IOB Exception : CONVERT_FROM (sal, int_be)
> --
>
> Key: DRILL-3854
> URL: https://issues.apache.org/jira/browse/DRILL-3854
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
> Fix For: 1.4.0
>
> Attachments: log, run_time_code.txt
>
>
> CONVERT_FROM function results in IOB Exception
> Drill master commit id : b9afcf8f
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select salary from Emp;
> +-+
> | salary  |
> +-+
> | 8   |
> | 9   |
> | 20  |
> | 95000   |
> | 85000   |
> | 9   |
> | 10  |
> | 87000   |
> | 8   |
> | 10  |
> | 99000   |
> +-+
> 11 rows selected (0.535 seconds)
> # create table using above Emp table
> create table tbl_int_be as select convert_to(salary, 'int_be') sal from Emp;
> 0: jdbc:drill:schema=dfs.tmp> alter session set `planner.slice_target`=1;
> +---++
> |  ok   |summary |
> +---++
> | true  | planner.slice_target updated.  |
> +---++
> 1 row selected (0.19 seconds)
> # Below query results in IOB on server.
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(sal, 'int_be') from 
> tbl_int_be order by sal;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: DrillBuf(ridx: 0, widx: 158, 
> cap: 158/158, unwrapped: SlicedByteBuf(ridx: 0, widx: 158, cap: 158/158, 
> unwrapped: UnsafeDirectLittleEndian(PooledUnsafeDirectByteBuf(ridx: 0, widx: 
> 0, cap: 417/417.slice(158, 44)
> Fragment 2:0
> [Error Id: 4ee1361d-9877-45eb-bde6-57d5add9fe5e on centos-04.qa.lab:31010] 
> (state=,code=0)
> # Apply convert_from function and project original column results in IOB on 
> client. (because Error Id is missing)
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(sal, 'int_be'), sal from 
> tbl_int_be;
> Error: Unexpected RuntimeException: java.lang.IndexOutOfBoundsException: 
> DrillBuf(ridx: 0, widx: 114, cap: 114/114, unwrapped: DrillBuf(ridx: 321, 
> widx: 321, cap: 321/321, unwrapped: 
> UnsafeDirectLittleEndian(PooledUnsafeDirectByteBuf(ridx: 0, widx: 0, cap: 
> 321/321.slice(55, 103) (state=,code=0)
> {code}





[jira] [Commented] (DRILL-4096) Incorrect result when we use coalesce in a join condition along with other filters

2015-11-17 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009128#comment-15009128
 ] 

Victoria Markman commented on DRILL-4096:
-

[~rkins] Is this specific to Hive only?

> Incorrect result when we use coalesce in a join condition along with other 
> filters
> --
>
> Key: DRILL-4096
> URL: https://issues.apache.org/jira/browse/DRILL-4096
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill, Storage - Hive
>Affects Versions: 1.3.0
>Reporter: Rahul Challapalli
>Priority: Critical
>
> git.commit.id.abbrev=447f8ba
> The below query returns no results which is wrong based on the data set. 
> Interestingly if we remove the second filter we get a cannot plan exception 
> from drill. Will raise a different jira, if I cannot find an existing one
> {code}
> select * from hive.null_schemachange d, hive.onlynulls n where d.date_col = 
> coalesce(n.date_col, date '2038-04-10', n.date_col) and d.date_col > 
> '2015-01-01';
> {code}
> Hive DDL :
> {code}
> drop table if exists null_schemachange;
> create external table null_schemachange (
>   int_col int,
>   bigint_col bigint,
>   date_col date,
>   time_col string,
>   timestamp_col timestamp,
>   interval_col string,
>   varchar_col string,
>   float_col float,
>   double_col double,
>   bool_col boolean
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY "|"
> LOCATION '/drill/testdata/hive_storage/null_schemachange.tbl'
> TBLPROPERTIES ("serialization.null.format"="null");
> drop table if exists onlynulls;
> create external table onlynulls (
>   int_col int,
>   bigint_col bigint,
>   date_col date,
>   time_col string,
>   timestamp_col timestamp,
>   interval_col string,
>   varchar_col string,
>   float_col float,
>   double_col double,
>   bool_col boolean
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY "|"
> LOCATION '/drill/testdata/hive_storage/onlynulls.tbl'
> TBLPROPERTIES ("serialization.null.format"="null");
> {code}
> The data files are attached





[jira] [Commented] (DRILL-4102) Only one row found in a JSON document that contains multiple items.

2015-11-17 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009136#comment-15009136
 ] 

Sudheesh Katkam commented on DRILL-4102:


Is your JSON file one huge JSON object?

> Only one row found in a JSON document that contains multiple items.
> ---
>
> Key: DRILL-4102
> URL: https://issues.apache.org/jira/browse/DRILL-4102
> Project: Apache Drill
>  Issue Type: Bug
> Environment: OS X, Drill embedded, v1.1.0 installed via HomeBrew
>Reporter: aditya menon
>
> I tried to analyse a JSON file that had the following (sample) structure:
> ```
> {
> "Key1": {
>   "htmltags": " attr3='charlie' />"
> },
> "Key2": {
>   "htmltags": " attr3='mike' />"
> },
> "Key3": {
>   "htmltags": " />"
> }
> }
> ```
> (Apologies for the obfuscation, I am unable to publish the original dataset. 
> But the structure is exactly the same. Note especially how the keys and other 
> data points *differ* in some places, and remain identical in others.)
> When I run a `SELECT * FROM DataFile.json` what I get is a single row listed 
> under three columns: `" />"` [i.e., only the entry `Key1.htmltags`] .
> Ideally, I should see three rows, each with entries from Key1..Key3, listed 
> under the correct respective column.





[jira] [Commented] (DRILL-3854) IOB Exception : CONVERT_FROM (sal, int_be)

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009144#comment-15009144
 ] 

ASF GitHub Bot commented on DRILL-3854:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/262#issuecomment-157453984
  
When is RecordBatchLoader reading a DrillBuf that doesn't come off the wire? 
If it is coming off the wire, the issue is in what is sent on the wire (rather 
than the decoding side).


> IOB Exception : CONVERT_FROM (sal, int_be)
> --
>
> Key: DRILL-3854
> URL: https://issues.apache.org/jira/browse/DRILL-3854
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
> Fix For: 1.4.0
>
> Attachments: log, run_time_code.txt
>
>
> CONVERT_FROM function results in IOB Exception
> Drill master commit id : b9afcf8f
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select salary from Emp;
> +-+
> | salary  |
> +-+
> | 8   |
> | 9   |
> | 20  |
> | 95000   |
> | 85000   |
> | 9   |
> | 10  |
> | 87000   |
> | 8   |
> | 10  |
> | 99000   |
> +-+
> 11 rows selected (0.535 seconds)
> # create table using above Emp table
> create table tbl_int_be as select convert_to(salary, 'int_be') sal from Emp;
> 0: jdbc:drill:schema=dfs.tmp> alter session set `planner.slice_target`=1;
> +---++
> |  ok   |summary |
> +---++
> | true  | planner.slice_target updated.  |
> +---++
> 1 row selected (0.19 seconds)
> # Below query results in IOB on server.
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(sal, 'int_be') from 
> tbl_int_be order by sal;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: DrillBuf(ridx: 0, widx: 158, 
> cap: 158/158, unwrapped: SlicedByteBuf(ridx: 0, widx: 158, cap: 158/158, 
> unwrapped: UnsafeDirectLittleEndian(PooledUnsafeDirectByteBuf(ridx: 0, widx: 
> 0, cap: 417/417.slice(158, 44)
> Fragment 2:0
> [Error Id: 4ee1361d-9877-45eb-bde6-57d5add9fe5e on centos-04.qa.lab:31010] 
> (state=,code=0)
> # Apply convert_from function and project original column results in IOB on 
> client. (because Error Id is missing)
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(sal, 'int_be'), sal from 
> tbl_int_be;
> Error: Unexpected RuntimeException: java.lang.IndexOutOfBoundsException: 
> DrillBuf(ridx: 0, widx: 114, cap: 114/114, unwrapped: DrillBuf(ridx: 321, 
> widx: 321, cap: 321/321, unwrapped: 
> UnsafeDirectLittleEndian(PooledUnsafeDirectByteBuf(ridx: 0, widx: 0, cap: 
> 321/321.slice(55, 103) (state=,code=0)
> {code}





[jira] [Commented] (DRILL-4089) Make JSON pretty printing configurable

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009207#comment-15009207
 ] 

ASF GitHub Bot commented on DRILL-4089:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/259


> Make JSON pretty printing configurable
> --
>
> Key: DRILL-4089
> URL: https://issues.apache.org/jira/browse/DRILL-4089
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Hanifi Gunes
>Assignee: Hanifi Gunes
>
> Currently the JSON record writer emits records pretty-printed and there is no 
> way to configure this behavior. This issue proposes to make this configurable 
> via a prettyPrint switch in -storage- execution configuration with a default 
> value of true to ensure backward compatibility.
> As a guideline, the following should be used to dictate Drill to emit records 
> in JSON:
> {code:sql}
> alter [session|system] set `store.format`='json';
> {code}
> and this new switch should be used to turn off pretty printing:
> {code:sql}
> alter [session|system] set `store.json.writer.uglify`='true';
> {code}
> By default, Drill will use a system-dependent line feed to separate JSON blobs 
> when pretty printing is turned off.
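
The proposed toggle can be sketched with Python's json module (illustrative only; the option name follows the description above, and the line-feed separation is the default described there):

```python
import json

def write_records(records, uglify=False):
    # uglify=True: compact records separated by line feeds (pretty print off).
    # uglify=False: indented, human-readable output (the current default).
    if uglify:
        return "\n".join(json.dumps(r, separators=(",", ":")) for r in records)
    return "\n".join(json.dumps(r, indent=2) for r in records)

compact = write_records([{"a": 1}, {"a": 2}], uglify=True)
assert compact == '{"a":1}\n{"a":2}'
```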





[jira] [Commented] (DRILL-3854) IOB Exception : CONVERT_FROM (sal, int_be)

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009234#comment-15009234
 ] 

ASF GitHub Bot commented on DRILL-3854:
---

Github user hsuanyi commented on the pull request:

https://github.com/apache/drill/pull/262#issuecomment-157468250
  
I think the DrillBuf with readerIndex pointing at the end does not get sent 
out. 
(I debugged on both the sender and receiver sides. The total size on the 
receiver side is smaller than that on the sender side.)


> IOB Exception : CONVERT_FROM (sal, int_be)
> --
>
> Key: DRILL-3854
> URL: https://issues.apache.org/jira/browse/DRILL-3854
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
> Fix For: 1.4.0
>
> Attachments: log, run_time_code.txt
>
>
> CONVERT_FROM function results in IOB Exception
> Drill master commit id : b9afcf8f
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select salary from Emp;
> +-+
> | salary  |
> +-+
> | 8   |
> | 9   |
> | 20  |
> | 95000   |
> | 85000   |
> | 9   |
> | 10  |
> | 87000   |
> | 8   |
> | 10  |
> | 99000   |
> +-+
> 11 rows selected (0.535 seconds)
> # create table using above Emp table
> create table tbl_int_be as select convert_to(salary, 'int_be') sal from Emp;
> 0: jdbc:drill:schema=dfs.tmp> alter session set `planner.slice_target`=1;
> +---++
> |  ok   |summary |
> +---++
> | true  | planner.slice_target updated.  |
> +---++
> 1 row selected (0.19 seconds)
> # Below query results in IOB on server.
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(sal, 'int_be') from 
> tbl_int_be order by sal;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: DrillBuf(ridx: 0, widx: 158, 
> cap: 158/158, unwrapped: SlicedByteBuf(ridx: 0, widx: 158, cap: 158/158, 
> unwrapped: UnsafeDirectLittleEndian(PooledUnsafeDirectByteBuf(ridx: 0, widx: 
> 0, cap: 417/417.slice(158, 44)
> Fragment 2:0
> [Error Id: 4ee1361d-9877-45eb-bde6-57d5add9fe5e on centos-04.qa.lab:31010] 
> (state=,code=0)
> # Apply convert_from function and project original column results in IOB on 
> client. (because Error Id is missing)
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(sal, 'int_be'), sal from 
> tbl_int_be;
> Error: Unexpected RuntimeException: java.lang.IndexOutOfBoundsException: 
> DrillBuf(ridx: 0, widx: 114, cap: 114/114, unwrapped: DrillBuf(ridx: 321, 
> widx: 321, cap: 321/321, unwrapped: 
> UnsafeDirectLittleEndian(PooledUnsafeDirectByteBuf(ridx: 0, widx: 0, cap: 
> 321/321.slice(55, 103) (state=,code=0)
> {code}





[jira] [Updated] (DRILL-4091) Support more functions in gis contrib module

2015-11-17 Thread Karol Potocki (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karol Potocki updated DRILL-4091:
-
Target Version/s:   (was: 1.3.0)

> Support more functions in gis contrib module
> 
>
> Key: DRILL-4091
> URL: https://issues.apache.org/jira/browse/DRILL-4091
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Karol Potocki
>
> Support for commonly used gis functions in gis contrib module: relate, 
> contains, crosses, intersects, touches, difference, disjoint, buffer, union 
> etc.





[jira] [Comment Edited] (DRILL-3423) Add New HTTPD format plugin

2015-11-17 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009557#comment-15009557
 ] 

Jacques Nadeau edited comment on DRILL-3423 at 11/17/15 11:03 PM:
--

Here is my alternative proposal: 

With the log format above: 
{code}
"%h %t \"%r\" %>s %b \"%{Referer}i\""
{code}

I propose a user gets the following fields (in order)

remote_host (varchar)
request_receive_time (timestamp)
request_method (varchar)
request_uri (varchar)
response_status (int)
response_bytes (bigint)
header_referer (varchar)

Additionally, I think we should provide two new functions: 

parse_url(varchar url)
parse_url_query(varchar querystring, varchar pairDelimiter, varchar 
keyValueDelimiter)

parse_url(varchar) would provide an output of map type similar to: 
{code}
{
  protocol: ...,
  user: ...,
  password: ...,
  host: ...,
  port: 
  path: 
  query:
  fragment:
}
{code}

parse_url_query(...) would return an array of key values:
{code}
[
  {key: "...", value: "..."},
  {key: "...", value: "..."},
  {key: "...", value: "..."},
  {key: "...", value: "..."}
]
{code}
In response to your proposal: I don't think it makes sense to return many 
fields for a date field. Drill already provides functionality to get parts of a 
date. I also don't think it makes sense to prefix a field with its datatype; we 
don't do that anywhere else in Drill. We should also expose parsing as an optional 
behavior in Drill. Note also that my proposal substantially reduces the number 
of fields exposed to the user. I think this proposal has much better usability 
in the context of SQL.

If you want to take advantage of the underlying format's capabilities, you can 
treat that as a pushdown of a particular function (date part or the url parsing 
functions above).
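To make the proposed output shapes concrete, here is an editor's sketch in Python of what parse_url and parse_url_query might return. The function names, the map keys, and the key/value array shape follow the proposal above; everything else (delimiter defaults, use of urllib) is an assumption for illustration only, not Drill's implementation.

```python
from urllib.parse import urlsplit

def parse_url(url):
    # Sketch of the proposed parse_url(varchar): a map of URL parts.
    # Key names follow the proposal; urllib is just for illustration.
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }

def parse_url_query(querystring, pair_delimiter="&", key_value_delimiter="="):
    # Sketch of the proposed parse_url_query: an array of {key, value} maps.
    result = []
    for pair in querystring.split(pair_delimiter):
        if not pair:
            continue
        key, _, value = pair.partition(key_value_delimiter)
        result.append({"key": key, "value": value})
    return result

print(parse_url("http://alice:secret@example.com:8080/a/b?x=1&y=2#frag"))
print(parse_url_query("x=1&y=2"))
```

A map output for parse_url and a repeated map for parse_url_query compose naturally with Drill's existing map access and FLATTEN, which is part of the usability argument above.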






was (Author: jnadeau):
Here is my alternative proposal: 

With the log format above: 
{code}
"%h %t \"%r\" %>s %b \"%{Referer}i\""
{code}

I propose a user gets the following fields (in order)

remote_host (varchar)
request_receive_time (drill timestamp)
request_method (varchar)
request_uri (varchar)
response_status (int)
response_bytes (bigint)
header_referer

Additionally, I think we should provide two new functions: 

parse_url(varchar url)
parse_url_query(varchar querystring, varchar pairDelimiter, varchar 
keyValueDelimiter)

parse_url(varchar) would provide an output of map type similar to: 
{code}
{
  protocol: ...,
  user: ...,
  password: ...,
  host: ...,
  port: 
  path: 
  query:
  fragment:
}
{code}

parse_url_query(...) would return an array of key values:
{code}
[
  {key: "...", value: "..."},
  {key: "...", value: "..."},
  {key: "...", value: "..."},
  {key: "...", value: "..."}
]
{code}
In response to your proposal: I don't think it makes sense to return many 
fields for a date field. Drill already provides functionality to get parts of a 
date. I also don't think it makes sense to prefix a field with its datatype; we 
don't do that anywhere else in Drill. We should also expose parsing as an optional 
behavior in Drill. Note also that my proposal substantially reduces the number 
of fields exposed to the user. I think this proposal has much better usability 
in the context of SQL.

If you want to take advantage of the underlying format's capabilities, you can 
treat that as a pushdown of a particular function (date part or the url parsing 
functions above).





> Add New HTTPD format plugin
> ---
>
> Key: DRILL-3423
> URL: https://issues.apache.org/jira/browse/DRILL-3423
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Reporter: Jacques Nadeau
>Assignee: Jim Scott
> Fix For: 1.4.0
>
>
> Add an HTTPD logparser based format plugin.  The author has been kind enough 
> to move the logparser project to be released under the Apache License.  Can 
> find it here:
> 
> nl.basjes.parse.httpdlog
> httpdlog-parser
> 2.0
> 
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4102) Only one row found in a JSON document that contains multiple items.

2015-11-17 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009893#comment-15009893
 ] 

Sudheesh Katkam commented on DRILL-4102:


In any case, take a look at the [KVGEN|https://drill.apache.org/docs/kvgen/] 
function.
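As an editor's illustration of why KVGEN helps here, the Python sketch below mimics what KVGEN followed by FLATTEN does to a document shaped like the reporter's. The sample values are hypothetical stand-ins for the obfuscated data; only the map-to-rows transformation is the point.

```python
def kvgen(mapping):
    # Mimics Drill's KVGEN: turn a map into a repeated {key, value} map.
    return [{"key": k, "value": v} for k, v in mapping.items()]

def flatten(repeated):
    # Mimics Drill's FLATTEN: emit one row per element of a repeated field.
    return [element for element in repeated]

doc = {  # hypothetical stand-in for the reporter's obfuscated document
    "Key1": {"htmltags": "t1"},
    "Key2": {"htmltags": "t2"},
    "Key3": {"htmltags": "t3"},
}

rows = flatten(kvgen(doc))
# three rows, one per top-level key, instead of one wide row
```

In Drill itself the analogous query combines the two functions, e.g. `SELECT FLATTEN(KVGEN(...))`, so that each top-level key becomes its own row rather than its own column.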

> Only one row found in a JSON document that contains multiple items.
> ---
>
> Key: DRILL-4102
> URL: https://issues.apache.org/jira/browse/DRILL-4102
> Project: Apache Drill
>  Issue Type: Bug
> Environment: OS X, Drill embedded, v1.1.0 installed via HomeBrew
>Reporter: aditya menon
>
> I tried to analyse a JSON file that had the following (sample) structure:
> ```
> {
> "Key1": {
>   "htmltags": " attr3='charlie' />"
> },
> "Key2": {
>   "htmltags": " attr3='mike' />"
> },
> "Key3": {
>   "htmltags": " />"
> }
> }
> ```
> (Apologies for the obfuscation, I am unable to publish the original dataset. 
> But the structure is exactly the same. Note especially how the keys and other 
> data points *differ* in some places, and remain identical in others.)
> When I run a `SELECT * FROM DataFile.json` what I get is a single row listed 
> under three columns: `" />"` [i.e., only the entry `Key1.htmltags`] .
> Ideally, I should see three rows, each with entries from Key1..Key3, listed 
> under the correct respective column.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3854) IOB Exception : CONVERT_FROM (sal, int_be)

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009318#comment-15009318
 ] 

ASF GitHub Bot commented on DRILL-3854:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/262#issuecomment-157477456
  
It sounds like we need to apply the fix where that is failing. This is the 
type of code where this currently happens: 
https://github.com/apache/drill/blob/master/exec/vector/src/main/java/org/apache/drill/exec/vector/BaseDataValueVector.java#L63




> IOB Exception : CONVERT_FROM (sal, int_be)
> --
>
> Key: DRILL-3854
> URL: https://issues.apache.org/jira/browse/DRILL-3854
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
> Fix For: 1.4.0
>
> Attachments: log, run_time_code.txt
>
>
> CONVERT_FROM function results in IOB Exception
> Drill master commit id : b9afcf8f
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select salary from Emp;
> +-+
> | salary  |
> +-+
> | 8   |
> | 9   |
> | 20  |
> | 95000   |
> | 85000   |
> | 9   |
> | 10  |
> | 87000   |
> | 8   |
> | 10  |
> | 99000   |
> +-+
> 11 rows selected (0.535 seconds)
> # create table using above Emp table
> create table tbl_int_be as select convert_to(salary, 'int_be') sal from Emp;
> 0: jdbc:drill:schema=dfs.tmp> alter session set `planner.slice_target`=1;
> +---++
> |  ok   |summary |
> +---++
> | true  | planner.slice_target updated.  |
> +---++
> 1 row selected (0.19 seconds)
> # Below query results in IOB on server.
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(sal, 'int_be') from 
> tbl_int_be order by sal;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: DrillBuf(ridx: 0, widx: 158, 
> cap: 158/158, unwrapped: SlicedByteBuf(ridx: 0, widx: 158, cap: 158/158, 
> unwrapped: UnsafeDirectLittleEndian(PooledUnsafeDirectByteBuf(ridx: 0, widx: 
> 0, cap: 417/417.slice(158, 44)
> Fragment 2:0
> [Error Id: 4ee1361d-9877-45eb-bde6-57d5add9fe5e on centos-04.qa.lab:31010] 
> (state=,code=0)
> # Apply convert_from function and project original column results in IOB on 
> client. (because Error Id is missing)
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(sal, 'int_be'), sal from 
> tbl_int_be;
> Error: Unexpected RuntimeException: java.lang.IndexOutOfBoundsException: 
> DrillBuf(ridx: 0, widx: 114, cap: 114/114, unwrapped: DrillBuf(ridx: 321, 
> widx: 321, cap: 321/321, unwrapped: 
> UnsafeDirectLittleEndian(PooledUnsafeDirectByteBuf(ridx: 0, widx: 0, cap: 
> 321/321.slice(55, 103) (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3171) Storage Plugins : Two processes tried to update the storage plugin at the same time

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009345#comment-15009345
 ] 

ASF GitHub Bot commented on DRILL-3171:
---

Github user hnfgns commented on the pull request:

https://github.com/apache/drill/pull/260#issuecomment-157481444
  
I will look into StoragePluginMap as well.


> Storage Plugins : Two processes tried to update the storage plugin at the 
> same time
> ---
>
> Key: DRILL-3171
> URL: https://issues.apache.org/jira/browse/DRILL-3171
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Information Schema
>Affects Versions: 1.0.0
>Reporter: Rahul Challapalli
>Assignee: Deneche A. Hakim
>  Labels: test
> Fix For: Future
>
>
> Commit Id# : bd8ac4fca03ad5043bca27fbc7e0dec5a35ac474
> We have seen this issue happen with the below steps
>1. Clear out the zookeeper
>    2. Update the storage plugin using the rest API on one of the nodes
>3. Submit 10 queries concurrently
> With randomized foreman node selection, the node executing the query might 
> not have the updated storage plugins info. This could be causing the issue.
> - Rahul



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3171) Storage Plugins : Two processes tried to update the storage plugin at the same time

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009341#comment-15009341
 ] 

ASF GitHub Bot commented on DRILL-3171:
---

Github user hnfgns commented on a diff in the pull request:

https://github.com/apache/drill/pull/260#discussion_r45108203
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/sys/zk/ZkAbstractStore.java
 ---
@@ -136,7 +138,81 @@ public boolean putIfAbsent(String key, V value) {
 }
   }
 
-  public abstract void createNodeInZK (String key, V value);
+  /**
+   * Default {@link CreateMode create mode} that will be used in create operations referred in the see also section.
+   *
+   * @see #createOrUpdate(String, Object)
+   * @see #createWithPrefix(String, Object)
+   */
+  protected abstract CreateMode getCreateMode();
+
+
+  /**
+   * Creates a node in zookeeper with the {@link #getCreateMode() default create mode} and sets its value if supplied.
+   *
+   * @param path    target path
+   * @param value   value to set, null if none available
+   *
+   * @see #getCreateMode()
+   * @see #createOrUpdate(String, Object)
+   * @see #withPrefix(String)
+   */
+  protected void createWithPrefix(String path, V value) {
+    createOrUpdate(withPrefix(path), value);
+  }
+
+  /**
+   * Creates a node in zookeeper with the {@link #getCreateMode() default create mode} and sets its value if supplied
+   * or updates its value if the node already exists.
+   *
+   * Note that if node exists, its mode will not be changed.
+   *
+   * @param path    target path
+   * @param value   value to set, null if none available
+   *
+   * @see #getCreateMode()
+   * @see #createOrUpdate(String, Object, CreateMode)
+   */
+  protected void createOrUpdate(String path, V value) {
+    createOrUpdate(path, value, getCreateMode());
+  }
+
+  /**
+   * Creates a node in zookeeper with the given mode and sets its value if supplied or updates its value if the node
+   * already exists.
+   *
+   * Note that if the node exists, its mode will not be changed.
+   *
+   * Internally, the method suppresses {@link org.apache.zookeeper.KeeperException.NodeExistsException}. It is
+   * safe to do so since the implementation is idempotent.
+   *
+   * @param path    target path
+   * @param value   value to set, null if none available
+   * @param mode    creation mode
+   * @throws RuntimeException  throws a {@link RuntimeException} wrapping the root cause.
+   */
+  protected void createOrUpdate(String path, V value, CreateMode mode) {
+    try {
+      final boolean isUpdate = value != null;
+      final byte[] valueInBytes = isUpdate ? config.getSerializer().serialize(value) : null;
+      final boolean nodeExists = framework.checkExists().forPath(path) != null;
+      if (!nodeExists) {
+        final ACLBackgroundPathAndBytesable creator = framework.create().withMode(mode);
+        if (isUpdate) {
+          creator.forPath(path, valueInBytes);
+        } else {
+          creator.forPath(path);
+        }
+      } else if (isUpdate) {
+        framework.setData().forPath(path, valueInBytes);
+      }
+    } catch (KeeperException.NodeExistsException ex) {
+      logger.warn("Node already exists in Zookeeper. Skipping... -- [path: {}, mode: {}]", path, mode);
--- End diff --

We already do handle two cases? i) nodeExists & update ii) !nodeExists & 
update.


> Storage Plugins : Two processes tried to update the storage plugin at the 
> same time
> ---
>
> Key: DRILL-3171
> URL: https://issues.apache.org/jira/browse/DRILL-3171
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Information Schema
>Affects Versions: 1.0.0
>Reporter: Rahul Challapalli
>Assignee: Deneche A. Hakim
>  Labels: test
> Fix For: Future
>
>
> Commit Id# : bd8ac4fca03ad5043bca27fbc7e0dec5a35ac474
> We have seen this issue happen with the below steps
>1. Clear out the zookeeper
>    2. Update the storage plugin using the rest API on one of the nodes
>3. Submit 10 queries concurrently
> With randomized foreman node selection, the node executing the query might 
> not have the updated storage plugins info. This could be causing the issue.
> - Rahul



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-3171) Storage Plugins : Two processes tried to update the storage plugin at the same time

2015-11-17 Thread Hanifi Gunes (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanifi Gunes reassigned DRILL-3171:
---

Assignee: Hanifi Gunes  (was: Deneche A. Hakim)

> Storage Plugins : Two processes tried to update the storage plugin at the 
> same time
> ---
>
> Key: DRILL-3171
> URL: https://issues.apache.org/jira/browse/DRILL-3171
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Information Schema
>Affects Versions: 1.0.0
>Reporter: Rahul Challapalli
>Assignee: Hanifi Gunes
>  Labels: test
> Fix For: Future
>
>
> Commit Id# : bd8ac4fca03ad5043bca27fbc7e0dec5a35ac474
> We have seen this issue happen with the below steps
>1. Clear out the zookeeper
>    2. Update the storage plugin using the rest API on one of the nodes
>3. Submit 10 queries concurrently
> With randomized foreman node selection, the node executing the query might 
> not have the updated storage plugins info. This could be causing the issue.
> - Rahul



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4070) Metadata Caching : min/max values are null for varchar columns in auto partitioned data

2015-11-17 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-4070:

Assignee: Parth Chandra

> Metadata Caching : min/max values are null for varchar columns in auto 
> partitioned data
> ---
>
> Key: DRILL-4070
> URL: https://issues.apache.org/jira/browse/DRILL-4070
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.3.0
>Reporter: Rahul Challapalli
>Assignee: Parth Chandra
>Priority: Blocker
> Fix For: 1.3.0
>
> Attachments: cache.txt, fewtypes_varcharpartition.tar.tgz
>
>
> git.commit.id.abbrev=e78e286
> The metadata cache file created contains incorrect values for min/max fields 
> for varchar columns. The data is also partitioned on the varchar column.
> {code}
> refresh table metadata fewtypes_varcharpartition;
> {code}
> As a result partition pruning is not happening. This was working after 
> DRILL-3937 was fixed (d331330efd27dbb8922024c4a18c11e76a00016b).
> I attached the data set and the cache file



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4106) Redundant Project on top of Scan in query plan

2015-11-17 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-4106:
-

 Summary: Redundant Project on top of Scan in query plan
 Key: DRILL-4106
 URL: https://issues.apache.org/jira/browse/DRILL-4106
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.3.0
Reporter: Khurram Faraaz
Priority: Minor


Why do we see two Projects after the Scan in the query plan?
Table is auto partitioned by column c1
4 node cluster on CentOS, Drill 1.3, git.commit.id=a639c51c

#CTAS statement is,

{code}
CREATE TABLE inNstedDirAutoPrtn PARTITION BY(c1) AS SELECT cast(columns[0] AS 
INT) c1, cast(columns[1] AS BIGINT) c2, cast(columns[2] AS CHAR(2)) c3, 
cast(columns[3] AS VARCHAR(54)) c4, cast(columns[4] AS TIMESTAMP) c5, 
cast(columns[5] AS DATE) c6, cast(columns[6] as BOOLEAN) c7, cast(columns[7] as 
DOUBLE) c8, cast(columns[8] as TIME) c9 FROM `nested_dirs/data/csv/allData.csv`;

Why do we see two Projects on top of the Scan in the query plan? One of them looks 
redundant.

0: jdbc:drill:schema=dfs.tmp> explain plan for select * from inNstedDirAutoPrtn 
where c1 IN (1,2,3,4,-1,0,100,-1710);
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(*=[$0])
00-02Project(*=[$0])
00-03  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=/tmp/inNstedDirAutoPrtn/0_0_48.parquet], ReadEntryWithPath 
[path=/tmp/inNstedDirAutoPrtn/0_0_31.parquet], ReadEntryWithPath 
[path=/tmp/inNstedDirAutoPrtn/0_0_50.parquet], ReadEntryWithPath 
[path=/tmp/inNstedDirAutoPrtn/0_0_47.parquet], ReadEntryWithPath 
[path=/tmp/inNstedDirAutoPrtn/0_0_49.parquet], ReadEntryWithPath 
[path=/tmp/inNstedDirAutoPrtn/0_0_46.parquet]], 
selectionRoot=maprfs:/tmp/inNstedDirAutoPrtn, numFiles=6, 
usedMetadataFile=false, columns=[`*`]]])

{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3854) IOB Exception : CONVERT_FROM (sal, int_be)

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009434#comment-15009434
 ] 

ASF GitHub Bot commented on DRILL-3854:
---

Github user hsuanyi commented on the pull request:

https://github.com/apache/drill/pull/262#issuecomment-157496916
  
jacques-n, 
Would you mind taking another look? This new fix resides in WritableBatch, 
which makes it independent of any function.


> IOB Exception : CONVERT_FROM (sal, int_be)
> --
>
> Key: DRILL-3854
> URL: https://issues.apache.org/jira/browse/DRILL-3854
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
> Fix For: 1.4.0
>
> Attachments: log, run_time_code.txt
>
>
> CONVERT_FROM function results in IOB Exception
> Drill master commit id : b9afcf8f
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select salary from Emp;
> +-+
> | salary  |
> +-+
> | 8   |
> | 9   |
> | 20  |
> | 95000   |
> | 85000   |
> | 9   |
> | 10  |
> | 87000   |
> | 8   |
> | 10  |
> | 99000   |
> +-+
> 11 rows selected (0.535 seconds)
> # create table using above Emp table
> create table tbl_int_be as select convert_to(salary, 'int_be') sal from Emp;
> 0: jdbc:drill:schema=dfs.tmp> alter session set `planner.slice_target`=1;
> +---++
> |  ok   |summary |
> +---++
> | true  | planner.slice_target updated.  |
> +---++
> 1 row selected (0.19 seconds)
> # Below query results in IOB on server.
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(sal, 'int_be') from 
> tbl_int_be order by sal;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: DrillBuf(ridx: 0, widx: 158, 
> cap: 158/158, unwrapped: SlicedByteBuf(ridx: 0, widx: 158, cap: 158/158, 
> unwrapped: UnsafeDirectLittleEndian(PooledUnsafeDirectByteBuf(ridx: 0, widx: 
> 0, cap: 417/417.slice(158, 44)
> Fragment 2:0
> [Error Id: 4ee1361d-9877-45eb-bde6-57d5add9fe5e on centos-04.qa.lab:31010] 
> (state=,code=0)
> # Apply convert_from function and project original column results in IOB on 
> client. (because Error Id is missing)
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(sal, 'int_be'), sal from 
> tbl_int_be;
> Error: Unexpected RuntimeException: java.lang.IndexOutOfBoundsException: 
> DrillBuf(ridx: 0, widx: 114, cap: 114/114, unwrapped: DrillBuf(ridx: 321, 
> widx: 321, cap: 321/321, unwrapped: 
> UnsafeDirectLittleEndian(PooledUnsafeDirectByteBuf(ridx: 0, widx: 0, cap: 
> 321/321.slice(55, 103) (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3423) Add New HTTPD format plugin

2015-11-17 Thread Jim Scott (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009469#comment-15009469
 ] 

Jim Scott commented on DRILL-3423:
--

I have made some modifications to change the :map suffix: maps of data now end 
with _$.

When the parser has fields like:
TIME.DAY:request.receive.time.day_utc
They will now be identified as:
TIME_DAY:request_receive_time_day__utc

The type remapping capability is to prefix the field name with a # like:
#HTTP_URI:request_firstline_uri_query_myvariable
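The renaming shown in the example above is consistent with a simple rule, sketched below in Python. This is an editor's inference from the single example quoted here, not the plugin's actual code: existing underscores appear to be escaped by doubling, and the parser's dots then become single underscores.

```python
def drill_field_name(parser_field):
    # Inferred renaming rule (editor's assumption): double any existing
    # underscores, then turn the parser's dots into single underscores.
    return parser_field.replace("_", "__").replace(".", "_")

print(drill_field_name("TIME.DAY:request.receive.time.day_utc"))
# -> TIME_DAY:request_receive_time_day__utc  (matches the mapping quoted above)
```

Doubling the existing underscores first keeps the mapping reversible: a single `_` in the Drill name always came from a dot, and `__` always came from an original underscore.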

Additionally, due to these changes, I have removed the fields mapping 
completely from the bootstrap and the user configuration, which should make this 
easier for the user.

I believe the documentation for this plugin will be very straightforward and 
yield a solid user experience. 

> Add New HTTPD format plugin
> ---
>
> Key: DRILL-3423
> URL: https://issues.apache.org/jira/browse/DRILL-3423
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Reporter: Jacques Nadeau
>Assignee: Jim Scott
> Fix For: 1.4.0
>
>
> Add an HTTPD logparser based format plugin.  The author has been kind enough 
> to move the logparser project to be released under the Apache License.  Can 
> find it here:
> 
> nl.basjes.parse.httpdlog
> httpdlog-parser
> 2.0
> 
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3854) IOB Exception : CONVERT_FROM (sal, int_be)

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009503#comment-15009503
 ] 

ASF GitHub Bot commented on DRILL-3854:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/262#discussion_r45120761
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/WritableBatch.java ---
@@ -149,6 +149,7 @@ public static WritableBatch getBatchNoHV(int 
recordCount, Iterable
   }
 
   for (DrillBuf b : vv.getBuffers(true)) {
+b.readerIndex(0);
--- End diff --

I believe the contract of getBuffers() is that buffers are returned in a 
reader-appropriate state. As such, you should figure out which buffers are 
failing to guarantee this. It should be easy, as there are only a small number 
of implementations of this. In other words, where are we failing to ensure this?

Given the code I looked at before, I think the problem may be that the 
readerIndex behavior is only inside the clear statement. @StevenMPhillips , it 
seems like this line: 
https://github.com/apache/drill/blame/master/exec/vector/src/main/java/org/apache/drill/exec/vector/BaseDataValueVector.java#L63
 should be outside the if(clear). Thoughts?


> IOB Exception : CONVERT_FROM (sal, int_be)
> --
>
> Key: DRILL-3854
> URL: https://issues.apache.org/jira/browse/DRILL-3854
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
> Fix For: 1.4.0
>
> Attachments: log, run_time_code.txt
>
>
> CONVERT_FROM function results in IOB Exception
> Drill master commit id : b9afcf8f
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select salary from Emp;
> +-+
> | salary  |
> +-+
> | 8   |
> | 9   |
> | 20  |
> | 95000   |
> | 85000   |
> | 9   |
> | 10  |
> | 87000   |
> | 8   |
> | 10  |
> | 99000   |
> +-+
> 11 rows selected (0.535 seconds)
> # create table using above Emp table
> create table tbl_int_be as select convert_to(salary, 'int_be') sal from Emp;
> 0: jdbc:drill:schema=dfs.tmp> alter session set `planner.slice_target`=1;
> +---++
> |  ok   |summary |
> +---++
> | true  | planner.slice_target updated.  |
> +---++
> 1 row selected (0.19 seconds)
> # Below query results in IOB on server.
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(sal, 'int_be') from 
> tbl_int_be order by sal;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: DrillBuf(ridx: 0, widx: 158, 
> cap: 158/158, unwrapped: SlicedByteBuf(ridx: 0, widx: 158, cap: 158/158, 
> unwrapped: UnsafeDirectLittleEndian(PooledUnsafeDirectByteBuf(ridx: 0, widx: 
> 0, cap: 417/417.slice(158, 44)
> Fragment 2:0
> [Error Id: 4ee1361d-9877-45eb-bde6-57d5add9fe5e on centos-04.qa.lab:31010] 
> (state=,code=0)
> # Apply convert_from function and project original column results in IOB on 
> client. (because Error Id is missing)
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(sal, 'int_be'), sal from 
> tbl_int_be;
> Error: Unexpected RuntimeException: java.lang.IndexOutOfBoundsException: 
> DrillBuf(ridx: 0, widx: 114, cap: 114/114, unwrapped: DrillBuf(ridx: 321, 
> widx: 321, cap: 321/321, unwrapped: 
> UnsafeDirectLittleEndian(PooledUnsafeDirectByteBuf(ridx: 0, widx: 0, cap: 
> 321/321.slice(55, 103) (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010015#comment-15010015
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r45150593
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillDirectScanRel.java
 ---
@@ -0,0 +1,111 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.Iterators;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.AbstractRelNode;
+import org.apache.calcite.rel.RelWriter;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.drill.common.logical.data.LogicalOperator;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.planner.physical.DrillScanPrel;
+import org.apache.drill.exec.planner.physical.PhysicalPlanCreator;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.visitor.PrelVisitor;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.store.direct.DirectGroupScan;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * Logical and physical RelNode representing a {@link DirectGroupScan}. 
This is not backed by a {@link DrillTable},
+ * unlike {@link DrillScanRel}.
+ */
+public class DrillDirectScanRel extends AbstractRelNode implements 
DrillScanPrel, DrillRel {
--- End diff --

My note was in reference to DrillReduceExpressionsRule.

Basically, you should be able to modify this class's implementations of 
createEmptyRelOrEquivalent() to switch to a Values (with fake data) operator 
followed by a Limit(0). At least that was the thought.
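The idea can be sketched outside of Calcite. This editor's Python illustration (the function name and data shapes are assumptions for illustration) shows why a Values row followed by a limit of zero preserves the schema while returning no data:

```python
def empty_rel_or_equivalent(schema):
    # Sketch of the suggested rewrite: a Values operator carrying one fake
    # row (so the schema is known), followed by Limit 0 (so no rows flow).
    fake_row = {name: placeholder for name, placeholder in schema.items()}
    values_rows = [fake_row]     # Values(fake data)
    limited = values_rows[:0]    # Limit(0)
    return limited, list(fake_row.keys())

rows, columns = empty_rel_or_equivalent({"id": 0, "name": ""})
# rows is empty, but the column list (the schema) is intact
```

The point is that the client still learns the result's column names and types from the Values node, while the Limit(0) guarantees no data is produced, avoiding the hang on an empty scan.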




> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4103) Add additional metadata to Parquet files generated by Drill

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010053#comment-15010053
 ] 

ASF GitHub Bot commented on DRILL-4103:
---

Github user jaltekruse commented on the pull request:

https://github.com/apache/drill/pull/264#issuecomment-157576301
  
+1


> Add additional metadata to Parquet files generated by Drill
> ---
>
> Key: DRILL-4103
> URL: https://issues.apache.org/jira/browse/DRILL-4103
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Jacques Nadeau
>Assignee: Julien Le Dem
> Fix For: 1.3.0
>
>
> For future compatibility efforts, it would be good for us to automatically 
> add metadata to Drill generated Parquet files. At a minimum, we should add 
> information about the fact that Drill generated the files and the version of 
> Drill that generated the files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4109) NPE in external sort

2015-11-17 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010084#comment-15010084
 ] 

Khurram Faraaz commented on DRILL-4109:
---

The stack trace is the same as the one reported in DRILL-4035 (see the comments 
on that JIRA).

> NPE in external sort 
> -
>
> Key: DRILL-4109
> URL: https://issues.apache.org/jira/browse/DRILL-4109
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Victoria Markman
>Priority: Blocker
> Attachments: 29b41f37-4803-d7ce-e05f-912d1f65da79.sys.drill, 
> drillbit.log
>
>
> 4 node cluster
> 36GB of direct memory
> 4GB heap memory
> planner.memory.max_query_memory_per_node=2GB (default)
> planner.enable_hashjoin = false
> Spill directory has 6.4T of memory available:
> {noformat}
> [Tue Nov 17 18:23:18 /tmp/drill ] # df -H .
> Filesystem   Size  Used Avail Use% Mounted on
> localhost:/mapr  7.7T  1.4T  6.4T  18% /mapr
> {noformat}
> Run query below: 
> framework/resources/Advanced/tpcds/tpcds_sf100/original/query15.sql
> drillbit.log
> {code}
> 2015-11-18 02:22:12,639 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:9] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Merging and spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_9/operator_17/7
> 2015-11-18 02:22:12,770 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:5] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Merging and spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_5/operator_17/7
> 2015-11-18 02:22:13,345 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:17] INFO 
>  o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_17/operator_17/7
> 2015-11-18 02:22:13,346 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO 
>  o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_13/operator_16/1
> 2015-11-18 02:22:13,346 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] WARN 
>  o.a.d.e.p.i.xsort.ExternalSortBatch - Starting to merge. 34 batch groups. 
> Current allocated memory: 2252186
> 2015-11-18 02:22:13,363 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO 
>  o.a.d.e.w.fragment.FragmentExecutor - 
> 29b41f37-4803-d7ce-e05f-912d1f65da79:3:13: State change requested RUNNING --> 
> FAILED
> 2015-11-18 02:22:13,370 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO 
>  o.a.d.e.w.fragment.FragmentExecutor - 
> 29b41f37-4803-d7ce-e05f-912d1f65da79:3:13: State change requested FAILED --> 
> FINISHED
> 2015-11-18 02:22:13,371 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NullPointerException
> Fragment 3:13
> [Error Id: c5d67dcb-16aa-4951-89f5-599b4b4eb54d on atsqa4-133.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> NullPointerException
> Fragment 3:13
> [Error Id: c5d67dcb-16aa-4951-89f5-599b4b4eb54d on atsqa4-133.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_71]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_71]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> java.lang.NullPointerException: null
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4110) Avro tests are not verifying their results

2015-11-17 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4110:
--

 Summary: Avro tests are not verifying their results
 Key: DRILL-4110
 URL: https://issues.apache.org/jira/browse/DRILL-4110
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.3.0
Reporter: Jason Altekruse
Priority: Critical


A number of tests written for the Avro format plugin generate a variety of 
files with various schema properties. These tests currently verify only that 
the files can be read without throwing exceptions; the results coming out of 
Drill are not verified. Some of these tests were fixed as part of DRILL-4056; 
the rest still need to be refactored to add baseline verification checks.





[jira] [Resolved] (DRILL-4082) Better error message when multiple versions of the same function are found by the classpath scanner

2015-11-17 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved DRILL-4082.
--
Resolution: Fixed

> Better error message when multiple versions of the same function are found by 
> the classpath scanner
> ---
>
> Key: DRILL-4082
> URL: https://issues.apache.org/jira/browse/DRILL-4082
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>
> PR:
> https://github.com/apache/drill/pull/252





[jira] [Commented] (DRILL-4015) Update DrillClient and JDBC driver to expose warnings provided via RPC layer

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009938#comment-15009938
 ] 

ASF GitHub Bot commented on DRILL-4015:
---

GitHub user abhipol opened a pull request:

https://github.com/apache/drill/pull/263

DRILL-4015: Update DrillClient and JDBC driver to expose warnings provided 
via RPC layer

Surfaces query warnings from operators to the JDBC client, both as part of 
the query result and as out-of-band warnings. Details: 
https://docs.google.com/document/d/1HwpD3gRbNohpse9zbm3cmLZixGn6EgapulEGrRtKlkA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/abhipol/drill issues/DRILL-4015

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/263.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #263


commit 137059cd44ec28e8ba3bf2aa73d2c1cbcd55d604
Author: Abhi P 
Date:   2015-11-17T22:54:56Z

Support for drill warnings in JDBC




> Update DrillClient and JDBC driver to expose warnings provided via RPC layer
> 
>
> Key: DRILL-4015
> URL: https://issues.apache.org/jira/browse/DRILL-4015
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Client - JDBC, Execution - RPC
>Reporter: Jacques Nadeau
>Assignee: Abhijit Pol
> Fix For: 1.4.0
>
>






[jira] [Updated] (DRILL-4109) NPE in external sort

2015-11-17 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-4109:

Attachment: 29b41f37-4803-d7ce-e05f-912d1f65da79.sys.drill

> NPE in external sort 
> -
>
> Key: DRILL-4109
> URL: https://issues.apache.org/jira/browse/DRILL-4109
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Victoria Markman
>Priority: Blocker
> Attachments: 29b41f37-4803-d7ce-e05f-912d1f65da79.sys.drill
>
>
> 4 node cluster
> 36GB of direct memory
> 4GB heap memory
> planner.memory.max_query_memory_per_node=2GB (default)
> planner.enable_hashjoin = false
> Spill directory has 6.4T of disk space available:
> {noformat}
> [Tue Nov 17 18:23:18 /tmp/drill ] # df -H .
> Filesystem   Size  Used Avail Use% Mounted on
> localhost:/mapr  7.7T  1.4T  6.4T  18% /mapr
> {noformat}
> Run query below: 
> framework/resources/Advanced/tpcds/tpcds_sf100/original/query15.sql
> drillbit.log
> {code}
> 2015-11-18 02:22:12,639 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:9] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Merging and spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_9/operator_17/7
> 2015-11-18 02:22:12,770 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:5] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Merging and spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_5/operator_17/7
> 2015-11-18 02:22:13,345 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:17] INFO 
>  o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_17/operator_17/7
> 2015-11-18 02:22:13,346 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO 
>  o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_13/operator_16/1
> 2015-11-18 02:22:13,346 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] WARN 
>  o.a.d.e.p.i.xsort.ExternalSortBatch - Starting to merge. 34 batch groups. 
> Current allocated memory: 2252186
> 2015-11-18 02:22:13,363 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO 
>  o.a.d.e.w.fragment.FragmentExecutor - 
> 29b41f37-4803-d7ce-e05f-912d1f65da79:3:13: State change requested RUNNING --> 
> FAILED
> 2015-11-18 02:22:13,370 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO 
>  o.a.d.e.w.fragment.FragmentExecutor - 
> 29b41f37-4803-d7ce-e05f-912d1f65da79:3:13: State change requested FAILED --> 
> FINISHED
> 2015-11-18 02:22:13,371 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NullPointerException
> Fragment 3:13
> [Error Id: c5d67dcb-16aa-4951-89f5-599b4b4eb54d on atsqa4-133.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> NullPointerException
> Fragment 3:13
> [Error Id: c5d67dcb-16aa-4951-89f5-599b4b4eb54d on atsqa4-133.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_71]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_71]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> java.lang.NullPointerException: null
> {code}





[jira] [Created] (DRILL-4109) NPE in external sort

2015-11-17 Thread Victoria Markman (JIRA)
Victoria Markman created DRILL-4109:
---

 Summary: NPE in external sort 
 Key: DRILL-4109
 URL: https://issues.apache.org/jira/browse/DRILL-4109
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.4.0
Reporter: Victoria Markman
Priority: Blocker
 Attachments: 29b41f37-4803-d7ce-e05f-912d1f65da79.sys.drill

4 node cluster
36GB of direct memory
4GB heap memory

planner.memory.max_query_memory_per_node=2GB (default)
planner.enable_hashjoin = false

Spill directory has 6.4T of disk space available:
{noformat}
[Tue Nov 17 18:23:18 /tmp/drill ] # df -H .
Filesystem   Size  Used Avail Use% Mounted on
localhost:/mapr  7.7T  1.4T  6.4T  18% /mapr
{noformat}

Run query below: 
framework/resources/Advanced/tpcds/tpcds_sf100/original/query15.sql

drillbit.log
{code}
2015-11-18 02:22:12,639 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:9] INFO  
o.a.d.e.p.i.xsort.ExternalSortBatch - Merging and spilling to 
/tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_9/operator_17/7
2015-11-18 02:22:12,770 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:5] INFO  
o.a.d.e.p.i.xsort.ExternalSortBatch - Merging and spilling to 
/tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_5/operator_17/7
2015-11-18 02:22:13,345 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:17] INFO  
o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
/tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_17/operator_17/7
2015-11-18 02:22:13,346 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO  
o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
/tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_13/operator_16/1
2015-11-18 02:22:13,346 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] WARN  
o.a.d.e.p.i.xsort.ExternalSortBatch - Starting to merge. 34 batch groups. 
Current allocated memory: 2252186
2015-11-18 02:22:13,363 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 
29b41f37-4803-d7ce-e05f-912d1f65da79:3:13: State change requested RUNNING --> 
FAILED
2015-11-18 02:22:13,370 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 
29b41f37-4803-d7ce-e05f-912d1f65da79:3:13: State change requested FAILED --> 
FINISHED
2015-11-18 02:22:13,371 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] ERROR 
o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NullPointerException

Fragment 3:13

[Error Id: c5d67dcb-16aa-4951-89f5-599b4b4eb54d on atsqa4-133.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
NullPointerException

Fragment 3:13

[Error Id: c5d67dcb-16aa-4951-89f5-599b4b4eb54d on atsqa4-133.qa.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
 ~[drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321)
 [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184)
 [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290)
 [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_71]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_71]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
java.lang.NullPointerException: null
{code}





[jira] [Created] (DRILL-4108) Query on csv file w/ header fails with an exception when non existing column is requested

2015-11-17 Thread Abhi Pol (JIRA)
Abhi Pol created DRILL-4108:
---

 Summary: Query on csv file w/ header fails with an exception when 
non existing column is requested
 Key: DRILL-4108
 URL: https://issues.apache.org/jira/browse/DRILL-4108
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Text & CSV
Affects Versions: 1.3.0
Reporter: Abhi Pol
 Fix For: 1.4.0


A Drill query on a CSV file with a header fails with an exception when it 
requests column(s) that do not exist in the header.

*Current behavior:* once extractHeader is enabled, queried columns must be 
columns from the header

*Expected behavior:* non-existing columns should appear with 'null' values, 
matching default Drill behavior

{noformat}
0: jdbc:drill:zk=local> select Category from dfs.`/tmp/cars.csvh` limit 10;
java.lang.ArrayIndexOutOfBoundsException: -1
at 
org.apache.drill.exec.store.easy.text.compliant.FieldVarCharOutput.<init>(FieldVarCharOutput.java:104)
at 
org.apache.drill.exec.store.easy.text.compliant.CompliantTextRecordReader.setup(CompliantTextRecordReader.java:118)
at 
org.apache.drill.exec.physical.impl.ScanBatch.<init>(ScanBatch.java:108)
at 
org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin.getReaderBatch(EasyFormatPlugin.java:198)
at 
org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:35)
at 
org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:28)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:151)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:105)
at 
org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:79)
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:230)
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Error: SYSTEM ERROR: ArrayIndexOutOfBoundsException: -1

Fragment 0:0

[Error Id: f272960e-fa2f-408e-918c-722190398cd3 on blackhole:31010] 
(state=,code=0)
{noformat}
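The `ArrayIndexOutOfBoundsException: -1` is consistent with a missing header lookup: Java's `List.indexOf` returns -1 for an absent column name, and using that as an array index fails. A minimal Python sketch (hypothetical, not Drill's actual reader code) of the expected tolerant behavior, mapping missing columns to null:

```python
def project(header, requested, rows):
    """Project requested columns; columns absent from the header yield None."""
    # header.index(...) would fail for a missing name (Java's indexOf
    # returns -1), so look columns up defensively instead.
    idx = [header.index(col) if col in header else None for col in requested]
    return [[None if i is None else row[i] for i in idx] for row in rows]

header = ["make", "model", "year"]
rows = [["Ford", "Focus", "2012"], ["Audi", "A4", "2014"]]

# Requesting the non-existent "Category" column yields nulls, not an error.
result = project(header, ["Category", "make"], rows)
# result == [[None, "Ford"], [None, "Audi"]]
```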





[jira] [Updated] (DRILL-4109) NPE in external sort

2015-11-17 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-4109:

Attachment: drillbit.log

> NPE in external sort 
> -
>
> Key: DRILL-4109
> URL: https://issues.apache.org/jira/browse/DRILL-4109
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Victoria Markman
>Priority: Blocker
> Attachments: 29b41f37-4803-d7ce-e05f-912d1f65da79.sys.drill, 
> drillbit.log
>
>
> 4 node cluster
> 36GB of direct memory
> 4GB heap memory
> planner.memory.max_query_memory_per_node=2GB (default)
> planner.enable_hashjoin = false
> Spill directory has 6.4T of disk space available:
> {noformat}
> [Tue Nov 17 18:23:18 /tmp/drill ] # df -H .
> Filesystem   Size  Used Avail Use% Mounted on
> localhost:/mapr  7.7T  1.4T  6.4T  18% /mapr
> {noformat}
> Run query below: 
> framework/resources/Advanced/tpcds/tpcds_sf100/original/query15.sql
> drillbit.log
> {code}
> 2015-11-18 02:22:12,639 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:9] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Merging and spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_9/operator_17/7
> 2015-11-18 02:22:12,770 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:5] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Merging and spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_5/operator_17/7
> 2015-11-18 02:22:13,345 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:17] INFO 
>  o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_17/operator_17/7
> 2015-11-18 02:22:13,346 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO 
>  o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_13/operator_16/1
> 2015-11-18 02:22:13,346 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] WARN 
>  o.a.d.e.p.i.xsort.ExternalSortBatch - Starting to merge. 34 batch groups. 
> Current allocated memory: 2252186
> 2015-11-18 02:22:13,363 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO 
>  o.a.d.e.w.fragment.FragmentExecutor - 
> 29b41f37-4803-d7ce-e05f-912d1f65da79:3:13: State change requested RUNNING --> 
> FAILED
> 2015-11-18 02:22:13,370 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO 
>  o.a.d.e.w.fragment.FragmentExecutor - 
> 29b41f37-4803-d7ce-e05f-912d1f65da79:3:13: State change requested FAILED --> 
> FINISHED
> 2015-11-18 02:22:13,371 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NullPointerException
> Fragment 3:13
> [Error Id: c5d67dcb-16aa-4951-89f5-599b4b4eb54d on atsqa4-133.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> NullPointerException
> Fragment 3:13
> [Error Id: c5d67dcb-16aa-4951-89f5-599b4b4eb54d on atsqa4-133.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_71]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_71]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> java.lang.NullPointerException: null
> {code}





[jira] [Commented] (DRILL-4056) Avro deserialization corrupts data

2015-11-17 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010085#comment-15010085
 ] 

Jason Altekruse commented on DRILL-4056:


Hey guys,

I got a chance to get back to this today. It looks like all of the tests we had 
written for the Avro reader were not actually verifying their results. I have 
started refactoring the tests on the branch with this fix, but there are a good 
number of them, and I would like to get a PR up for this fix so that we can get 
it into the next 1.3 release candidate.

I have opened a new JIRA, DRILL-4110, for refactoring the remainder of the 
existing Avro tests.

Updated PR based on the master branch will be posted shortly.



> Avro deserialization corrupts data
> --
>
> Key: DRILL-4056
> URL: https://issues.apache.org/jira/browse/DRILL-4056
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.3.0
> Environment: Ubuntu 15.04 - Oracle Java
>Reporter: Stefán Baxter
>Assignee: Jason Altekruse
> Fix For: 1.3.0
>
> Attachments: test.zip
>
>
> I have an Avro file that support the following data/schema:
> {"field":"some", "classification":{"variant":"Gæst"}}
> When I select 10 rows from this file I get:
> +-+
> |   EXPR$0|
> +-+
> | Gæst|
> | Voksen  |
> | Voksen  |
> | Invitation KIF KBH  |
> | Invitation KIF KBH  |
> | Ordinarie pris KBH  |
> | Ordinarie pris KBH  |
> | Biljetter 200 krBH  |
> | Biljetter 200 krBH  |
> | Biljetter 200 krBH  |
> +-+
> The bug is that the field values are incorrectly de-serialized and the value 
> from the previous row is retained if the subsequent row is shorter.
> The sql query:
> "select s.classification.variant variant from dfs. as s limit 10;"
> That way the  "Ordinarie pris" becomes "Ordinarie pris KBH" because the 
> previous row had the value "Invitation KIF KBH".
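The "previous value retained when the next row is shorter" symptom is characteristic of a reused output buffer whose end offset is never reset between rows. The following is a hypothetical Python sketch of that failure mode, for illustration only; it is not Drill's vector code:

```python
def read_rows_buggy(values):
    """Writes each value into one reused buffer, but never shrinks the
    end offset, so a shorter row exposes stale bytes from a longer one."""
    buf = bytearray(64)
    end = 0
    out = []
    for v in values:
        data = v.encode("utf-8")
        buf[: len(data)] = data
        end = max(end, len(data))  # bug: offset from a longer prior row sticks
        out.append(buf[:end].decode("utf-8"))
    return out

rows = read_rows_buggy(["Invitation KIF KBH", "Ordinarie pris"])
# rows[1] == "Ordinarie pris KBH": the shorter row picked up " KBH"
# left over from the previous, longer value.
```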





[jira] [Commented] (DRILL-4102) Only one row found in a JSON document that contains multiple items.

2015-11-17 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010276#comment-15010276
 ] 

Sudheesh Katkam commented on DRILL-4102:


Drill supports json files of a [certain 
format|https://drill.apache.org/docs/json-data-model/#reading-json].
A simple change to the file allows for queries that you might be interested in:
{code}
{ 
  "keys" : 
  {
"Key1":
  { 
"htmltags": "" 
  },
"Key2":
  { 
"htmltags": ""
  },
"Key3":
  {
"htmltags": ""
  }
  }
}
{code}
Queries:
{code}
> select kvgen(keys) from dfs.`/root/data.json`;
+---------------------------------------------------------------------------------------------------------------------------+
|                                                          EXPR$0                                                           |
+---------------------------------------------------------------------------------------------------------------------------+
| [{"key":"Key1","value":{"htmltags":""}},{"key":"Key2","value":{"htmltags":""}},{"key":"Key3","value":{"htmltags":""}}]  |
+---------------------------------------------------------------------------------------------------------------------------+

> select flatten(kvgen(keys)) from dfs.`/root/data.json`;
+-----------------------------------------+
|                 EXPR$0                  |
+-----------------------------------------+
| {"key":"Key1","value":{"htmltags":""}}  |
| {"key":"Key2","value":{"htmltags":""}}  |
| {"key":"Key3","value":{"htmltags":""}}  |
+-----------------------------------------+

> select t.r.key, t.r.`value` from (select flatten(kvgen(keys)) as r from 
> dfs.`/root/data.json`) t;
+---------+------------------+
| EXPR$0  |      EXPR$1      |
+---------+------------------+
| Key1    | {"htmltags":""}  |
| Key2    | {"htmltags":""}  |
| Key3    | {"htmltags":""}  |
+---------+------------------+
{code}
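For readers unfamiliar with these functions: `kvgen` turns a map into a list of `{"key": ..., "value": ...}` entries, and `flatten` emits one row per entry. A rough Python emulation of the queries above (values shortened to placeholders, since the original tags are obfuscated):

```python
import json

doc = json.loads(
    '{"keys": {"Key1": {"htmltags": "a"},'
    '          "Key2": {"htmltags": "b"},'
    '          "Key3": {"htmltags": "c"}}}'
)

def kvgen(mapping):
    # Like Drill's kvgen(): map -> list of {"key": k, "value": v} entries
    return [{"key": k, "value": v} for k, v in mapping.items()]

# flatten(kvgen(keys)): one output row per key/value entry
flattened = kvgen(doc["keys"])
keys = [row["key"] for row in flattened]      # ["Key1", "Key2", "Key3"]
values = [row["value"] for row in flattened]  # [{"htmltags": "a"}, ...]
```

This mirrors why the last query returns three rows: each entry of the wrapping map becomes its own record once flattened.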

> Only one row found in a JSON document that contains multiple items.
> ---
>
> Key: DRILL-4102
> URL: https://issues.apache.org/jira/browse/DRILL-4102
> Project: Apache Drill
>  Issue Type: Bug
> Environment: OS X, Drill embedded, v1.1.0 installed via HomeBrew
>Reporter: aditya menon
>
> I tried to analyse a JSON file that had the following (sample) structure:
> {code:json}
> {
> "Key1": {
>   "htmltags": " attr3='charlie' />"
> },
> "Key2": {
>   "htmltags": " attr3='mike' />"
> },
> "Key3": {
>   "htmltags": " />"
> }
> }
> {code}
> (Apologies for the obfuscation, I am unable to publish the original dataset. 
> But the structure is exactly the same. Note especially how the keys and other 
> data points *differ* in some places, and remain identical in others.)
> When I run a {code:sql}SELECT * FROM DataFile.json{code} what I get is a 
> single row listed under three columns: {code:html}" />"{code} [i.e., only the 
> entry `Key1.htmltags`] .
> Ideally, I should see three rows, each with entries from Key1..Key3, listed 
> under the correct respective column.





[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010297#comment-15010297
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r45162213
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillDirectScanRel.java
 ---
@@ -0,0 +1,111 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.Iterators;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.AbstractRelNode;
+import org.apache.calcite.rel.RelWriter;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.drill.common.logical.data.LogicalOperator;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.planner.physical.DrillScanPrel;
+import org.apache.drill.exec.planner.physical.PhysicalPlanCreator;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.visitor.PrelVisitor;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.store.direct.DirectGroupScan;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * Logical and physical RelNode representing a {@link DirectGroupScan}. 
This is not backed by a {@link DrillTable},
+ * unlike {@link DrillScanRel}.
+ */
+public class DrillDirectScanRel extends AbstractRelNode implements 
DrillScanPrel, DrillRel {
--- End diff --

The `getValuesRelIfFullySchemaed(...)` check is done before logical 
transformation to avoid creating expensive objects while applying rules (unless 
rules are ordered). For example, DrillScanRule creates a DrillScanRel object 
which constructs a group scan object that can be quite expensive (see HiveScan, 
MongoGroupScan, HbaseGroupScan).
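The cost concern can be illustrated abstractly: if a planner rule builds the heavyweight scan object every time it merely *matches*, the expense multiplies across candidates; checking a cheap predicate first confines construction to the committed plan. A hedged Python sketch of that design choice (names are hypothetical, not Drill's planner API):

```python
class ExpensiveGroupScan:
    """Stands in for HiveScan/MongoGroupScan-style objects whose
    construction triggers costly metadata fetches."""
    constructions = 0

    def __init__(self, table):
        ExpensiveGroupScan.constructions += 1  # count expensive builds
        self.table = table

def plan(candidates):
    # Cheap schema check first (analogous in spirit to the
    # getValuesRelIfFullySchemaed pre-check), so only the chosen
    # candidate pays for the expensive construction.
    chosen = next(c for c in candidates if c["fully_schemaed"])
    return ExpensiveGroupScan(chosen["table"])

scan = plan([
    {"table": "t1", "fully_schemaed": False},
    {"table": "t2", "fully_schemaed": True},
])
# Only one expensive object was built, not one per candidate examined.
```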


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;





[jira] [Created] (DRILL-4111) Remove travis ci config file

2015-11-17 Thread Julien Le Dem (JIRA)
Julien Le Dem created DRILL-4111:


 Summary: Remove travis ci config file
 Key: DRILL-4111
 URL: https://issues.apache.org/jira/browse/DRILL-4111
 Project: Apache Drill
  Issue Type: Task
Reporter: Julien Le Dem
Assignee: Julien Le Dem


Since the travis build always fails, we should just turn it off for now.





[jira] [Commented] (DRILL-4111) Remove travis ci config file

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010301#comment-15010301
 ] 

ASF GitHub Bot commented on DRILL-4111:
---

GitHub user julienledem opened a pull request:

https://github.com/apache/drill/pull/267

DRILL-4111: remove travis-ci file, as the drill build does not work there

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/julienledem/drill remove_travis

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/267.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #267


commit 9f38414a8e8b1bb7a281fb0bf4a3400cc7c7c9ab
Author: Julien Le Dem 
Date:   2015-11-18T05:53:45Z

DRILL-4111: remove travis-ci file, as the drill build does not work there




> Remove travis ci config file
> 
>
> Key: DRILL-4111
> URL: https://issues.apache.org/jira/browse/DRILL-4111
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>
> Since the travis build always fails, we should just turn it off for now.





[jira] [Commented] (DRILL-4063) Missing files/classes needed for S3a access

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010113#comment-15010113
 ] 

ASF GitHub Bot commented on DRILL-4063:
---

GitHub user abhipol opened a pull request:

https://github.com/apache/drill/pull/265

DRILL-4063: Missing files/classes needed for S3a access

Pulling s3a support dep jars from hadoop-aws

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/abhipol/drill s3dep

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/265.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #265


commit 3007638758104a1e79a92debc683cd341024d1b3
Author: Abhi P 
Date:   2015-11-18T03:04:23Z

pulling s3a support dep jars




> Missing files/classes needed for S3a access
> ---
>
> Key: DRILL-4063
> URL: https://issues.apache.org/jira/browse/DRILL-4063
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.3.0
> Environment: All
>Reporter: Nathan Griffith
>  Labels: aws, aws-s3, s3, storage
>
> Specifying
> {code}
> "connection": "s3a://"
> {code}
> results in the following error:
> {code}
> Error: SYSTEM ERROR: ClassNotFoundException: Class 
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> {code}
> I can fix this by dropping in these files from the hadoop binary tarball:
> hadoop-aws-2.6.2.jar
> aws-java-sdk-1.7.4.jar
> And then adding this to my core-site.xml:
> {code:xml}
>   
> fs.s3a.access.key
> ACCESSKEY
>   
>   
> fs.s3a.secret.key
> SECRETKEY
>   
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4102) Only one row found in a JSON document that contains multiple items.

2015-11-17 Thread aditya menon (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010192#comment-15010192
 ] 

aditya menon commented on DRILL-4102:
-

Yep it is.

> Only one row found in a JSON document that contains multiple items.
> ---
>
> Key: DRILL-4102
> URL: https://issues.apache.org/jira/browse/DRILL-4102
> Project: Apache Drill
>  Issue Type: Bug
> Environment: OS X, Drill embedded, v1.1.0 installed via HomeBrew
>Reporter: aditya menon
>
> I tried to analyse a JSON file that had the following (sample) structure:
> {code:json}
> {
> "Key1": {
>   "htmltags": " attr3='charlie' />"
> },
> "Key2": {
>   "htmltags": " attr3='mike' />"
> },
> "Key3": {
>   "htmltags": " />"
> }
> }
> {code}
> (Apologies for the obfuscation, I am unable to publish the original dataset. 
> But the structure is exactly the same. Note especially how the keys and other 
> data points *differ* in some places, and remain identical in others.)
> When I run a {code:sql}SELECT * FROM DataFile.json{code} what I get is a 
> single row listed under three columns: {code:html}" />"{code} [i.e., only the 
> entry `Key1.htmltags`] .
> Ideally, I should see three rows, each with entries from Key1..Key3, listed 
> under the correct respective column.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
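As a side note for readers hitting the same behavior: a top-level JSON object is treated as one record, so the three keys become three columns of a single row. A minimal Python sketch of the two readings (the tag values are hypothetical stand-ins, since the originals are obfuscated; the KVGEN/FLATTEN mention is a rough analogy, not a tested Drill query):

```python
import json

# Hypothetical stand-ins for the obfuscated htmltags values in the report.
doc = json.loads("""
{"Key1": {"htmltags": "<tag attr='alpha'/>"},
 "Key2": {"htmltags": "<tag attr='bravo'/>"},
 "Key3": {"htmltags": "<tag attr='charlie'/>"}}
""")

# Reading the file as one document yields a single record: one row whose
# columns are Key1, Key2, Key3 -- the behavior the report describes.
single_row = doc

# Pivoting the keys into rows -- roughly what Drill's KVGEN + FLATTEN
# functions achieve -- yields the expected three rows.
rows = [{"key": k, "htmltags": v["htmltags"]} for k, v in doc.items()]
print(len(rows))  # 3
```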


[jira] [Updated] (DRILL-4102) Only one row found in a JSON document that contains multiple items.

2015-11-17 Thread aditya menon (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

aditya menon updated DRILL-4102:

Description: 
I tried to analyse a JSON file that had the following (sample) structure:

{code:json}
{
"Key1": {
  "htmltags": ""
},
"Key2": {
  "htmltags": ""
},
"Key3": {
  "htmltags": ""
}
}
{code}

(Apologies for the obfuscation, I am unable to publish the original dataset. 
But the structure is exactly the same. Note especially how the keys and other 
data points *differ* in some places, and remain identical in others.)

When I run a {code:sql}SELECT * FROM DataFile.json{code} what I get is a single 
row listed under three columns: {code:html}""{code} [i.e., only the entry 
`Key1.htmltags`] .

Ideally, I should see three rows, each with entries from Key1..Key3, listed 
under the correct respective column.

  was:
I tried to analyse a JSON file that had the following (sample) structure:

{code:json}
{
"Key1": {
  "htmltags": ""
},
"Key2": {
  "htmltags": ""
},
"Key3": {
  "htmltags": ""
}
}
{code}

(Apologies for the obfuscation, I am unable to publish the original dataset. 
But the structure is exactly the same. Note especially how the keys and other 
data points *differ* in some places, and remain identical in others.)

When I run a {code:sql}SELECT * FROM DataFile.json{code} what I get is a single 
row listed under three columns: `""` [i.e., only the entry 
`Key1.htmltags`] .

Ideally, I should see three rows, each with entries from Key1..Key3, listed 
under the correct respective column.


> Only one row found in a JSON document that contains multiple items.
> ---
>
> Key: DRILL-4102
> URL: https://issues.apache.org/jira/browse/DRILL-4102
> Project: Apache Drill
>  Issue Type: Bug
> Environment: OS X, Drill embedded, v1.1.0 installed via HomeBrew
>Reporter: aditya menon
>
> I tried to analyse a JSON file that had the following (sample) structure:
> {code:json}
> {
> "Key1": {
>   "htmltags": " attr3='charlie' />"
> },
> "Key2": {
>   "htmltags": " attr3='mike' />"
> },
> "Key3": {
>   "htmltags": " />"
> }
> }
> {code}
> (Apologies for the obfuscation, I am unable to publish the original dataset. 
> But the structure is exactly the same. Note especially how the keys and other 
> data points *differ* in some places, and remain identical in others.)
> When I run a {code:sql}SELECT * FROM DataFile.json{code} what I get is a 
> single row listed under three columns: {code:html}" />"{code} [i.e., only the 
> entry `Key1.htmltags`] .
> Ideally, I should see three rows, each with entries from Key1..Key3, listed 
> under the correct respective column.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4056) Avro deserialization corrupts data

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010116#comment-15010116
 ] 

ASF GitHub Bot commented on DRILL-4056:
---

GitHub user jaltekruse opened a pull request:

https://github.com/apache/drill/pull/266

DRILL-4056: Avro corruption bug with UTF-8 strings



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaltekruse/incubator-drill 
4056-avro-corruption-bug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/266.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #266


commit a3e0cbe3820a0350d58c59f374877a12184850e0
Author: Jason Altekruse 
Date:   2015-11-13T23:46:58Z

DRILL-4056: Fix corruption bug reading string data out of Avro

commit 44460fd5a72d6a61b232c335bb8beaaff9daad87
Author: Jason Altekruse 
Date:   2015-11-14T00:26:33Z

DRILL-4056: Part 2 - Cleanup in Avro reader.

Removed use of unnecessary Holder objects. Added restriction on batch
size produced by a single call to next. Did not get a chance to confirm
but it looks like it was reading an entire file into a single batch,
which could have serious performance impacts on very large files.

commit dc084c1255a59aead865e641f952e9e162d4c5e5
Author: Jason Altekruse 
Date:   2015-11-17T23:42:44Z

DRILL-4056: Part 3 - Adding results verification to avro tests.

Task to be finished as part of DRILL-4110.




> Avro deserialization corrupts data
> --
>
> Key: DRILL-4056
> URL: https://issues.apache.org/jira/browse/DRILL-4056
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.3.0
> Environment: Ubuntu 15.04 - Oracle Java
>Reporter: Stefán Baxter
>Assignee: Jason Altekruse
> Fix For: 1.3.0
>
> Attachments: test.zip
>
>
> I have an Avro file that supports the following data/schema:
> {"field":"some", "classification":{"variant":"Gæst"}}
> When I select 10 rows from this file I get:
> +---------------------+
> |       EXPR$0        |
> +---------------------+
> | Gæst                |
> | Voksen              |
> | Voksen              |
> | Invitation KIF KBH  |
> | Invitation KIF KBH  |
> | Ordinarie pris KBH  |
> | Ordinarie pris KBH  |
> | Biljetter 200 krBH  |
> | Biljetter 200 krBH  |
> | Biljetter 200 krBH  |
> +---------------------+
> The bug is that the field values are incorrectly deserialized and the value 
> from the previous row is retained if the subsequent row is shorter.
> The SQL query:
> "select s.classification.variant variant from dfs. as s limit 10;"
> This way "Ordinarie pris" becomes "Ordinarie pris KBH", because the 
> previous row had the value "Invitation KIF KBH".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
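The retained-suffix symptom above can be reproduced in a few lines: if a reader copies a shorter value into a reused buffer without re-recording its length, the stale tail of the previous row survives. This is a sketch of the failure pattern only (hypothetical buffer handling, not Drill's actual Avro reader code):

```python
# Reused read buffer still holding the previous (longer) row's bytes.
buf = bytearray(b"Invitation KIF KBH")

new_value = b"Ordinarie pris"
buf[:len(new_value)] = new_value       # copy the new bytes in place...

corrupted = buf.decode()               # ...but decode using the old length
fixed = buf[:len(new_value)].decode()  # tracking the new length avoids the bug

print(corrupted)  # Ordinarie pris KBH
print(fixed)      # Ordinarie pris
```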


[jira] [Updated] (DRILL-4102) Only one row found in a JSON document that contains multiple items.

2015-11-17 Thread aditya menon (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

aditya menon updated DRILL-4102:

Description: 
I tried to analyse a JSON file that had the following (sample) structure:

{code:json}
{
"Key1": {
  "htmltags": ""
},
"Key2": {
  "htmltags": ""
},
"Key3": {
  "htmltags": ""
}
}
{code}

(Apologies for the obfuscation, I am unable to publish the original dataset. 
But the structure is exactly the same. Note especially how the keys and other 
data points *differ* in some places, and remain identical in others.)

When I run a `SELECT * FROM DataFile.json` what I get is a single row listed 
under three columns: `""` [i.e., only the entry `Key1.htmltags`] .

Ideally, I should see three rows, each with entries from Key1..Key3, listed 
under the correct respective column.

  was:
I tried to analyse a JSON file that had the following (sample) structure:

```
{
"Key1": {
  "htmltags": ""
},
"Key2": {
  "htmltags": ""
},
"Key3": {
  "htmltags": ""
}
}
```

(Apologies for the obfuscation, I am unable to publish the original dataset. 
But the structure is exactly the same. Note especially how the keys and other 
data points *differ* in some places, and remain identical in others.)

When I run a `SELECT * FROM DataFile.json` what I get is a single row listed 
under three columns: `""` [i.e., only the entry `Key1.htmltags`] .

Ideally, I should see three rows, each with entries from Key1..Key3, listed 
under the correct respective column.


> Only one row found in a JSON document that contains multiple items.
> ---
>
> Key: DRILL-4102
> URL: https://issues.apache.org/jira/browse/DRILL-4102
> Project: Apache Drill
>  Issue Type: Bug
> Environment: OS X, Drill embedded, v1.1.0 installed via HomeBrew
>Reporter: aditya menon
>
> I tried to analyse a JSON file that had the following (sample) structure:
> {code:json}
> {
> "Key1": {
>   "htmltags": " attr3='charlie' />"
> },
> "Key2": {
>   "htmltags": " attr3='mike' />"
> },
> "Key3": {
>   "htmltags": " />"
> }
> }
> {code}
> (Apologies for the obfuscation, I am unable to publish the original dataset. 
> But the structure is exactly the same. Note especially how the keys and other 
> data points *differ* in some places, and remain identical in others.)
> When I run a `SELECT * FROM DataFile.json` what I get is a single row listed 
> under three columns: `" />"` [i.e., only the entry `Key1.htmltags`] .
> Ideally, I should see three rows, each with entries from Key1..Key3, listed 
> under the correct respective column.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4102) Only one row found in a JSON document that contains multiple items.

2015-11-17 Thread aditya menon (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

aditya menon updated DRILL-4102:

Description: 
I tried to analyse a JSON file that had the following (sample) structure:

{code:json}
{
"Key1": {
  "htmltags": ""
},
"Key2": {
  "htmltags": ""
},
"Key3": {
  "htmltags": ""
}
}
{code}

(Apologies for the obfuscation, I am unable to publish the original dataset. 
But the structure is exactly the same. Note especially how the keys and other 
data points *differ* in some places, and remain identical in others.)

When I run a {code:sql}SELECT * FROM DataFile.json{code} what I get is a single 
row listed under three columns: `""` [i.e., only the entry 
`Key1.htmltags`] .

Ideally, I should see three rows, each with entries from Key1..Key3, listed 
under the correct respective column.

  was:
I tried to analyse a JSON file that had the following (sample) structure:

{code:json}
{
"Key1": {
  "htmltags": ""
},
"Key2": {
  "htmltags": ""
},
"Key3": {
  "htmltags": ""
}
}
{code}

(Apologies for the obfuscation, I am unable to publish the original dataset. 
But the structure is exactly the same. Note especially how the keys and other 
data points *differ* in some places, and remain identical in others.)

When I run a `SELECT * FROM DataFile.json` what I get is a single row listed 
under three columns: `""` [i.e., only the entry `Key1.htmltags`] .

Ideally, I should see three rows, each with entries from Key1..Key3, listed 
under the correct respective column.


> Only one row found in a JSON document that contains multiple items.
> ---
>
> Key: DRILL-4102
> URL: https://issues.apache.org/jira/browse/DRILL-4102
> Project: Apache Drill
>  Issue Type: Bug
> Environment: OS X, Drill embedded, v1.1.0 installed via HomeBrew
>Reporter: aditya menon
>
> I tried to analyse a JSON file that had the following (sample) structure:
> {code:json}
> {
> "Key1": {
>   "htmltags": " attr3='charlie' />"
> },
> "Key2": {
>   "htmltags": " attr3='mike' />"
> },
> "Key3": {
>   "htmltags": " />"
> }
> }
> {code}
> (Apologies for the obfuscation, I am unable to publish the original dataset. 
> But the structure is exactly the same. Note especially how the keys and other 
> data points *differ* in some places, and remain identical in others.)
> When I run a {code:sql}SELECT * FROM DataFile.json{code} what I get is a 
> single row listed under three columns: `" attr2='delta' />"` [i.e., only the entry 
> `Key1.htmltags`] .
> Ideally, I should see three rows, each with entries from Key1..Key3, listed 
> under the correct respective column.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4100) DRILL JOIN STREAMING AGG ERROR

2015-11-17 Thread huntersjm (JIRA)
huntersjm created DRILL-4100:


 Summary: DRILL JOIN STREAMING AGG ERROR
 Key: DRILL-4100
 URL: https://issues.apache.org/jira/browse/DRILL-4100
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.1.0
Reporter: huntersjm


A SQL query like:
`select t1.a1 from (
select a1,count(a2) as total from t  group by a1
) as a join (
select a1,count(a2) as total from t where a3 = 'true' group by a1 order by 
total desc limit 100
) as f on t2.a1 = t1.a1 
`
returns an erroneous result.

Part of Operator Profiles:

01-xx-00 - HASH_PARTITION_SENDER
Minor Fragment | Setup Time | Process Time | Wait Time | Max Batches | Max Records | Peak Memory
01-00-00       | 0.000s     | 0.301s       | 0.028s    | 3           | 65,536      | 90KB
01-01-00       | 0.000s     | 0.406s       | 0.028s    | 3           | 65,536      | 91KB

01-xx-01 - STREAMING_AGGREGATE
Minor Fragment | Setup Time | Process Time | Wait Time | Max Batches | Max Records | Peak Memory
01-00-01       | 0.174s     | 0.564s       | 0.000s    | 29          | 1,812,457   | 1MB
01-01-01       | 0.497s     | 0.553s       | 0.000s    | 29          | 1,812,973   | 1MB





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4101) The argument 'pattern' of Function 'like' has to be constant!

2015-11-17 Thread david_hudavy (JIRA)
david_hudavy created DRILL-4101:
---

 Summary: The argument 'pattern' of Function 'like' has to be 
constant!
 Key: DRILL-4101
 URL: https://issues.apache.org/jira/browse/DRILL-4101
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.3.0
 Environment: drill1.2
Reporter: david_hudavy


0: jdbc:drill:zk=local> select * from dfs.tmp.ta limit 10;
+--------------------------+------------------+
|          rdn_4           |       imsi       |
+--------------------------+------------------+
| mscId=UPG00494500412500  | 272004500412500  |
| mscId=UPG00494500436500  | 272004500436500  |
| mscId=UPG00494501833000  | 272004501833000  |
| mscId=UPG00494502712000  | 272004502712000  |
| mscId=UPG00494502732500  | 272004502732500  |
| mscId=UPG00494502845500  | 272004502845500  |
| mscId=UPG00494505721000  | 272004505721000  |
| mscId=UPG00494507227500  | 272004507227500  |
| mscId=UPG00494509548500  | 272004509548500  |
| mscId=UPG00494501644500  | 272004501644500  |
+--------------------------+------------------+
10 rows selected (0.344 seconds)
0: jdbc:drill:zk=local> select * from dfs.tmp.tb;
+---------------------+-------------+
|        rdn_4        | epsvplmnid  |
+---------------------+-------------+
| mscId=149000579913  | 46000       |
| mscId=149000579912  | 262280      |
+---------------------+-------------+
2 rows selected (0.112 seconds)

SELECT count(*) AS cnt
FROM dfs.tmp.ta,dfs.tmp.tb
WHERE ta.rdn_4 = tb.rdn_4
AND ta.imsi NOT LIKE concat(tb.epsvplmnid,'%')

Error: SYSTEM ERROR: DrillRuntimeException: The argument 'pattern' of Function 
'like' has to be constant!

Fragment 0:0

[Error Id: f103529c-60f4-4b8f-8d7a-b1f0619aab30 on vm1-4:31010] (state=,code=0)
[Error Id: f103529c-60f4-4b8f-8d7a-b1f0619aab30 on vm1-4:31010]
at 
org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118)
 [drill-java-exec-1.2.0.jar:1.2.0]
at 
org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:110) 
[drill-java-exec-1.2.0.jar:1.2.0]
at 
org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47)
 [drill-java-exec-1.2.0.jar:1.2.0]
at 
org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32)
 [drill-java-exec-1.2.0.jar:1.2.0]
at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:61) 
[drill-java-exec-1.2.0.jar:1.2.0]
at 
org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:233) 
[drill-java-exec-1.2.0.jar:1.2.0]
at 
org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:205) 
[drill-java-exec-1.2.0.jar:1.2.0]
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
 [netty-codec-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
 [netty-handler-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
 [netty-codec-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
 [netty-codec-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
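For context, the restriction means the LIKE pattern must be a planning-time constant, while concat(tb.epsvplmnid, '%') produces a different pattern per row. What the query intends is a per-row prefix test, sketched here in Python over hypothetical joined rows modeled on the tables above:

```python
# Hypothetical rows, mirroring ta.imsi and tb.epsvplmnid after the join
# on rdn_4 (values modeled on the sample tables, not real data).
joined = [
    {"imsi": "272004500412500", "epsvplmnid": "46000"},
    {"imsi": "262280123456789", "epsvplmnid": "262280"},
]

# ta.imsi NOT LIKE concat(tb.epsvplmnid, '%') means "imsi does not start
# with epsvplmnid", i.e. a per-row prefix check:
cnt = sum(1 for r in joined if not r["imsi"].startswith(r["epsvplmnid"]))
print(cnt)  # 1
```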

[jira] [Created] (DRILL-4099) DRILL QUERY LIMIT ERROR

2015-11-17 Thread huntersjm (JIRA)
huntersjm created DRILL-4099:


 Summary: DRILL QUERY LIMIT ERROR
 Key: DRILL-4099
 URL: https://issues.apache.org/jira/browse/DRILL-4099
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.1.0
Reporter: huntersjm


When I query `select attr from table limit 65536`, I get this error: 
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
IndexOutOfBoundsException: index: 131072, length: 2 (expected: range(0, 
131072)) Fragment 0:0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
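The numbers in the error line up with a fixed-width buffer boundary: 65,536 entries of 2 bytes each give a 131,072-byte buffer, and touching entry 65,536 lands exactly at offset 131072, one step past the end. A hypothetical sketch of that arithmetic (not Drill's actual vector code):

```python
ENTRY_BYTES = 2
CAPACITY = 65536 * ENTRY_BYTES   # 131072 bytes: room for entries 0..65535

def write_entry(index):
    # Returns the byte offset written, or raises like the reported error.
    offset = index * ENTRY_BYTES
    if offset + ENTRY_BYTES > CAPACITY:
        raise IndexError(
            f"index: {offset}, length: {ENTRY_BYTES} "
            f"(expected: range(0, {CAPACITY}))")
    return offset

print(write_entry(65535))   # 131070, the last valid entry
# write_entry(65536) would raise:
#   index: 131072, length: 2 (expected: range(0, 131072))
```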


[jira] [Commented] (DRILL-4070) Metadata Caching : min/max values are null for varchar columns in auto partitioned data

2015-11-17 Thread Parth Chandra (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009413#comment-15009413
 ] 

Parth Chandra commented on DRILL-4070:
--

I'm working on a migration tool that will update the created-by string to the 
current version for files generated by older versions of Drill. That should 
take care of this problem. I'll post a link to the tool in this JIRA.

> Metadata Caching : min/max values are null for varchar columns in auto 
> partitioned data
> ---
>
> Key: DRILL-4070
> URL: https://issues.apache.org/jira/browse/DRILL-4070
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.3.0
>Reporter: Rahul Challapalli
>Priority: Blocker
> Fix For: 1.3.0
>
> Attachments: cache.txt, fewtypes_varcharpartition.tar.tgz
>
>
> git.commit.id.abbrev=e78e286
> The generated metadata cache file contains incorrect values for the min/max 
> fields of varchar columns. The data is also partitioned on the varchar column.
> {code}
> refresh table metadata fewtypes_varcharpartition;
> {code}
> As a result, partition pruning is not happening. This was working after 
> DRILL-3937 was fixed (d331330efd27dbb8922024c4a18c11e76a00016b).
> I have attached the data set and the cache file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3423) Add New HTTPD format plugin

2015-11-17 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009557#comment-15009557
 ] 

Jacques Nadeau commented on DRILL-3423:
---

Here is my alternative proposal: 

With the log format above: 
"%h %t \"%r\" %>s %b \"%{Referer}i\""

I propose a user gets the following fields (in order)

remote_host (varchar)
request_receive_time (drill timestamp)
request_method (varchar)
request_uri (varchar)
response_status (int)
response_bytes (bigint)
header_referer

Additionally, I think we should provide two new functions: 

parse_url(varchar url)
parse_url_query(varchar querystring, varchar pairDelimiter, varchar 
keyValueDelimiter)

parse_url(varchar) would provide an output of map type similar to: 
{code}
{
  protocol: ...,
  user: ...,
  password: ...,
  host: ...,
  port: 
  path: 
  query:
  fragment:
}
{code}

parse_url_query(...) would return an array of key values:
[
  {key: "...", value: "..."},
  {key: "...", value: "..."},
  {key: "...", value: "..."},
  {key: "...", value: "..."}
]

In response to your proposal: I don't think it makes sense to return many 
fields for a date field. Drill already provides functionality to get parts of a 
date. I also don't think it makes sense to prefix a field with its datatype; we 
don't do that anywhere else in Drill. We should also expose parsing as an 
optional behavior in Drill. Note also that my proposal substantially reduces 
the number of fields exposed to the user. I think this proposal has much better 
usability in the context of SQL.

If you want to take advantage of the underlying format's capabilities, you can 
treat that as a pushdown of a particular function (date part or the URL parsing 
functions above).





> Add New HTTPD format plugin
> ---
>
> Key: DRILL-3423
> URL: https://issues.apache.org/jira/browse/DRILL-3423
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Reporter: Jacques Nadeau
>Assignee: Jim Scott
> Fix For: 1.4.0
>
>
> Add an HTTPD logparser based format plugin.  The author has been kind enough 
> to move the logparser project to be released under the Apache License.  Can 
> find it here:
> 
> nl.basjes.parse.httpdlog
> httpdlog-parser
> 2.0
> 
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
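A sketch of the intended semantics of the two proposed functions, using Python's standard URL splitting (the function names and output shapes follow the proposal above; this is not an existing Drill API):

```python
from urllib.parse import urlsplit

def parse_url(url):
    # Map-typed output as proposed: protocol, user, password, host, port,
    # path, query, fragment.
    p = urlsplit(url)
    return {
        "protocol": p.scheme, "user": p.username, "password": p.password,
        "host": p.hostname, "port": p.port, "path": p.path,
        "query": p.query, "fragment": p.fragment,
    }

def parse_url_query(querystring, pair_delimiter="&", key_value_delimiter="="):
    # Array of {key, value} records as proposed.
    out = []
    for pair in querystring.split(pair_delimiter):
        if pair:
            key, _, value = pair.partition(key_value_delimiter)
            out.append({"key": key, "value": value})
    return out

result = parse_url("http://user:pw@example.com:8080/a/b?x=1&y=2#frag")
pairs = parse_url_query("x=1&y=2")
```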


[jira] [Comment Edited] (DRILL-3423) Add New HTTPD format plugin

2015-11-17 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009557#comment-15009557
 ] 

Jacques Nadeau edited comment on DRILL-3423 at 11/17/15 9:30 PM:
-

Here is my alternative proposal: 

With the log format above: 
"%h %t \"%r\" %>s %b \"%{Referer}i\""

I propose a user gets the following fields (in order)

remote_host (varchar)
request_receive_time (drill timestamp)
request_method (varchar)
request_uri (varchar)
response_status (int)
response_bytes (bigint)
header_referer

Additionally, I think we should provide two new functions: 

parse_url(varchar url)
parse_url_query(varchar querystring, varchar pairDelimiter, varchar 
keyValueDelimiter)

parse_url(varchar) would provide an output of map type similar to: 
{code}
{
  protocol: ...,
  user: ...,
  password: ...,
  host: ...,
  port: 
  path: 
  query:
  fragment:
}
{code}

parse_url_query(...) would return an array of key values:
{code}
[
  {key: "...", value: "..."},
  {key: "...", value: "..."},
  {key: "...", value: "..."},
  {key: "...", value: "..."}
]
{code}
In response to your proposal: I don't think it makes sense to return many 
fields for a date field. Drill already provides functionality to get parts of a 
date. I also don't think it makes sense to prefix a field with its datatype; we 
don't do that anywhere else in Drill. We should also expose parsing as an 
optional behavior in Drill. Note also that my proposal substantially reduces 
the number of fields exposed to the user. I think this proposal has much better 
usability in the context of SQL.

If you want to take advantage of the underlying format's capabilities, you can 
treat that as a pushdown of a particular function (date part or the URL parsing 
functions above).






was (Author: jnadeau):
Here is my alternative proposal: 

With the log format above: 
"%h %t \"%r\" %>s %b \"%{Referer}i\""

I propose a user gets the following fields (in order)

remote_host (varchar)
request_receive_time (drill timestamp)
request_method (varchar)
request_uri (varchar)
response_status (int)
response_bytes (bigint)
header_referer

Additionally, I think we should provide two new functions: 

parse_url(varchar url)
parse_url_query(varchar querystring, varchar pairDelimiter, varchar 
keyValueDelimiter)

parse_url(varchar) would provide an output of map type similar to: 
{code}
{
  protocol: ...,
  user: ...,
  password: ...,
  host: ...,
  port: 
  path: 
  query:
  fragment:
}
{code}

parse_url_query(...) would return an array of key values:
[
  {key: "...", value: "..."},
  {key: "...", value: "..."},
  {key: "...", value: "..."},
  {key: "...", value: "..."}
]

In response to your proposal: I don't think it makes sense to return many 
fields for a date field. Drill already provides functionality to get parts of a 
date. I also don't think it makes sense to prefix a field with its datatype, we 
don't do that anywhere else in Drill. We should also expose parsing an optional 
behavior in Drill.  Note also that my proposal substantially reduces the number 
of fields exposed to the user. I think this proposal has much better usability 
in the context of sql.

If you want to take advantage of the underlying formats capabilities, you can 
treat that as a pushdown of a particular function (data part or the url parsing 
functions above).





> Add New HTTPD format plugin
> ---
>
> Key: DRILL-3423
> URL: https://issues.apache.org/jira/browse/DRILL-3423
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Reporter: Jacques Nadeau
>Assignee: Jim Scott
> Fix For: 1.4.0
>
>
> Add an HTTPD logparser based format plugin.  The author has been kind enough 
> to move the logparser project to be released under the Apache License.  Can 
> find it here:
> 
> nl.basjes.parse.httpdlog
> httpdlog-parser
> 2.0
> 
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3854) IOB Exception : CONVERT_FROM (sal, int_be)

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009693#comment-15009693
 ] 

ASF GitHub Bot commented on DRILL-3854:
---

Github user hsuanyi commented on the pull request:

https://github.com/apache/drill/pull/262#issuecomment-157528584
  
Thanks for that. 
I also looked through the implementations of getBuffers(). As long as the one 
in BaseDataValueVector obeys the contract, the derived implementations should 
follow it as well.


> IOB Exception : CONVERT_FROM (sal, int_be)
> --
>
> Key: DRILL-3854
> URL: https://issues.apache.org/jira/browse/DRILL-3854
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
> Fix For: 1.4.0
>
> Attachments: log, run_time_code.txt
>
>
> CONVERT_FROM function results in IOB Exception
> Drill master commit id : b9afcf8f
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select salary from Emp;
> +---------+
> | salary  |
> +---------+
> | 8       |
> | 9       |
> | 20      |
> | 95000   |
> | 85000   |
> | 9       |
> | 10      |
> | 87000   |
> | 8       |
> | 10      |
> | 99000   |
> +---------+
> 11 rows selected (0.535 seconds)
> # create table using above Emp table
> create table tbl_int_be as select convert_to(salary, 'int_be') sal from Emp;
> 0: jdbc:drill:schema=dfs.tmp> alter session set `planner.slice_target`=1;
> +-------+--------------------------------+
> |  ok   |            summary             |
> +-------+--------------------------------+
> | true  | planner.slice_target updated.  |
> +-------+--------------------------------+
> 1 row selected (0.19 seconds)
> # Below query results in IOB on server.
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(sal, 'int_be') from 
> tbl_int_be order by sal;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: DrillBuf(ridx: 0, widx: 158, 
> cap: 158/158, unwrapped: SlicedByteBuf(ridx: 0, widx: 158, cap: 158/158, 
> unwrapped: UnsafeDirectLittleEndian(PooledUnsafeDirectByteBuf(ridx: 0, widx: 
> 0, cap: 417/417.slice(158, 44)
> Fragment 2:0
> [Error Id: 4ee1361d-9877-45eb-bde6-57d5add9fe5e on centos-04.qa.lab:31010] 
> (state=,code=0)
> # Applying the convert_from function and projecting the original column 
> # results in an IOB on the client (the Error Id is missing).
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(sal, 'int_be'), sal from 
> tbl_int_be;
> Error: Unexpected RuntimeException: java.lang.IndexOutOfBoundsException: 
> DrillBuf(ridx: 0, widx: 114, cap: 114/114, unwrapped: DrillBuf(ridx: 321, 
> widx: 321, cap: 321/321, unwrapped: 
> UnsafeDirectLittleEndian(PooledUnsafeDirectByteBuf(ridx: 0, widx: 0, cap: 
> 321/321.slice(55, 103) (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
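For reference, the 'int_be' encoding used in the repro is a 4-byte big-endian integer; the round trip that convert_to/convert_from perform can be sketched with Python's struct module (the salary values here are hypothetical, since the originals in the table are obfuscated):

```python
import struct

salaries = [80000, 90000, 200000, 95000]  # hypothetical sample values

# convert_to(salary, 'int_be'): 4-byte big-endian encoding per value.
encoded = [struct.pack(">i", s) for s in salaries]

# convert_from(sal, 'int_be'): decode each 4-byte value back to an int.
decoded = [struct.unpack(">i", b)[0] for b in encoded]

print(decoded == salaries)  # True
```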


[jira] [Comment Edited] (DRILL-3423) Add New HTTPD format plugin

2015-11-17 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009557#comment-15009557
 ] 

Jacques Nadeau edited comment on DRILL-3423 at 11/17/15 9:38 PM:
-

Here is my alternative proposal: 

With the log format above: 
{code}
"%h %t \"%r\" %>s %b \"%{Referer}i\""
{code}

I propose a user gets the following fields (in order)

remote_host (varchar)
request_receive_time (drill timestamp)
request_method (varchar)
request_uri (varchar)
response_status (int)
response_bytes (bigint)
header_referer

Additionally, I think we should provide two new functions: 

parse_url(varchar url)
parse_url_query(varchar querystring, varchar pairDelimiter, varchar 
keyValueDelimiter)

parse_url(varchar) would provide an output of map type similar to: 
{code}
{
  protocol: ...,
  user: ...,
  password: ...,
  host: ...,
  port: 
  path: 
  query:
  fragment:
}
{code}

parse_url_query(...) would return an array of key values:
{code}
[
  {key: "...", value: "..."},
  {key: "...", value: "..."},
  {key: "...", value: "..."},
  {key: "...", value: "..."}
]
{code}
In response to your proposal: I don't think it makes sense to return many 
fields for a date field. Drill already provides functionality to get parts of a 
date. I also don't think it makes sense to prefix a field with its datatype; we 
don't do that anywhere else in Drill. We should also expose parsing as an 
optional behavior in Drill. Note also that my proposal substantially reduces 
the number of fields exposed to the user. I think this proposal has much better 
usability in the context of SQL.

If you want to take advantage of the underlying format's capabilities, you can 
treat that as a pushdown of a particular function (date part or the URL parsing 
functions above).






was (Author: jnadeau):
Here is my alternative proposal: 

With the log format above: 
"%h %t \"%r\" %>s %b \"%{Referer}i\""

I propose a user gets the following fields (in order)

remote_host (varchar)
request_receive_time (drill timestamp)
request_method (varchar)
request_uri (varchar)
response_status (int)
response_bytes (bigint)
header_referer

Additionally, I think we should provide two new functions: 

parse_url(varchar url)
parse_url_query(varchar querystring, varchar pairDelimiter, varchar 
keyValueDelimiter)

parse_url(varchar) would provide an output of map type similar to: 
{code}
{
  protocol: ...,
  user: ...,
  password: ...,
  host: ...,
  port: ...,
  path: ...,
  query: ...,
  fragment: ...
}
{code}

parse_url_query(...) would return an array of key values:
{code}
[
  {key: "...", value: "..."},
  {key: "...", value: "..."},
  {key: "...", value: "..."},
  {key: "...", value: "..."}
]
{code}
In response to your proposal: I don't think it makes sense to return many 
fields for a date field; Drill already provides functionality to get parts of a 
date. I also don't think it makes sense to prefix a field with its datatype; we 
don't do that anywhere else in Drill. We should also expose parsing as an 
optional behavior in Drill. Note also that my proposal substantially reduces 
the number of fields exposed to the user. I think this proposal has much better 
usability in the context of SQL.

If you want to take advantage of the underlying format's capabilities, you can 
treat that as a pushdown of a particular function (date part or the URL parsing 
functions above).





> Add New HTTPD format plugin
> ---
>
> Key: DRILL-3423
> URL: https://issues.apache.org/jira/browse/DRILL-3423
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Reporter: Jacques Nadeau
>Assignee: Jim Scott
> Fix For: 1.4.0
>
>
> Add an HTTPD logparser based format plugin.  The author has been kind enough 
> to move the logparser project to be released under the Apache License.  Can 
> find it here:
> {code:xml}
> <dependency>
>   <groupId>nl.basjes.parse.httpdlog</groupId>
>   <artifactId>httpdlog-parser</artifactId>
>   <version>2.0</version>
> </dependency>
> {code}





[jira] [Comment Edited] (DRILL-3180) Apache Drill JDBC storage plugin to query rdbms systems such as MySQL and Netezza from Apache Drill

2015-11-17 Thread Olav Jordens (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006030#comment-15006030
 ] 

Olav Jordens edited comment on DRILL-3180 at 11/17/15 10:29 PM:


Hi Jacques,

I followed Magnus' suggestion to create the storage plugin to Netezza using a 
database like so:
{code}
{
  "type": "jdbc",
  "driver": "org.netezza.Driver",
  "url": "jdbc:netezza://edw-vip-prod:5480/SYSTEM",
  "username": "username",
  "password": "password",
  "enabled": true
}
{code}
and it gives "Success" in version 1.3.0. I can see the tables now - thanks!


was (Author: olavj):
Hi Jacques,

I followed Magnus' suggestion to create the storage plugin to Netezza using a 
database like so:
{code}
{
  "type": "jdbc",
  "driver": "org.netezza.Driver",
  "url": "jdbc:netezza://edw-vip-prod:5480/SYSTEM",
  "username": "username",
  "password": "password",
  "enabled": true
}
{code}
and it gives "Success" in version 1.2.0. However, when I try your version 1.3.0 
linked above, I get the unable to create/update storage plugin error.

> Apache Drill JDBC storage plugin to query rdbms systems such as MySQL and 
> Netezza from Apache Drill
> ---
>
> Key: DRILL-3180
> URL: https://issues.apache.org/jira/browse/DRILL-3180
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.0.0
>Reporter: Magnus Pierre
>Assignee: Jacques Nadeau
>  Labels: Drill, JDBC, plugin
> Fix For: 1.2.0
>
> Attachments: patch.diff, pom.xml, storage-mpjdbc.zip
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> I have developed the base code for a JDBC storage-plugin for Apache Drill. 
> The code is primitive but constitutes a good starting point for further 
> coding. Today it provides primitive support for SELECT against RDBMS with 
> JDBC. 
> The goal is to provide complete SELECT support against RDBMS with push down 
> capabilities.
> Currently the code is using standard JDBC classes.





[jira] [Updated] (DRILL-4107) Broken links in web site

2015-11-17 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated DRILL-4107:
---
Attachment: Screenshot from 2015-11-17 14-26-46.png

> Broken links in web site
> 
>
> Key: DRILL-4107
> URL: https://issues.apache.org/jira/browse/DRILL-4107
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Julian Hyde
> Attachments: Screenshot from 2015-11-17 14-26-46.png
>
>
> Following CALCITE-979 I ran http://www.brokenlinkcheck.com and found 40 
> broken links at http://drill.apache.org. Most of them are shown in the 
> attached screenshot.





[jira] [Commented] (DRILL-3854) IOB Exception : CONVERT_FROM (sal, int_be)

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009560#comment-15009560
 ] 

ASF GitHub Bot commented on DRILL-3854:
---

Github user StevenMPhillips commented on a diff in the pull request:

https://github.com/apache/drill/pull/262#discussion_r45124172
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/WritableBatch.java ---
@@ -149,6 +149,7 @@ public static WritableBatch getBatchNoHV(int 
recordCount, Iterable
   }
 
   for (DrillBuf b : vv.getBuffers(true)) {
+b.readerIndex(0);
--- End diff --

Yes, I think that is correct. It should be outside the if (clear) block.

On Tue, Nov 17, 2015 at 1:07 PM, Jacques Nadeau 
wrote:

> In
> 
exec/java-exec/src/main/java/org/apache/drill/exec/record/WritableBatch.java
> :
>
> > @@ -149,6 +149,7 @@ public static WritableBatch getBatchNoHV(int 
recordCount, Iterable
> >}
> >
> >for (DrillBuf b : vv.getBuffers(true)) {
> > +b.readerIndex(0);
>
> I believe the contract of getBuffers() is that buffers are returned in a
> reader appropriate state. As such, you should figure out which buffers are
> failing to guarantee this. It should be easy as there are only a small
> amount of implementations of this. In other words, where are we failing to
> ensure this?
>
> Given the code I looked at before, I think the problem may be that the
> readerIndex behavior is only inside the clear statement. @StevenMPhillips
>  , it seems like this line:
> 
https://github.com/apache/drill/blame/master/exec/vector/src/main/java/org/apache/drill/exec/vector/BaseDataValueVector.java#L63
> should be outside the if(clear). Thoughts?
>
> —
> Reply to this email directly or view it on GitHub
> .
>
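The fix under discussion (resetting the reader index unconditionally, rather than only inside the if (clear) branch) can be illustrated with a toy sketch. ToyBuf and get_buffers below are illustrative stand-ins, not Drill's actual DrillBuf or WritableBatch code:

```python
class ToyBuf:
    """Stand-in for a buffer with a reader index, like DrillBuf."""
    def __init__(self, data):
        self.data = data
        self.reader_index = len(data)  # a prior consumer left it at the end

def get_buffers(buffers, clear):
    for b in buffers:
        # Reset unconditionally: the getBuffers() contract requires each
        # buffer to come back in a reader-appropriate state whether or not
        # the caller asked for the buffers to be cleared.
        b.reader_index = 0
        if clear:
            b.data = b""
    return buffers
```

With the reset inside the clear branch only, the clear=False path would hand back buffers with stale indices, which is the failure mode described in this thread.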



> IOB Exception : CONVERT_FROM (sal, int_be)
> --
>
> Key: DRILL-3854
> URL: https://issues.apache.org/jira/browse/DRILL-3854
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
> Fix For: 1.4.0
>
> Attachments: log, run_time_code.txt
>
>
> CONVERT_FROM function results in IOB Exception
> Drill master commit id : b9afcf8f
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select salary from Emp;
> +-+
> | salary  |
> +-+
> | 8   |
> | 9   |
> | 20  |
> | 95000   |
> | 85000   |
> | 9   |
> | 10  |
> | 87000   |
> | 8   |
> | 10  |
> | 99000   |
> +-+
> 11 rows selected (0.535 seconds)
> # create table using above Emp table
> create table tbl_int_be as select convert_to(salary, 'int_be') sal from Emp;
> 0: jdbc:drill:schema=dfs.tmp> alter session set `planner.slice_target`=1;
> +---++
> |  ok   |summary |
> +---++
> | true  | planner.slice_target updated.  |
> +---++
> 1 row selected (0.19 seconds)
> # Below query results in IOB on server.
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(sal, 'int_be') from 
> tbl_int_be order by sal;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: DrillBuf(ridx: 0, widx: 158, 
> cap: 158/158, unwrapped: SlicedByteBuf(ridx: 0, widx: 158, cap: 158/158, 
> unwrapped: UnsafeDirectLittleEndian(PooledUnsafeDirectByteBuf(ridx: 0, widx: 
> 0, cap: 417/417.slice(158, 44)
> Fragment 2:0
> [Error Id: 4ee1361d-9877-45eb-bde6-57d5add9fe5e on centos-04.qa.lab:31010] 
> (state=,code=0)
> # Apply convert_from function and project original column results in IOB on 
> client. (because Error Id is missing)
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(sal, 'int_be'), sal from 
> tbl_int_be;
> Error: Unexpected RuntimeException: java.lang.IndexOutOfBoundsException: 
> DrillBuf(ridx: 0, widx: 114, cap: 114/114, unwrapped: DrillBuf(ridx: 321, 
> widx: 321, cap: 321/321, unwrapped: 
> UnsafeDirectLittleEndian(PooledUnsafeDirectByteBuf(ridx: 0, widx: 0, cap: 
> 321/321.slice(55, 103) (state=,code=0)
> {code}
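The int_be encoding exercised in this report is a plain big-endian 4-byte integer. Its round-trip can be sketched with Python's struct module (an analogy for convert_to/convert_from, not Drill's actual execution path):

```python
import struct

def to_int_be(value):
    # 4-byte big-endian encoding, analogous to convert_to(col, 'int_be')
    return struct.pack(">i", value)

def from_int_be(buf):
    # Inverse decoding, analogous to convert_from(col, 'int_be')
    return struct.unpack(">i", buf)[0]
```

Every salary above encodes to exactly 4 bytes, so an IndexOutOfBoundsException points at buffer bookkeeping rather than the encoding itself.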





[jira] [Commented] (DRILL-3423) Add New HTTPD format plugin

2015-11-17 Thread Tomer Shiran (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009605#comment-15009605
 ] 

Tomer Shiran commented on DRILL-3423:
-

I agree we shouldn't expand a date into multiple parts when we already have a 
date/timestamp type.

For the functions you mentioned, I think we should look at the functions 
available in Python (urllib), JavaScript, or relational databases.
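As a reference point (an illustration only, not a proposal for Drill's implementation), Python's standard urllib.parse.urlparse already exposes essentially the field map discussed in this thread:

```python
from urllib.parse import urlparse

u = urlparse("http://user:secret@example.com:8080/a/path?x=1&y=2#frag")

# Map urlparse's attributes onto the field names proposed above.
parsed = {
    "protocol": u.scheme,
    "user": u.username,
    "password": u.password,
    "host": u.hostname,
    "port": u.port,
    "path": u.path,
    "query": u.query,
    "fragment": u.fragment,
}
```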

> Add New HTTPD format plugin
> ---
>
> Key: DRILL-3423
> URL: https://issues.apache.org/jira/browse/DRILL-3423
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Reporter: Jacques Nadeau
>Assignee: Jim Scott
> Fix For: 1.4.0
>
>
> Add an HTTPD logparser based format plugin.  The author has been kind enough 
> to move the logparser project to be released under the Apache License.  Can 
> find it here:
> {code:xml}
> <dependency>
>   <groupId>nl.basjes.parse.httpdlog</groupId>
>   <artifactId>httpdlog-parser</artifactId>
>   <version>2.0</version>
> </dependency>
> {code}





[jira] [Issue Comment Deleted] (DRILL-3180) Apache Drill JDBC storage plugin to query rdbms systems such as MySQL and Netezza from Apache Drill

2015-11-17 Thread Olav Jordens (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olav Jordens updated DRILL-3180:

Comment: was deleted

(was: Another issue I am having in 1.2.0 (running local): When I run
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE
FROM INFORMATION_SCHEMA.`TABLES`
ORDER BY TABLE_NAME DESC;
I see my storage plugin listed with two 'Tables' NZ_MAT_VALUE_TABLE and 
NZ_MAT_CONF, but none of the tables in Netezza. I have tried to do a select * 
... limit 20 from the tables I know should be there, but each time I get a 
Table not found error. Also if I issue this query:
select * from netezzaplugin.`NZ_MAT_CONF`; 
I get:
org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: The 
JDBC storage plugin failed while trying setup the SQL query. sql SELECT * FROM 
"NZM"."NZ_MAT_CONF" plugin netezzaplugin Fragment 0:0

So I guess my question is: How should I query my tables in Netezza?

Thanks for your support - the potential for Drill looks really good to me once 
I get my head around it. I would ultimately like to query across Netezza and 
Hadoop. Is this geared towards the MapR distribution, or should all 
functionality be available in any case?
Olav)

> Apache Drill JDBC storage plugin to query rdbms systems such as MySQL and 
> Netezza from Apache Drill
> ---
>
> Key: DRILL-3180
> URL: https://issues.apache.org/jira/browse/DRILL-3180
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.0.0
>Reporter: Magnus Pierre
>Assignee: Jacques Nadeau
>  Labels: Drill, JDBC, plugin
> Fix For: 1.2.0
>
> Attachments: patch.diff, pom.xml, storage-mpjdbc.zip
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> I have developed the base code for a JDBC storage-plugin for Apache Drill. 
> The code is primitive but constitutes a good starting point for further 
> coding. Today it provides primitive support for SELECT against RDBMS with 
> JDBC. 
> The goal is to provide complete SELECT support against RDBMS with push down 
> capabilities.
> Currently the code is using standard JDBC classes.





[jira] [Created] (DRILL-4107) Broken links in web site

2015-11-17 Thread Julian Hyde (JIRA)
Julian Hyde created DRILL-4107:
--

 Summary: Broken links in web site
 Key: DRILL-4107
 URL: https://issues.apache.org/jira/browse/DRILL-4107
 Project: Apache Drill
  Issue Type: Bug
Reporter: Julian Hyde
 Attachments: Screenshot from 2015-11-17 14-26-46.png

Following CALCITE-979 I ran http://www.brokenlinkcheck.com and found 40 broken 
links at http://drill.apache.org. Most of them are shown in the attached 
screenshot.





[jira] [Resolved] (DRILL-4056) Avro deserialization corrupts data

2015-11-17 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-4056.
---
Resolution: Fixed

Resolved in 45d0326ccbf9bad8936374174116ae8e17461cb0

> Avro deserialization corrupts data
> --
>
> Key: DRILL-4056
> URL: https://issues.apache.org/jira/browse/DRILL-4056
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.3.0
> Environment: Ubuntu 15.04 - Oracle Java
>Reporter: Stefán Baxter
>Assignee: Jason Altekruse
> Fix For: 1.3.0
>
> Attachments: test.zip
>
>
> I have an Avro file that support the following data/schema:
> {"field":"some", "classification":{"variant":"Gæst"}}
> When I select 10 rows from this file I get:
> +-+
> |   EXPR$0|
> +-+
> | Gæst|
> | Voksen  |
> | Voksen  |
> | Invitation KIF KBH  |
> | Invitation KIF KBH  |
> | Ordinarie pris KBH  |
> | Ordinarie pris KBH  |
> | Biljetter 200 krBH  |
> | Biljetter 200 krBH  |
> | Biljetter 200 krBH  |
> +-+
> The bug is that the field values are incorrectly de-serialized and the value 
> from the previous row is retained if the subsequent row is shorter.
> The sql query:
> "select s.classification.variant variant from dfs. as s limit 10;"
> That way the  "Ordinarie pris" becomes "Ordinarie pris KBH" because the 
> previous row had the value "Invitation KIF KBH".
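The retained-tail corruption described here is easy to reproduce in miniature: overwrite a reused buffer with a shorter value, then read it back with the stale length. This is a sketch of the failure mode only, not Drill's actual Avro reader:

```python
buf = bytearray(32)

def write(s):
    # Overwrites from offset 0 but never clears the tail of the buffer,
    # the same reuse pattern the bug report describes.
    data = s.encode("utf-8")
    buf[:len(data)] = data
    return len(data)

prev_len = write("Invitation KIF KBH")   # 18 bytes
write("Ordinarie pris")                  # 14 bytes; stale tail " KBH" remains
# A reader that mistakenly uses the previous row's length sees the mixture:
corrupted = buf[:prev_len].decode("utf-8")
```

Reading 18 bytes yields "Ordinarie pris KBH", exactly the corrupted value in the result set above.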





[jira] [Resolved] (DRILL-4063) Missing files/classes needed for S3a access

2015-11-17 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-4063.
---
Resolution: Fixed
  Assignee: Abhijit Pol

Resolved in 2b82a77693a28b8c76959e7a807bc7a501a6efc5 (1.3.0 branch)

> Missing files/classes needed for S3a access
> ---
>
> Key: DRILL-4063
> URL: https://issues.apache.org/jira/browse/DRILL-4063
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.3.0
> Environment: All
>Reporter: Nathan Griffith
>Assignee: Abhijit Pol
>  Labels: aws, aws-s3, s3, storage
>
> Specifying
> {code}
> "connection": "s3a://"
> {code}
> results in the following error:
> {code}
> Error: SYSTEM ERROR: ClassNotFoundException: Class 
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> {code}
> I can fix this by dropping in these files from the hadoop binary tarball:
> hadoop-aws-2.6.2.jar
> aws-java-sdk-1.7.4.jar
> And then adding this to my core-site.xml:
> {code:xml}
> <property>
>   <name>fs.s3a.access.key</name>
>   <value>ACCESSKEY</value>
> </property>
> <property>
>   <name>fs.s3a.secret.key</name>
>   <value>SECRETKEY</value>
> </property>
> {code}





[jira] [Resolved] (DRILL-4103) Add additional metadata to Parquet files generated by Drill

2015-11-17 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-4103.
---
Resolution: Fixed

Resolved in a2896681769ac64b5935db6b09c5b0978f05d2f1 (1.3.0 branch)

> Add additional metadata to Parquet files generated by Drill
> ---
>
> Key: DRILL-4103
> URL: https://issues.apache.org/jira/browse/DRILL-4103
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Jacques Nadeau
>Assignee: Julien Le Dem
> Fix For: 1.3.0
>
>
> For future compatibility efforts, it would be good for us to automatically 
> add metadata to Drill generated Parquet files. At a minimum, we should add 
> information about the fact that Drill generated the files and the version of 
> Drill that generated the files.


