[jira] [Assigned] (DRILL-4211) Inconsistent results from a joined sql statement to postgres tables

2017-08-08 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-4211:


Assignee: Timothy Farkas

> Inconsistent results from a joined sql statement to postgres tables
> ---
>
> Key: DRILL-4211
> URL: https://issues.apache.org/jira/browse/DRILL-4211
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.3.0
> Environment: Postgres db storage
>Reporter: Robert Hamilton-Smith
>Assignee: Timothy Farkas
>  Labels: newbie
>
> When making a SQL statement that incorporates a join to a table and then a 
> self-join to that table to get a parent value, Drill brings back 
> inconsistent results. 
> Here is the SQL in Postgres with the correct output:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from transactions trx
> join categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join categories w1 on (cat.categoryparentguid = w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL;
> {code}
> Output:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|food|
> |id1|restaurants|food|
> |id2|Coffee Shops|food|
> |id2|Coffee Shops|food|
> When run in Drill with correct storage prefix:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from db.schema.transactions trx
> join db.schema.categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join db.schema.wpfm_categories w1 on (cat.categoryparentguid = 
> w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL
> {code}
> Results are:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|null|
> |id1|restaurants|null|
> |id2|Coffee Shops|null|
> |id2|Coffee Shops|null|
> Physical plan is:
> {code:sql}
> 00-00Screen : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) 
> categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = 
> {110.0 rows, 110.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64293
> 00-01  Project(categoryguid=[$0], categoryname=[$1], parentcat=[$2]) : 
> rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
> VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 
> 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64292
> 00-02Project(categoryguid=[$9], categoryname=[$41], parentcat=[$47]) 
> : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
> VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 
> 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64291
> 00-03  Jdbc(sql=[SELECT *
> FROM "public"."transactions"
> INNER JOIN (SELECT *
> FROM "public"."categories"
> WHERE "categoryparentguid" IS NOT NULL) AS "t" ON 
> "transactions"."categoryguid" = "t"."categoryguid"
> INNER JOIN "public"."categories" AS "categories0" ON "t"."categoryparentguid" 
> = "categories0"."categoryguid"]) : rowType = RecordType(VARCHAR(255) 
> transactionguid, VARCHAR(255) relatedtransactionguid, VARCHAR(255) 
> transactioncode, DECIMAL(1, 0) transactionpending, VARCHAR(50) 
> transactionrefobjecttype, VARCHAR(255) transactionrefobjectguid, 
> VARCHAR(1024) transactionrefobjectvalue, TIMESTAMP(6) transactiondate, 
> VARCHAR(256) transactiondescription, VARCHAR(50) categoryguid, VARCHAR(3) 
> transactioncurrency, DECIMAL(15, 3) transactionoldbalance, DECIMAL(13, 3) 
> transactionamount, DECIMAL(15, 3) transactionnewbalance, VARCHAR(512) 
> transactionnotes, DECIMAL(2, 0) transactioninstrumenttype, VARCHAR(20) 
> transactioninstrumentsubtype, VARCHAR(20) transactioninstrumentcode, 
> VARCHAR(50) transactionorigpartyguid, VARCHAR(255) 
> transactionorigaccountguid, VARCHAR(50) transactionrecpartyguid, VARCHAR(255) 
> transactionrecaccountguid, VARCHAR(256) transactionstatementdesc, DECIMAL(1, 
> 0) transactionsplit, DECIMAL(1, 0) transactionduplicated, DECIMAL(1, 0) 
> transactionrecategorized, TIMESTAMP(6) transactioncreatedat, TIMESTAMP(6) 
> transactionupdatedat, VARCHAR(50) transactionmatrulerefobjtype, VARCHAR(50) 
> transactionmatrulerefobjguid, VARCHAR(50) transactionmatrulerefobjvalue, 
> VARCHAR(50) transactionuserruleguid, DECIMAL(2, 0) transactionsplitorder, 
> TIMESTAMP(6) transactionprocessedat, TIMESTAMP(6) 
> transactioncategoryassignat, VARCHAR(50) transactionsystemcategoryguid, 
> VARCHAR(50) transactionorigmandateid, VARCHAR(100) fingerprint, VARCHAR(50) 
> categoryguid0, VARCHAR(50) categoryparentguid, DECIMAL(3, 0) categorytype, 
> VARCHAR(50) categoryname, VARCHAR(50) categorydescription, VARCHAR(50) 
> partyguid, VARCHAR(50) categoryguid1, VARCHAR(50) categoryparentguid0, 
> DECIMAL(3, 0) categorytype0, VARCHAR(50) categoryname0, VARCHAR(50) 
> 

[jira] [Updated] (DRILL-5710) drill-config.sh incorrectly exits with Java 1.7 or later is required to run Apache Drill

2017-08-08 Thread Angel Aray (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Angel Aray updated DRILL-5710:
--
Summary: drill-config.sh incorrectly exits with Java 1.7 or later is 
required to run Apache Drill  (was: drill-config.sh incorrectly exits with Java 
1.7 or later is required to run Apache Dril)

> drill-config.sh incorrectly exits with Java 1.7 or later is required to run 
> Apache Drill
> 
>
> Key: DRILL-5710
> URL: https://issues.apache.org/jira/browse/DRILL-5710
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
> Environment: java version "1.8.0_144"
> OSX
>Reporter: Angel Aray
>
> drill-config fails to recognize 1.8.0_144 as Java 1.7 or later.
> The script validates the Java version using the following code:
> "$JAVA" -version 2>&1 | grep "version" | egrep -e "1.4|1.5|1.6" 
> this should be replaced by:
> "$JAVA" -version 2>&1 | grep "version" | egrep -e "1\.4|1\.5|1\.6" 





[jira] [Created] (DRILL-5710) drill-config.sh incorrectly exits with Java 1.7 or later is required to run Apache Dril

2017-08-08 Thread Angel Aray (JIRA)
Angel Aray created DRILL-5710:
-

 Summary: drill-config.sh incorrectly exits with Java 1.7 or later 
is required to run Apache Dril
 Key: DRILL-5710
 URL: https://issues.apache.org/jira/browse/DRILL-5710
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.11.0
 Environment: java version "1.8.0_144"
OSX
Reporter: Angel Aray


drill-config fails to recognize 1.8.0_144 as Java 1.7 or later.

The script validates the Java version using the following code:
"$JAVA" -version 2>&1 | grep "version" | egrep -e "1.4|1.5|1.6" 

this should be replaced by:

"$JAVA" -version 2>&1 | grep "version" | egrep -e "1\.4|1\.5|1\.6" 





[jira] [Commented] (DRILL-5708) Add DNS decode function for PCAP storage

2017-08-08 Thread Takeo Ogawara (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119286#comment-16119286
 ] 

Takeo Ogawara commented on DRILL-5708:
--

Hello Givre

Thank you for the comment.
The main outputs I have in mind are as follows (a field sketch follows the list):
1. Domain name queried by the user
2. Canonical names in the response sequence
3. Resolved IP address
4. TTL
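
For concreteness, a minimal sketch of a row carrying those four outputs; the field names are illustrative only, not an actual Drill PCAP-plugin schema:

{code:java}
// Hypothetical field layout for the outputs listed above; not part of
// Drill's PCAP storage plugin.
public class DnsDecodeRow {
    public String queryDomain;   // 1. domain name queried by the user
    public String[] cnames;      // 2. canonical names in the response sequence
    public String resolvedIp;    // 3. resolved IP address
    public int ttlSeconds;       // 4. TTL
}
{code}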

> Add DNS decode function for PCAP storage
> 
>
> Key: DRILL-5708
> URL: https://issues.apache.org/jira/browse/DRILL-5708
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Reporter: Takeo Ogawara
>Priority: Minor
>
> As described in DRILL-5432, it is very useful to analyze packet contents and 
> application layer protocols. To improve the PCAP analysis function, it's 
> better to add a function that decodes DNS queries and responses. This would 
> make it possible to classify packets by FQDN and display user access trends.





[jira] [Updated] (DRILL-5709) Provide a value vector method to convert a vector to nullable

2017-08-08 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5709:
---
Priority: Minor  (was: Major)

> Provide a value vector method to convert a vector to nullable
> -
>
> Key: DRILL-5709
> URL: https://issues.apache.org/jira/browse/DRILL-5709
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.12.0
>
>
> The hash agg spill work needs to convert a non-null scalar vector to the 
> nullable equivalent. For efficiency, the code wishes to simply transfer the 
> underlying data buffer(s), and create the required "bits" vector, rather than 
> generating code that does the transfer row-by-row.
> The solution is to add a {{toNullable(ValueVector nullableVector)}} method to 
> the {{ValueVector}} class, then implement it where needed.
> Since the target code only works with scalars (that is, no arrays, no maps, 
> no lists), the code only handles these cases, throwing an 
> {{UnsupportedOperationException}} in other cases.
> Usage:
> {code}
> ValueVector nonNullableVector = ...; // your non-nullable vector
> MajorType type = MajorType.newBuilder(nonNullableVector.getType())
>     .setMode(DataMode.OPTIONAL)
>     .build();
> MaterializedField field = MaterializedField.create(name, type);
> ValueVector nullableVector = TypeHelper.getNewVector(field, 
> oContext.getAllocator());
> nonNullableVector.toNullable(nullableVector);
> // Data is now in nullableVector
> {code}





[jira] [Created] (DRILL-5709) Provide a value vector method to convert a vector to nullable

2017-08-08 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5709:
--

 Summary: Provide a value vector method to convert a vector to 
nullable
 Key: DRILL-5709
 URL: https://issues.apache.org/jira/browse/DRILL-5709
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Paul Rogers
Assignee: Paul Rogers
 Fix For: 1.12.0


The hash agg spill work needs to convert a non-null scalar vector to the 
nullable equivalent. For efficiency, the code wishes to simply transfer the 
underlying data buffer(s), and create the required "bits" vector, rather than 
generating code that does the transfer row-by-row.

The solution is to add a {{toNullable(ValueVector nullableVector)}} method to 
the {{ValueVector}} class, then implement it where needed.

Since the target code only works with scalars (that is, no arrays, no maps, no 
lists), the code only handles these cases, throwing an 
{{UnsupportedOperationException}} in other cases.

Usage:

{code}
ValueVector nonNullableVector = ...; // your non-nullable vector
MajorType type = MajorType.newBuilder(nonNullableVector.getType())
    .setMode(DataMode.OPTIONAL)
    .build();
MaterializedField field = MaterializedField.create(name, type);
ValueVector nullableVector = TypeHelper.getNewVector(field, 
oContext.getAllocator());
nonNullableVector.toNullable(nullableVector);
// Data is now in nullableVector
{code}
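
To make the intended O(1) behavior concrete, here is a self-contained toy in which plain arrays stand in for the data buffer and the "bits" vector; this is a sketch of the idea, not Drill's actual ValueVector API:

{code:java}
// Toy conversion: hand the data buffer off wholesale and materialize a
// "bits" (null-indicator) array, instead of copying row by row.
public class ToNullableSketch {
    static class IntVec { int[] data; int count; }
    static class NullableIntVec { int[] data; byte[] bits; int count; }

    static NullableIntVec toNullable(IntVec src) {
        NullableIntVec dst = new NullableIntVec();
        dst.data = src.data;                       // O(1) buffer transfer
        src.data = null;                           // source gives up ownership
        dst.bits = new byte[src.count];
        java.util.Arrays.fill(dst.bits, (byte) 1); // every transferred value is non-null
        dst.count = src.count;
        return dst;
    }

    public static void main(String[] args) {
        IntVec v = new IntVec();
        v.data = new int[] {1, 2, 3};
        v.count = 3;
        NullableIntVec n = toNullable(v);
        System.out.println(n.data[2] + ", defined=" + n.bits[2]); // 3, defined=1
    }
}
{code}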





[jira] [Assigned] (DRILL-4211) Inconsistent results from a joined sql statement to postgres tables

2017-08-08 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-4211:
--

Assignee: (was: Chunhui Shi)

> Inconsistent results from a joined sql statement to postgres tables
> ---
>
> Key: DRILL-4211
> URL: https://issues.apache.org/jira/browse/DRILL-4211
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.3.0
> Environment: Postgres db storage
>Reporter: Robert Hamilton-Smith
>  Labels: newbie
>
> When making a SQL statement that incorporates a join to a table and then a 
> self-join to that table to get a parent value, Drill brings back 
> inconsistent results. 
> Here is the SQL in Postgres with the correct output:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from transactions trx
> join categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join categories w1 on (cat.categoryparentguid = w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL;
> {code}
> Output:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|food|
> |id1|restaurants|food|
> |id2|Coffee Shops|food|
> |id2|Coffee Shops|food|
> When run in Drill with correct storage prefix:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from db.schema.transactions trx
> join db.schema.categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join db.schema.wpfm_categories w1 on (cat.categoryparentguid = 
> w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL
> {code}
> Results are:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|null|
> |id1|restaurants|null|
> |id2|Coffee Shops|null|
> |id2|Coffee Shops|null|
> Physical plan is:
> {code:sql}
> 00-00Screen : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) 
> categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = 
> {110.0 rows, 110.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64293
> 00-01  Project(categoryguid=[$0], categoryname=[$1], parentcat=[$2]) : 
> rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
> VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 
> 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64292
> 00-02Project(categoryguid=[$9], categoryname=[$41], parentcat=[$47]) 
> : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
> VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 
> 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64291
> 00-03  Jdbc(sql=[SELECT *
> FROM "public"."transactions"
> INNER JOIN (SELECT *
> FROM "public"."categories"
> WHERE "categoryparentguid" IS NOT NULL) AS "t" ON 
> "transactions"."categoryguid" = "t"."categoryguid"
> INNER JOIN "public"."categories" AS "categories0" ON "t"."categoryparentguid" 
> = "categories0"."categoryguid"]) : rowType = RecordType(VARCHAR(255) 
> transactionguid, VARCHAR(255) relatedtransactionguid, VARCHAR(255) 
> transactioncode, DECIMAL(1, 0) transactionpending, VARCHAR(50) 
> transactionrefobjecttype, VARCHAR(255) transactionrefobjectguid, 
> VARCHAR(1024) transactionrefobjectvalue, TIMESTAMP(6) transactiondate, 
> VARCHAR(256) transactiondescription, VARCHAR(50) categoryguid, VARCHAR(3) 
> transactioncurrency, DECIMAL(15, 3) transactionoldbalance, DECIMAL(13, 3) 
> transactionamount, DECIMAL(15, 3) transactionnewbalance, VARCHAR(512) 
> transactionnotes, DECIMAL(2, 0) transactioninstrumenttype, VARCHAR(20) 
> transactioninstrumentsubtype, VARCHAR(20) transactioninstrumentcode, 
> VARCHAR(50) transactionorigpartyguid, VARCHAR(255) 
> transactionorigaccountguid, VARCHAR(50) transactionrecpartyguid, VARCHAR(255) 
> transactionrecaccountguid, VARCHAR(256) transactionstatementdesc, DECIMAL(1, 
> 0) transactionsplit, DECIMAL(1, 0) transactionduplicated, DECIMAL(1, 0) 
> transactionrecategorized, TIMESTAMP(6) transactioncreatedat, TIMESTAMP(6) 
> transactionupdatedat, VARCHAR(50) transactionmatrulerefobjtype, VARCHAR(50) 
> transactionmatrulerefobjguid, VARCHAR(50) transactionmatrulerefobjvalue, 
> VARCHAR(50) transactionuserruleguid, DECIMAL(2, 0) transactionsplitorder, 
> TIMESTAMP(6) transactionprocessedat, TIMESTAMP(6) 
> transactioncategoryassignat, VARCHAR(50) transactionsystemcategoryguid, 
> VARCHAR(50) transactionorigmandateid, VARCHAR(100) fingerprint, VARCHAR(50) 
> categoryguid0, VARCHAR(50) categoryparentguid, DECIMAL(3, 0) categorytype, 
> VARCHAR(50) categoryname, VARCHAR(50) categorydescription, VARCHAR(50) 
> partyguid, VARCHAR(50) categoryguid1, VARCHAR(50) categoryparentguid0, 
> DECIMAL(3, 0) categorytype0, VARCHAR(50) categoryname0, VARCHAR(50) 
> categorydescription0, VARCHAR(50) 

[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119212#comment-16119212
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/866#discussion_r131216670
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/ColumnLoaderImpl.java ---
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.rowSet.impl;
+
+import org.apache.drill.exec.physical.rowSet.ColumnLoader;
+
+/**
+ * Implementation interface for a column loader. Adds to the public interface
+ * a number of methods needed to coordinate batch overflow.
+ */
+
+public interface ColumnLoaderImpl extends ColumnLoader {
--- End diff --

"Impl"  in an interface name sounds odd.


> Implement size-aware result set loader
> --
>
> Key: DRILL-5657
> URL: https://issues.apache.org/jira/browse/DRILL-5657
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: Future
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: Future
>
>
> A recent extension to Drill's set of test tools created a "row set" 
> abstraction to allow us to create, and verify, record batches with very few 
> lines of code. Part of this work involved creating a set of "column 
> accessors" in the vector subsystem. Column readers provide a uniform API to 
> obtain data from columns (vectors), while column writers provide a uniform 
> writing interface.
> DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size 
> (to avoid memory fragmentation due to Drill's two memory allocators.) The 
> column accessors have proven to be so useful that they will be the basis for 
> the new, size-aware writers used by Drill's record readers.
> A step in that direction is to retrofit the column writers to use the 
> size-aware {{setScalar()}} and {{setArray()}} methods introduced in 
> DRILL-5517.
> Since the test framework row set classes are (at present) the only consumer 
> of the accessors, those classes must also be updated with the changes.
> This then allows us to add a new "row mutator" class that handles size-aware 
> vector writing, including the case in which a vector fills in the middle of a 
> row.
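
As a rough illustration of what "size-aware" means here, the sketch below has the setter report when a write would exceed the buffer cap so the caller can roll the row over to a new batch. This is a toy under assumed semantics; DRILL-5517's actual methods may signal overflow differently (e.g., via {{VectorOverflowException}}):

{code:java}
// Toy size-aware writer: refuses writes past the buffer cap instead of
// growing without bound. Hypothetical stand-in for setScalar().
public class SizeAwareWriterSketch {
    static final int MAX_BUFFER_SIZE = 16 * 1024 * 1024; // 16 MB cap (DRILL-5211)
    private final int[] data = new int[MAX_BUFFER_SIZE / Integer.BYTES];

    /** @return false if writing at this index would exceed the cap. */
    public boolean setScalar(int index, int value) {
        if ((index + 1) * Integer.BYTES > MAX_BUFFER_SIZE) {
            return false; // caller rolls the row over to a new batch
        }
        data[index] = value;
        return true;
    }
}
{code}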





[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119211#comment-16119211
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/866#discussion_r131564509
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/ResultVectorCache.java ---
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.rowSet.impl;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.ValueVector;
+
+/**
+ * Manages an inventory of value vectors used across row batch readers.
+ * Drill semantics for batches is complex. Each operator logically returns
+ * a batch of records on each call of the Drill Volcano iterator protocol
+ * <tt>next()</tt> operation. However, the batches "returned" are not
+ * separate objects. Instead, Drill enforces the following semantics:
+ * <ul>
+ * <li>If a <tt>next()</tt> call returns <tt>OK</tt> then the set of vectors
+ * in the "returned" batch must be identical to those in the prior batch. Not
+ * just the same type; they must be the same ValueVector objects.
+ * (The buffers within the vectors will be different.)</li>
+ * <li>If the set of vectors changes in any way (add a vector, remove a
+ * vector, change the type of a vector), then the <tt>next()</tt> call
+ * must return <tt>OK_NEW_SCHEMA</tt>.</li>
+ * </ul>
+ * These rules create interesting constraints for the scan operator.
+ * Conceptually, each batch is distinct. But, it must share vectors. The
+ * {@link ResultSetLoader} class handles this by managing the set of vectors
+ * used by a single reader.
+ * <p>
+ * Readers are independent: each may read a distinct schema (as in JSON.)
+ * Yet, the Drill protocol requires minimizing spurious <tt>OK_NEW_SCHEMA</tt>
+ * events. As a result, two readers run by the same scan operator must
+ * share the same set of vectors, despite the fact that they may have
+ * different schemas and thus different ResultSetLoaders.
+ * <p>
+ * The purpose of this inventory is to persist vectors across readers, even
+ * when, say, reader B does not use a vector that reader A created.
+ * <p>
+ * The semantics supported by this class include:
+ * <ul>
+ * <li>Ability to "pre-declare" columns based on columns that appear in
+ * an explicit select list. This ensures that the columns are known (but
+ * not their types).</li>
+ * <li>Ability to reuse a vector across readers if the column retains the
+ * same name and type (minor type and mode.)</li>
+ * <li>Ability to flush unused vectors for readers with changing schemas
+ * if a schema change occurs.</li>
+ * <li>Support schema "hysteresis"; that is, a "sticky" schema that
+ * minimizes spurious changes. Once a vector is declared, it can be included
+ * in all subsequent batches (provided the column is nullable or an array.)</li>
+ * </ul>
+ */
+public class ResultVectorCache {
+
+  /**
+   * State of a projected vector. At first all we have is a name.
+   * Later, we'll discover the type.
+   */
+
+  private static class VectorState {
+protected final String name;
+protected ValueVector vector;
+protected boolean touched;
+
+public VectorState(String name) {
+  this.name = name;
+}
+
+public boolean satisfies(MaterializedField colSchema) {
+  if (vector == null) {
+return false;
+  }
+  MaterializedField vectorSchema = vector.getField();
+  return vectorSchema.getType().equals(colSchema.getType());
+

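To illustrate the reuse rule described in the javadoc above: a minimal toy (stand-in types, not Drill's ValueVector) in which a cached vector is handed back as the identical object only when both column name and type match; any change replaces it, which is what forces OK_NEW_SCHEMA downstream.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Toy vector cache; "Vec" is a stand-in, not Drill's ValueVector.
public class VectorCacheSketch {
    static class Vec {
        final String name, type;
        Vec(String name, String type) { this.name = name; this.type = type; }
    }

    private final Map<String, Vec> cache = new HashMap<>();

    public Vec addOrGet(String name, String type) {
        Vec v = cache.get(name);
        if (v == null || !v.type.equals(type)) {
            v = new Vec(name, type); // new column or changed type
            cache.put(name, v);      // downstream must see OK_NEW_SCHEMA
        }
        return v;                    // identical object reused: plain OK
    }
}
{code}
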
[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119216#comment-16119216
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/866#discussion_r130250994
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/ResultSetLoader.java ---
@@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.rowSet;
+
+import org.apache.drill.exec.record.VectorContainer;
+
+/**
+ * Builds a result set (series of zero or more row sets) based on a defined
+ * schema which may
+ * evolve (expand) over time. Automatically rolls "overflow" rows over
+ * when a batch fills.
+ * 
+ * Many of the methods in this interface are verify that the loader is
--- End diff --

"to verify"


> Implement size-aware result set loader
> --
>
> Key: DRILL-5657
> URL: https://issues.apache.org/jira/browse/DRILL-5657
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: Future
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: Future
>
>
> A recent extension to Drill's set of test tools created a "row set" 
> abstraction to allow us to create, and verify, record batches with very few 
> lines of code. Part of this work involved creating a set of "column 
> accessors" in the vector subsystem. Column readers provide a uniform API to 
> obtain data from columns (vectors), while column writers provide a uniform 
> writing interface.
> DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size 
> (to avoid memory fragmentation due to Drill's two memory allocators.) The 
> column accessors have proven to be so useful that they will be the basis for 
> the new, size-aware writers used by Drill's record readers.
> A step in that direction is to retrofit the column writers to use the 
> size-aware {{setScalar()}} and {{setArray()}} methods introduced in 
> DRILL-5517.
> Since the test framework row set classes are (at present) the only consumer 
> of the accessors, those classes must also be updated with the changes.
> This then allows us to add a new "row mutator" class that handles size-aware 
> vector writing, including the case in which a vector fills in the middle of a 
> row.





[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119210#comment-16119210
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/866#discussion_r130429208
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/TupleSchema.java ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.rowSet;
+
+import org.apache.drill.exec.physical.rowSet.impl.MaterializedSchema;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.MaterializedField;
+
+/**
+ * Defines the schema of a tuple: either the top-level row or a nested
+ * "map" (really structure). A schema is a collection of columns (backed
+ * by vectors in the loader itself.) Columns are accessible by name or
+ * index. New columns may be added at any time; the new column takes the
+ * next available index.
+ */
+
+public interface TupleSchema {
+
+  public interface TupleColumnSchema {
+MaterializedField schema();
+
+/**
+ * Report if a column is selected.
+ * @param colIndex index of the column to check
+ * @return true if the column is selected (data is collected),
+ * false if the column is unselected (data is discarded)
+ */
+
+boolean isSelected();
--- End diff --

What does it mean for a column to be selected? Selected in the query?


> Implement size-aware result set loader
> --
>
> Key: DRILL-5657
> URL: https://issues.apache.org/jira/browse/DRILL-5657
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: Future
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: Future
>
>
> A recent extension to Drill's set of test tools created a "row set" 
> abstraction to allow us to create, and verify, record batches with very few 
> lines of code. Part of this work involved creating a set of "column 
> accessors" in the vector subsystem. Column readers provide a uniform API to 
> obtain data from columns (vectors), while column writers provide a uniform 
> writing interface.
> DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size 
> (to avoid memory fragmentation due to Drill's two memory allocators.) The 
> column accessors have proven to be so useful that they will be the basis for 
> the new, size-aware writers used by Drill's record readers.
> A step in that direction is to retrofit the column writers to use the 
> size-aware {{setScalar()}} and {{setArray()}} methods introduced in 
> DRILL-5517.
> Since the test framework row set classes are (at present) the only consumer 
> of the accessors, those classes must also be updated with the changes.
> This then allows us to add a new "row mutator" class that handles size-aware 
> vector writing, including the case in which a vector fills in the middle of a 
> row.





[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119214#comment-16119214
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/866#discussion_r131284158
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/LogicalTupleLoader.java ---
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.rowSet.impl;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+import org.apache.drill.exec.physical.rowSet.ColumnLoader;
+import org.apache.drill.exec.physical.rowSet.TupleLoader;
+import org.apache.drill.exec.physical.rowSet.TupleSchema;
+import org.apache.drill.exec.physical.rowSet.TupleSchema.TupleColumnSchema;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.record.MaterializedField;
+
+/**
+ * Shim inserted between an actual tuple loader and the client to remove
+ * columns that are not projected from input to output. The underlying loader
+ * handles only the projected columns in order to improve efficiency. This
+ * class presents the full table schema, but returns null for the
+ * non-projected columns. This allows the reader to work with the table schema
+ * as defined by the data source, but skip those columns which are not
+ * projected. Skipping non-projected columns avoids creating value vectors
+ * which are immediately discarded. It may also save the reader from reading
+ * unwanted data.
+ */
+public class LogicalTupleLoader implements TupleLoader {
+
+  public static final int UNMAPPED = -1;
+
+  private static class MappedColumn implements TupleColumnSchema {
+
+private final MaterializedField schema;
+private final int mapping;
+
+public MappedColumn(MaterializedField schema, int mapping) {
+  this.schema = schema;
+  this.mapping = mapping;
+}
+
+@Override
+public MaterializedField schema() { return schema; }
+
+@Override
+public boolean isSelected() { return mapping != UNMAPPED; }
+
+@Override
+public int vectorIndex() { return mapping; }
+  }
+
+  /**
+   * Implementation of the tuple schema that describes the full data source
+   * schema. The underlying loader schema is a subset of these columns. Note
+   * that the columns appear in the same order in both schemas, but the
+   * loader schema is a subset of the table schema.
+   */
+
+  private class LogicalTupleSchema implements TupleSchema {
+
+private final Set<String> selection = new HashSet<>();
+private final TupleSchema physicalSchema;
+
+private LogicalTupleSchema(TupleSchema physicalSchema, Collection<String> selection) {
+  this.physicalSchema = physicalSchema;
+  this.selection.addAll(selection);
+}
+
+@Override
+public int columnCount() { return logicalSchema.count(); }
+
+@Override
+public int columnIndex(String colName) {
+  return logicalSchema.indexOf(rsLoader.toKey(colName));
+}
+
+@Override
+public TupleColumnSchema metadata(int colIndex) { return logicalSchema.get(colIndex); }
+
+@Override
+public MaterializedField column(int colIndex) { return logicalSchema.get(colIndex).schema(); }
+
+@Override
+public TupleColumnSchema metadata(String colName) { return logicalSchema.get(colName); }
+
+@Override
+public MaterializedField column(String colName) { return logicalSchema.get(colName).schema(); }
+
+@Override
+public int 

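On the question of what "selected" means: per the LogicalTupleLoader javadoc above, a selected (projected) column maps to a real vector, while an unselected one is read but discarded. A self-contained toy of that mapping, with illustrative names rather than Drill's API:

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy projection map: every table column is visible, but only projected
// columns get a real vector index; the rest report UNMAPPED.
public class ProjectionShimSketch {
    static final int UNMAPPED = -1;
    private final Map<String, Integer> mapping = new HashMap<>();

    ProjectionShimSketch(List<String> tableSchema, Set<String> projected) {
        int vectorIndex = 0;
        for (String col : tableSchema) {
            mapping.put(col, projected.contains(col) ? vectorIndex++ : UNMAPPED);
        }
    }

    int vectorIndex(String col) { return mapping.get(col); }

    public static void main(String[] args) {
        ProjectionShimSketch shim = new ProjectionShimSketch(
                List.of("a", "b", "c"), Set.of("a", "c"));
        System.out.println(shim.vectorIndex("b")); // -1: read but discarded
        System.out.println(shim.vectorIndex("c")); //  1: second projected vector
    }
}
{code}
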
[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119213#comment-16119213
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/866#discussion_r131459203
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/ResultSetLoaderImpl.java ---
@@ -0,0 +1,412 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.rowSet.impl;
+
+import java.util.Collection;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.physical.rowSet.ResultSetLoader;
+import org.apache.drill.exec.physical.rowSet.TupleLoader;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.vector.ValueVector;
+
+/**
+ * Implementation of the result set loader.
+ * @see {@link ResultSetLoader}
+ */
+
+public class ResultSetLoaderImpl implements ResultSetLoader, WriterIndexImpl.WriterIndexListener {
+
+  public static class ResultSetOptions {
+public final int vectorSizeLimit;
+public final int rowCountLimit;
+public final boolean caseSensitive;
+public final ResultVectorCache inventory;
--- End diff --

The name 'inventory' does not convey the intent clearly.


> Implement size-aware result set loader
> --
>
> Key: DRILL-5657
> URL: https://issues.apache.org/jira/browse/DRILL-5657
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: Future
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: Future
>
>
> A recent extension to Drill's set of test tools created a "row set" 
> abstraction to allow us to create, and verify, record batches with very few 
> lines of code. Part of this work involved creating a set of "column 
> accessors" in the vector subsystem. Column readers provide a uniform API to 
> obtain data from columns (vectors), while column writers provide a uniform 
> writing interface.
> DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size 
> (to avoid memory fragmentation due to Drill's two memory allocators.) The 
> column accessors have proven to be so useful that they will be the basis for 
> the new, size-aware writers used by Drill's record readers.
> A step in that direction is to retrofit the column writers to use the 
> size-aware {{setScalar()}} and {{setArray()}} methods introduced in 
> DRILL-5517.
> Since the test framework row set classes are (at present) the only consumer 
> of the accessors, those classes must also be updated with the changes.
> This then allows us to add a new "row mutator" class that handles size-aware 
> vector writing, including the case in which a vector fills in the middle of a 
> row.





[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119217#comment-16119217
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/866#discussion_r131554894
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/ResultSetLoaderImpl.java ---
@@ -0,0 +1,412 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.rowSet.impl;
+
+import java.util.Collection;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.physical.rowSet.ResultSetLoader;
+import org.apache.drill.exec.physical.rowSet.TupleLoader;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.vector.ValueVector;
+
+/**
+ * Implementation of the result set loader.
+ * @see {@link ResultSetLoader}
+ */
+
+public class ResultSetLoaderImpl implements ResultSetLoader, WriterIndexImpl.WriterIndexListener {
+
+  public static class ResultSetOptions {
+public final int vectorSizeLimit;
+public final int rowCountLimit;
+public final boolean caseSensitive;
+public final ResultVectorCache inventory;
+private final Collection<String> selection;
+
+public ResultSetOptions() {
+  vectorSizeLimit = ValueVector.MAX_BUFFER_SIZE;
+  rowCountLimit = ValueVector.MAX_ROW_COUNT;
+  caseSensitive = false;
+  selection = null;
+  inventory = null;
+}
+
+public ResultSetOptions(OptionBuilder builder) {
+  this.vectorSizeLimit = builder.vectorSizeLimit;
+  this.rowCountLimit = builder.rowCountLimit;
+  this.caseSensitive = builder.caseSensitive;
+  this.selection = builder.selection;
+  this.inventory = builder.inventory;
+}
+  }
+
+  public static class OptionBuilder {
+private int vectorSizeLimit;
+private int rowCountLimit;
+private boolean caseSensitive;
+private Collection<String> selection;
+private ResultVectorCache inventory;
+
+public OptionBuilder() {
+  ResultSetOptions options = new ResultSetOptions();
+  vectorSizeLimit = options.vectorSizeLimit;
+  rowCountLimit = options.rowCountLimit;
+  caseSensitive = options.caseSensitive;
+}
+
+public OptionBuilder setCaseSensitive(boolean flag) {
+  caseSensitive = flag;
+  return this;
+}
+
+public OptionBuilder setRowCountLimit(int limit) {
+  rowCountLimit = Math.min(limit, ValueVector.MAX_ROW_COUNT);
+  return this;
+}
+
+public OptionBuilder setSelection(Collection<String> selection) {
+  this.selection = selection;
+  return this;
+}
+
+public OptionBuilder setVectorCache(ResultVectorCache inventory) {
+  this.inventory = inventory;
+  return this;
+}
+
+// TODO: No setter for vector length yet: is hard-coded
+// at present in the value vector.
+
+public ResultSetOptions build() {
+  return new ResultSetOptions(this);
+}
+  }
+
+  public static class VectorContainerBuilder {
+private final ResultSetLoaderImpl rowSetMutator;
+private int lastUpdateVersion = -1;
+private VectorContainer container;
+
+public VectorContainerBuilder(ResultSetLoaderImpl rowSetMutator) {
+  this.rowSetMutator = rowSetMutator;
+  container = new VectorContainer(rowSetMutator.allocator);
+}
+
+public void update() {
+  if (lastUpdateVersion < rowSetMutator.schemaVersion()) {
+rowSetMutator.rootTuple.buildContainer(this);
+

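As a usage note for the OptionBuilder quoted above, a hedged sketch (assuming the class compiles as shown; the values are arbitrary):

{code:java}
// Assumes the ResultSetOptions/OptionBuilder classes from the diff above.
ResultSetOptions options = new OptionBuilder()
    .setCaseSensitive(true)
    .setRowCountLimit(4096) // clamped to ValueVector.MAX_ROW_COUNT
    .build();
{code}
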
[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119209#comment-16119209
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/866#discussion_r131685349
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/TupleSetImpl.java ---
@@ -0,0 +1,551 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.rowSet.impl;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.physical.rowSet.ColumnLoader;
+import org.apache.drill.exec.physical.rowSet.TupleLoader;
+import org.apache.drill.exec.physical.rowSet.TupleSchema;
+import org.apache.drill.exec.physical.rowSet.impl.ResultSetLoaderImpl.VectorContainerBuilder;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.AllocationHelper;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.VectorOverflowException;
+import org.apache.drill.exec.vector.accessor.impl.AbstractColumnWriter;
+import org.apache.drill.exec.vector.accessor.impl.ColumnAccessorFactory;
+
+/**
+ * Implementation of a column when creating a row batch.
+ * Every column resides at an index, is defined by a schema,
+ * is backed by a value vector, and is written to by a writer.
+ * Each column also tracks the schema version in which it was added
+ * to detect schema evolution. Each column has an optional overflow
+ * vector that holds overflow record values when a batch becomes
+ * full.
+ * 
+ * Overflow vectors require special consideration. The vector class itself
+ * must remain constant as it is bound to the writer. To handle overflow,
+ * the implementation must replace the buffer in the vector with a new
+ * one, saving the full vector to return as part of the final row batch.
+ * This puts the column in one of three states:
+ * 
+ * Normal: only one vector is of concern - the vector for the active
+ * row batch.
+ * Overflow: a write to a vector caused overflow. For all columns,
+ * the data buffer is shifted to a harvested vector, and a new, empty
+ * buffer is put into the active vector.
+ * Excess: a (small) column received values for the row that will
--- End diff --

'Excess' is the LOOK_AHEAD state, correct? I think it would be better if 
the comments used the same terminology as in the code.
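
For intuition about these states, here is a self-contained toy of the buffer hand-off, with plain arrays standing in for vectors; the "Excess"/LOOK_AHEAD copy-back of already-written values is noted in a comment rather than modeled:

{code:java}
// Toy overflow hand-off: when the active buffer fills, it is harvested
// wholesale for the outgoing batch and a fresh buffer takes its place.
public class OverflowSketch {
    private int[] active = new int[4]; // tiny batch for demonstration
    private int writeIndex = 0;
    private int[] harvested;           // filled batch handed to the operator

    public void write(int value) {
        if (writeIndex == active.length) { // batch full: overflow
            harvested = active;            // hand off the filled buffer
            active = new int[active.length];
            writeIndex = 0;
            // A real loader must also shift any "excess" values already
            // written for the overflowing row into the fresh buffer.
        }
        active[writeIndex++] = value;
    }

    public int[] takeHarvested() {
        int[] out = harvested;
        harvested = null;
        return out;
    }
}
{code}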


> Implement size-aware result set loader
> --
>
> Key: DRILL-5657
> URL: https://issues.apache.org/jira/browse/DRILL-5657
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: Future
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: Future
>
>
> A recent extension to Drill's set of test tools created a "row set" 
> abstraction to allow us to create, and verify, record batches with very few 
> lines of code. Part of this work involved creating a set of "column 
> accessors" in the vector subsystem. Column readers provide a uniform API to 
> obtain data from columns (vectors), while column writers provide a uniform 
> writing interface.
> DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size 
> (to avoid memory fragmentation due to Drill's two memory allocators.) The 
> column accessors have proven to be so useful that they will be the basis for 
> the new, size-aware writers used by Drill's record readers.
> A step in that direction is to retrofit the column writers to use the 
> size-aware {{setScalar()}} and 

[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119215#comment-16119215
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/866#discussion_r131684173
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/TupleSetImpl.java ---
@@ -0,0 +1,551 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.rowSet.impl;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.physical.rowSet.ColumnLoader;
+import org.apache.drill.exec.physical.rowSet.TupleLoader;
+import org.apache.drill.exec.physical.rowSet.TupleSchema;
+import org.apache.drill.exec.physical.rowSet.impl.ResultSetLoaderImpl.VectorContainerBuilder;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.AllocationHelper;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.VectorOverflowException;
+import org.apache.drill.exec.vector.accessor.impl.AbstractColumnWriter;
+import org.apache.drill.exec.vector.accessor.impl.ColumnAccessorFactory;
+
+/**
+ * Implementation of a column when creating a row batch.
+ * Every column resides at an index, is defined by a schema,
+ * is backed by a value vector, and is written to by a writer.
+ * Each column also tracks the schema version in which it was added
+ * to detect schema evolution. Each column has an optional overflow
+ * vector that holds overflow record values when a batch becomes
+ * full.
+ * <p>
+ * Overflow vectors require special consideration. The vector class itself
+ * must remain constant as it is bound to the writer. To handle overflow,
+ * the implementation must replace the buffer in the vector with a new
+ * one, saving the full vector to return as part of the final row batch.
+ * This puts the column in one of three states:
+ * <ul>
+ * <li>Normal: only one vector is of concern - the vector for the active
+ * row batch.</li>
+ * <li>Overflow: a write to a vector caused overflow. For all columns,
+ * the data buffer is shifted to a harvested vector, and a new, empty
+ * buffer is put into the active vector.</li>
+ * <li>Excess: a (small) column received values for the row that will
+ * overflow due to a later column. When overflow occurs, the excess
+ * column value, from the overflow record, resides in the active
+ * vector. It must be shifted from the active vector into the new
+ * overflow buffer.</li>
+ * </ul>
+ */
+
+public class TupleSetImpl implements TupleSchema {
+
+  public static class TupleLoaderImpl implements TupleLoader {
+
+public TupleSetImpl tupleSet;
+
+public TupleLoaderImpl(TupleSetImpl tupleSet) {
+  this.tupleSet = tupleSet;
+}
+
+@Override
+public TupleSchema schema() { return tupleSet; }
+
+@Override
+public ColumnLoader column(int colIndex) {
+  // TODO: Cache loaders here
+  return tupleSet.columnImpl(colIndex).writer;
+}
+
+@Override
+public ColumnLoader column(String colName) {
+  ColumnImpl col = tupleSet.columnImpl(colName);
+  if (col == null) {
+throw new UndefinedColumnException(colName);
+  }
+  return col.writer;
+}
+
+@Override
+public TupleLoader loadRow(Object... values) {
--- End diff --

Is there a need to verify the types of the incoming args?


> Implement size-aware result set loader
> --
>
> Key: 

[jira] [Updated] (DRILL-3091) Cancelled query continues to list on Drill UI with CANCELLATION_REQUESTED state

2017-08-08 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-3091:
-
Reviewer: Khurram Faraaz

> Cancelled query continues to list on Drill UI with CANCELLATION_REQUESTED 
> state
> ---
>
> Key: DRILL-3091
> URL: https://issues.apache.org/jira/browse/DRILL-3091
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.0.0
>Reporter: Abhishek Girish
> Fix For: Future
>
> Attachments: drillbit.log
>
>
> A long-running query (TPC-DS SF 100 - query 2) continues to be listed on the 
> Drill UI query profile page, among the list of running queries. It's been 
> more than 30 minutes as of this report. 
> TOP -p  showed no activity after the cancellation. And 
> Jstack on all nodes did not contain the queryID. 
> I can share more details for repro. 
> Git.Commit.ID: 583ca4a (May 14 build)
>  





[jira] [Assigned] (DRILL-4211) Inconsistent results from a joined sql statement to postgres tables

2017-08-08 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-4211:
--

Assignee: Chunhui Shi

> Inconsistent results from a joined sql statement to postgres tables
> ---
>
> Key: DRILL-4211
> URL: https://issues.apache.org/jira/browse/DRILL-4211
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.3.0
> Environment: Postgres db storage
>Reporter: Robert Hamilton-Smith
>Assignee: Chunhui Shi
>  Labels: newbie
>
> When making a SQL statement that incorporates a join to a table and then a 
> self-join to that table to get a parent value, Drill brings back 
> inconsistent results. 
> Here is the SQL in Postgres with the correct output:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from transactions trx
> join categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join categories w1 on (cat.categoryparentguid = w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL;
> {code}
> Output:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|food|
> |id1|restaurants|food|
> |id2|Coffee Shops|food|
> |id2|Coffee Shops|food|
> When run in Drill with correct storage prefix:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from db.schema.transactions trx
> join db.schema.categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join db.schema.wpfm_categories w1 on (cat.categoryparentguid = 
> w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL
> {code}
> Results are:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|null|
> |id1|restaurants|null|
> |id2|Coffee Shops|null|
> |id2|Coffee Shops|null|
> Physical plan is:
> {code:sql}
> 00-00Screen : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) 
> categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = 
> {110.0 rows, 110.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64293
> 00-01  Project(categoryguid=[$0], categoryname=[$1], parentcat=[$2]) : 
> rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
> VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 
> 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64292
> 00-02Project(categoryguid=[$9], categoryname=[$41], parentcat=[$47]) 
> : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
> VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 
> 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64291
> 00-03  Jdbc(sql=[SELECT *
> FROM "public"."transactions"
> INNER JOIN (SELECT *
> FROM "public"."categories"
> WHERE "categoryparentguid" IS NOT NULL) AS "t" ON 
> "transactions"."categoryguid" = "t"."categoryguid"
> INNER JOIN "public"."categories" AS "categories0" ON "t"."categoryparentguid" 
> = "categories0"."categoryguid"]) : rowType = RecordType(VARCHAR(255) 
> transactionguid, VARCHAR(255) relatedtransactionguid, VARCHAR(255) 
> transactioncode, DECIMAL(1, 0) transactionpending, VARCHAR(50) 
> transactionrefobjecttype, VARCHAR(255) transactionrefobjectguid, 
> VARCHAR(1024) transactionrefobjectvalue, TIMESTAMP(6) transactiondate, 
> VARCHAR(256) transactiondescription, VARCHAR(50) categoryguid, VARCHAR(3) 
> transactioncurrency, DECIMAL(15, 3) transactionoldbalance, DECIMAL(13, 3) 
> transactionamount, DECIMAL(15, 3) transactionnewbalance, VARCHAR(512) 
> transactionnotes, DECIMAL(2, 0) transactioninstrumenttype, VARCHAR(20) 
> transactioninstrumentsubtype, VARCHAR(20) transactioninstrumentcode, 
> VARCHAR(50) transactionorigpartyguid, VARCHAR(255) 
> transactionorigaccountguid, VARCHAR(50) transactionrecpartyguid, VARCHAR(255) 
> transactionrecaccountguid, VARCHAR(256) transactionstatementdesc, DECIMAL(1, 
> 0) transactionsplit, DECIMAL(1, 0) transactionduplicated, DECIMAL(1, 0) 
> transactionrecategorized, TIMESTAMP(6) transactioncreatedat, TIMESTAMP(6) 
> transactionupdatedat, VARCHAR(50) transactionmatrulerefobjtype, VARCHAR(50) 
> transactionmatrulerefobjguid, VARCHAR(50) transactionmatrulerefobjvalue, 
> VARCHAR(50) transactionuserruleguid, DECIMAL(2, 0) transactionsplitorder, 
> TIMESTAMP(6) transactionprocessedat, TIMESTAMP(6) 
> transactioncategoryassignat, VARCHAR(50) transactionsystemcategoryguid, 
> VARCHAR(50) transactionorigmandateid, VARCHAR(100) fingerprint, VARCHAR(50) 
> categoryguid0, VARCHAR(50) categoryparentguid, DECIMAL(3, 0) categorytype, 
> VARCHAR(50) categoryname, VARCHAR(50) categorydescription, VARCHAR(50) 
> partyguid, VARCHAR(50) categoryguid1, VARCHAR(50) categoryparentguid0, 
> DECIMAL(3, 0) categorytype0, VARCHAR(50) categoryname0, VARCHAR(50) 
> categorydescription0, 

[jira] [Commented] (DRILL-4211) Inconsistent results from a joined sql statement to postgres tables

2017-08-08 Thread Timothy Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119129#comment-16119129
 ] 

Timothy Farkas commented on DRILL-4211:
---

I'll take a look at this bug and try to reproduce it to see if it's still an 
issue.
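
For reference, a hedged workaround sketch while the pushdown is investigated: narrowing 
each side of the self join to only the needed columns should keep the generated JDBC 
query from producing colliding column names. The table names below follow the reporter's 
schema; the rewrite itself is untested here.

{code:sql}
select trx.categoryguid,
       cat.categoryname,
       w1.parentcat
from db.schema.transactions trx
join (select categoryguid, categoryname, categoryparentguid
      from db.schema.categories) cat
  on cat.categoryguid = trx.categoryguid
join (select categoryguid, categoryname as parentcat
      from db.schema.categories) w1
  on cat.categoryparentguid = w1.categoryguid
where cat.categoryparentguid is not null;
{code}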


> Inconsistent results from a joined sql statement to postgres tables
> ---
>
> Key: DRILL-4211
> URL: https://issues.apache.org/jira/browse/DRILL-4211
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.3.0
> Environment: Postgres db storage
>Reporter: Robert Hamilton-Smith
>  Labels: newbie
>
> When making a SQL statement that incorporates a join to a table and then a 
> self join to that table to get a parent value, Drill brings back 
> inconsistent results. 
> Here is the SQL in Postgres, with the correct output:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from transactions trx
> join categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join categories w1 on (cat.categoryparentguid = w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL;
> {code}
> Output:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|food|
> |id1|restaurants|food|
> |id2|Coffee Shops|food|
> |id2|Coffee Shops|food|
> When run in Drill with the correct storage prefix:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from db.schema.transactions trx
> join db.schema.categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join db.schema.wpfm_categories w1 on (cat.categoryparentguid = 
> w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL
> {code}
> Results are:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|null|
> |id1|restaurants|null|
> |id2|Coffee Shops|null|
> |id2|Coffee Shops|null|
> Physical plan is:
> {code:sql}
> 00-00Screen : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) 
> categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = 
> {110.0 rows, 110.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64293
> 00-01  Project(categoryguid=[$0], categoryname=[$1], parentcat=[$2]) : 
> rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
> VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 
> 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64292
> 00-02Project(categoryguid=[$9], categoryname=[$41], parentcat=[$47]) 
> : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
> VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 
> 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64291
> 00-03  Jdbc(sql=[SELECT *
> FROM "public"."transactions"
> INNER JOIN (SELECT *
> FROM "public"."categories"
> WHERE "categoryparentguid" IS NOT NULL) AS "t" ON 
> "transactions"."categoryguid" = "t"."categoryguid"
> INNER JOIN "public"."categories" AS "categories0" ON "t"."categoryparentguid" 
> = "categories0"."categoryguid"]) : rowType = RecordType(VARCHAR(255) 
> transactionguid, VARCHAR(255) relatedtransactionguid, VARCHAR(255) 
> transactioncode, DECIMAL(1, 0) transactionpending, VARCHAR(50) 
> transactionrefobjecttype, VARCHAR(255) transactionrefobjectguid, 
> VARCHAR(1024) transactionrefobjectvalue, TIMESTAMP(6) transactiondate, 
> VARCHAR(256) transactiondescription, VARCHAR(50) categoryguid, VARCHAR(3) 
> transactioncurrency, DECIMAL(15, 3) transactionoldbalance, DECIMAL(13, 3) 
> transactionamount, DECIMAL(15, 3) transactionnewbalance, VARCHAR(512) 
> transactionnotes, DECIMAL(2, 0) transactioninstrumenttype, VARCHAR(20) 
> transactioninstrumentsubtype, VARCHAR(20) transactioninstrumentcode, 
> VARCHAR(50) transactionorigpartyguid, VARCHAR(255) 
> transactionorigaccountguid, VARCHAR(50) transactionrecpartyguid, VARCHAR(255) 
> transactionrecaccountguid, VARCHAR(256) transactionstatementdesc, DECIMAL(1, 
> 0) transactionsplit, DECIMAL(1, 0) transactionduplicated, DECIMAL(1, 0) 
> transactionrecategorized, TIMESTAMP(6) transactioncreatedat, TIMESTAMP(6) 
> transactionupdatedat, VARCHAR(50) transactionmatrulerefobjtype, VARCHAR(50) 
> transactionmatrulerefobjguid, VARCHAR(50) transactionmatrulerefobjvalue, 
> VARCHAR(50) transactionuserruleguid, DECIMAL(2, 0) transactionsplitorder, 
> TIMESTAMP(6) transactionprocessedat, TIMESTAMP(6) 
> transactioncategoryassignat, VARCHAR(50) transactionsystemcategoryguid, 
> VARCHAR(50) transactionorigmandateid, VARCHAR(100) fingerprint, VARCHAR(50) 
> categoryguid0, VARCHAR(50) categoryparentguid, DECIMAL(3, 0) categorytype, 
> VARCHAR(50) categoryname, VARCHAR(50) categorydescription, VARCHAR(50) 
> partyguid, VARCHAR(50) categoryguid1, VARCHAR(50) categoryparentguid0, 
> DECIMAL(3, 0) 

[jira] [Commented] (DRILL-5165) wrong results - LIMIT ALL and OFFSET clause in same query

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118737#comment-16118737
 ] 

ASF GitHub Bot commented on DRILL-5165:
---

Github user chunhui-shi closed the pull request at:

https://github.com/apache/drill/pull/776


> wrong results - LIMIT ALL and OFFSET clause in same query
> -
>
> Key: DRILL-5165
> URL: https://issues.apache.org/jira/browse/DRILL-5165
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.10.0
>Reporter: Khurram Faraaz
>Assignee: Chunhui Shi
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> This issue was reported by a user on Drill's user list.
> Drill 1.10.0 commit ID: bbcf4b76
> I tried a similar query on Apache Drill 1.10.0, and Drill returns wrong 
> results compared to Postgres for a query that uses LIMIT ALL and an OFFSET 
> clause in the same query. We need to file a JIRA to track this issue.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select col_int from typeall_l order by 1 limit 
> all offset 10;
> +--+
> | col_int  |
> +--+
> +--+
> No rows selected (0.211 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select col_int from typeall_l order by col_int 
> limit all offset 10;
> +--+
> | col_int  |
> +--+
> +--+
> No rows selected (0.24 seconds)
> {noformat}
> Query => select col_int from typeall_l limit all offset 10;
> Drill 1.10.0 returns 85 rows,
> whereas for the same query,
> postgres=# select col_int from typeall_l limit all offset 10;
> Postgres 9.3 returns 95 rows, which is the correct, expected result.
> Query plan for the above query that returns wrong results:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select col_int from typeall_l 
> limit all offset 10;
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(col_int=[$0])
> 00-02SelectionVectorRemover
> 00-03  Limit(offset=[10])
> 00-04Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:///tmp/typeall_l]], selectionRoot=maprfs:/tmp/typeall_l, 
> numFiles=1, usedMetadataFile=false, columns=[`col_int`]]])
> {noformat} 
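
Since LIMIT ALL is defined to be a no-op, one quick way to pin down the regression is to 
check that it leaves the row count unchanged. A minimal sketch, assuming the same 
typeall_l table as above:

{code:sql}
-- both counts should be identical; per this report, Drill 1.10.0 returns a smaller
-- count for the LIMIT ALL variant
select count(*) from (select col_int from typeall_l offset 10) t1;
select count(*) from (select col_int from typeall_l limit all offset 10) t2;
{code}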



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-1162) 25 way join ended up with OOM

2017-08-08 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118636#comment-16118636
 ] 

Paul Rogers commented on DRILL-1162:


The challenge here is that we are looking for a quick bug-fix for a problem 
that appears to be quite difficult. So, there is no good answer.

On the one hand, it seems that the planner cannot correctly predict the size of 
the two sides of the join. I am leery of a simple solution that just flips the 
sides. That is likely to fix the current issue but break many queries that now 
work: here we are only worried about this one query, but Drill has to handle 
all queries, even those that this ticket is not concerned with. Flipping the 
sides is likely to cause regressions, which will force us to back out the 
change and put us back where we started.

Is there a principled way to make the decision to flip? Perhaps based on the 
analysis above about the effect of cascaded joins?

Second, none of this addresses the real issue: the hash join operator uses 
too much memory (heap in one scenario, direct in another). There is no analysis 
of why we exhaust each resource, so it is not possible to tell whether any 
particular hack is likely to solve the issue.

If direct memory exhaustion is caused by excessive hash join table size, then 
spill-to-disk may solve it. But, if we are building large tables unnecessarily 
(we've seen this in other cases), then smarter planning rules, or better 
run-time adjustment, may be needed.

If the heap is exhausted, then we have no understanding of why that should be 
so. What is using heap? The hash tables use direct memory. Do we understand why 
heap was exhausted?

Without understanding these fundamentals, we are only hacking and, IMHO, one 
hack is as good as another; they are just random shots in the dark. If we have 
very limited time to fix a deep issue, then hacking is all we can do, of course.
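
One low-cost way to gather evidence before committing to any fix is to watch how the 
planner assigns build and probe sides as joins are added. A sketch using the tables from 
this ticket (nothing here is a proposed fix, just instrumentation):

{code:sql}
-- inspect which side feeds the hash table at each join as the query grows
explain plan for
select count(*) from `lineitem1.parquet` a
inner join `part.parquet` j on a.l_partkey = j.p_partkey
inner join `orders.parquet` k on a.l_orderkey = k.o_orderkey;
{code}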

> 25 way join ended up with OOM
> -
>
> Key: DRILL-1162
> URL: https://issues.apache.org/jira/browse/DRILL-1162
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow, Query Planning & Optimization
>Reporter: Rahul Challapalli
>Assignee: Volodymyr Vysotskyi
>Priority: Critical
> Fix For: Future
>
> Attachments: error.log, oom_error.log
>
>
> git.commit.id.abbrev=e5c2da0
> The below query results in 0 results being returned 
> {code:sql}
> select count(*) from `lineitem1.parquet` a 
> inner join `part.parquet` j on a.l_partkey = j.p_partkey 
> inner join `orders.parquet` k on a.l_orderkey = k.o_orderkey 
> inner join `supplier.parquet` l on a.l_suppkey = l.s_suppkey 
> inner join `partsupp.parquet` m on j.p_partkey = m.ps_partkey and l.s_suppkey 
> = m.ps_suppkey 
> inner join `customer.parquet` n on k.o_custkey = n.c_custkey 
> inner join `lineitem2.parquet` b on a.l_orderkey = b.l_orderkey 
> inner join `lineitem2.parquet` c on a.l_partkey = c.l_partkey 
> inner join `lineitem2.parquet` d on a.l_suppkey = d.l_suppkey 
> inner join `lineitem2.parquet` e on a.l_extendedprice = e.l_extendedprice 
> inner join `lineitem2.parquet` f on a.l_comment = f.l_comment 
> inner join `lineitem2.parquet` g on a.l_shipdate = g.l_shipdate 
> inner join `lineitem2.parquet` h on a.l_commitdate = h.l_commitdate 
> inner join `lineitem2.parquet` i on a.l_receiptdate = i.l_receiptdate 
> inner join `lineitem2.parquet` o on a.l_receiptdate = o.l_receiptdate 
> inner join `lineitem2.parquet` p on a.l_receiptdate = p.l_receiptdate 
> inner join `lineitem2.parquet` q on a.l_receiptdate = q.l_receiptdate 
> inner join `lineitem2.parquet` r on a.l_receiptdate = r.l_receiptdate 
> inner join `lineitem2.parquet` s on a.l_receiptdate = s.l_receiptdate 
> inner join `lineitem2.parquet` t on a.l_receiptdate = t.l_receiptdate 
> inner join `lineitem2.parquet` u on a.l_receiptdate = u.l_receiptdate 
> inner join `lineitem2.parquet` v on a.l_receiptdate = v.l_receiptdate 
> inner join `lineitem2.parquet` w on a.l_receiptdate = w.l_receiptdate 
> inner join `lineitem2.parquet` x on a.l_receiptdate = x.l_receiptdate;
> {code}
> However, when we remove the last 'inner join' and run the query, it returns 
> '716372534'. Since the last inner join is similar to the ones before it, it 
> should match some records and return the data appropriately.
> The logs indicated that it actually returned 0 results. The log file is attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5691) multiple count distinct query planning error at physical phase

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118443#comment-16118443
 ] 

ASF GitHub Bot commented on DRILL-5691:
---

Github user weijietong commented on the issue:

https://github.com/apache/drill/pull/889
  
@arina-ielchiieva your test case cannot reproduce the error. You can 
search the dev mailing list for the original error description with the keyword 
"Drill query planning error". Your query already satisfies the 
NestedLoopJoinPrule. My case is that I added another rule to change 
Aggregate-->Aggregate-->Scan to Scan, since the transformed Scan RelNode 
already holds the count(distinct) value. When this transformation occurs, the 
NestedLoopJoinPrule's checkPreconditions method invokes 
JoinUtils.hasScalarSubqueryInput. It then fails, because the transformed 
RelNode has no aggregate node and therefore does not satisfy the current 
scalar rule. 

I think it's hard to reproduce this error without a specific rule like 
mine. The preconditions are: 
1. a nested loop join; 
2. no (Aggregate --> Aggregate) count-distinct relation nodes in the plan; 
3. one child of the nested loop join has a row count of 1. 

I think that as long as the enhanced code does not break the current unit 
tests, it should be fine.
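
For readability, the reporter's three example queries from the description below, 
reformatted as plain SQL (quoting of the dt literals is assumed):

{code:sql}
select count(*), sum(a), count(distinct b) from t where dt = 'xx';                      -- eg1: correct
select count(*), sum(a), count(distinct b), count(distinct c) from t where dt = 'xxx';  -- eg2: fails at the physical phase
select count(distinct b), count(distinct c) from t where dt = 'xxx';                    -- eg3: similar error
{code}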





> multiple count distinct query planning error at physical phase 
> ---
>
> Key: DRILL-5691
> URL: https://issues.apache.org/jira/browse/DRILL-5691
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.9.0, 1.10.0
>Reporter: weijie.tong
>
> I materialized the count distinct query result in a cache, and added a plugin 
> rule to translate (Aggregate, Aggregate, Project, Scan) or 
> (Aggregate, Aggregate, Scan) to (Project, Scan) at the PARTITION_PRUNING phase. 
> Then, once a user issues count distinct queries, they are translated to query 
> the cache to get the result.
> eg1: "select count(*), sum(a), count(distinct b) from t where dt=xx" 
> eg2: "select count(*), sum(a), count(distinct b), count(distinct c) from t 
> where dt=xxx"
> eg3: "select count(distinct b), count(distinct c) from t where dt=xxx"
> eg1 is correct and returns the result I expected, but eg2 goes wrong at the 
> physical phase. The error info is here: 
> https://gist.github.com/weijietong/1b8ed12db9490bf006e8b3fe0ee52269. 
> eg3 will also get a similar error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-1051) casting timestamp as date gives wrong result for dates earlier than 1797

2017-08-08 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-1051:
--

Assignee: Vitalii Diravka

> casting timestamp as date gives wrong result for dates earlier than 1797
> 
>
> Key: DRILL-1051
> URL: https://issues.apache.org/jira/browse/DRILL-1051
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Chun Chang
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: Future
>
>
> #Wed Jun 18 10:27:23 PDT 2014
> git.commit.id.abbrev=894037a
> It appears that casting dates earlier than the year 1797 gives wrong results:
> 0: jdbc:drill:schema=dfs> select cast(c_timestamp as varchar(20)), 
> cast(c_timestamp as date) from data where c_row <> 12;
> +++
> |   EXPR$0   |   EXPR$1   |
> +++
> | 1997-01-02 03:04:05 | 1997-01-02 |
> | 1997-01-02 00:00:00 | 1997-01-02 |
> | 2001-09-22 18:19:20 | 2001-09-22 |
> | 1997-02-10 17:32:01 | 1997-02-10 |
> | 1997-02-10 17:32:00 | 1997-02-10 |
> | 1997-02-11 17:32:01 | 1997-02-11 |
> | 1997-02-12 17:32:01 | 1997-02-12 |
> | 1997-02-13 17:32:01 | 1997-02-13 |
> | 1997-02-14 17:32:01 | 1997-02-14 |
> | 1997-02-15 17:32:01 | 1997-02-15 |
> | 1997-02-16 17:32:01 | 1997-02-16 |
> | 0097-02-16 17:32:01 | 0097-02-17 |
> | 0597-02-16 17:32:01 | 0597-02-13 |
> | 1097-02-16 17:32:01 | 1097-02-09 |
> | 1697-02-16 17:32:01 | 1697-02-15 |
> | 1797-02-16 17:32:01 | 1797-02-15 |
> | 1897-02-16 17:32:01 | 1897-02-16 |
> | 1997-02-16 17:32:01 | 1997-02-16 |
> | 2097-02-16 17:32:01 | 2097-02-16 |
> | 1996-02-28 17:32:01 | 1996-02-28 |
> | 1996-02-29 17:32:01 | 1996-02-29 |
> | 1996-03-01 17:32:01 | 1996-03-01 |
> +++
> 22 rows selected (0.201 seconds)
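
The failure reduces to a single cast on a literal. A minimal check (a sketch, not from 
the original report):

{code:sql}
-- expected 0597-02-16; per the table above, the cast drifts by several days
-- for pre-1797 dates
select cast(timestamp '0597-02-16 17:32:01' as date) from (values(1));
{code}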



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-5377) Drill returns weird characters when parquet date auto-correction is turned off

2017-08-08 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-5377:
--

Assignee: Vitalii Diravka

> Drill returns weird characters when parquet date auto-correction is turned off
> --
>
> Key: DRILL-5377
> URL: https://issues.apache.org/jira/browse/DRILL-5377
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>
> git.commit.id.abbrev=38ef562
> Below is the output I get from the test framework when I disable 
> auto-correction for date fields:
> {code}
> select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
> autoCorrectCorruptDates => false)) order by l_shipdate limit 10;
> ^@356-03-19
> ^@356-03-21
> ^@356-03-21
> ^@356-03-23
> ^@356-03-24
> ^@356-03-24
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> {code}
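
For comparison, the same scan with auto-correction left at its default shows what the 
dates should look like; this is derived directly from the query above:

{code:sql}
select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet',
  autoCorrectCorruptDates => true)) order by l_shipdate limit 10;
{code}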



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5708) Add DNS decode function for PCAP storage

2017-08-08 Thread Charles Givre (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118325#comment-16118325
 ] 

Charles Givre commented on DRILL-5708:
--

What did you have in mind for the function output?

> Add DNS decode function for PCAP storage
> 
>
> Key: DRILL-5708
> URL: https://issues.apache.org/jira/browse/DRILL-5708
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Reporter: Takeo Ogawara
>Priority: Minor
>
> As described in DRILL-5432, it is very useful to analyze packet contents and 
> application-layer protocols. To improve the PCAP analysis function, it would be 
> good to add a function to decode DNS queries and responses. This would make it 
> possible to classify packets by FQDN and to display user access trends.
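
A sketch of what such a decode function might look like in use. The function name 
dns_qname and the column name packet_data are purely hypothetical; the ticket has not 
defined either:

{code:sql}
-- hypothetical: count packets per queried FQDN in a capture file
select dns_qname(packet_data) as fqdn, count(*) as packets
from dfs.`/captures/sample.pcap`
group by dns_qname(packet_data);
{code}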



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5660) Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118183#comment-16118183
 ] 

ASF GitHub Bot commented on DRILL-5660:
---

Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/877
  
@arina-ielchiieva A small fix was made to resolve some regression test 
failures. The branch has been rebased onto the current master.


> Drill 1.10 queries fail due to Parquet Metadata "corruption" from DRILL-3867
> 
>
> Key: DRILL-5660
> URL: https://issues.apache.org/jira/browse/DRILL-5660
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Vitalii Diravka
>  Labels: doc-impacting
> Fix For: 1.12.0
>
>
> Drill recently accepted a PR for the following bug:
> DRILL-3867: Store relative paths in metadata file
> This PR turned out to have a flaw which affects version compatibility.
> The DRILL-3867 PR changed the format of Parquet metadata files to store 
> relative paths. All Drill servers after that PR create files with relative 
> paths. But, the version number of the file is unchanged, so that older 
> Drillbits don't know that they can't understand the file.
> Instead, if an older Drill tries to read the file, queries fail left and 
> right. Drill will resolve the paths, but does so relative to the user's HDFS 
> home directory, which is wrong.
> What should have happened is that we bumped the parquet metadata file 
> version number so that older Drillbits can't read the file. This ticket 
> requests that we do that.
> Now, one could argue that the lack of version number change is fine. Once a 
> user upgrades Drill, they won't use an old Drillbit. But, things are not that 
> simple:
> * A developer tests a branch based on a pre-DRILL-3867 build on a cluster in 
> which metadata files have been created by a post-DRILL-3867 build. (This has 
> already occurred multiple times in our shop.)
> * A user tries to upgrade to Drill 1.11, finds an issue, and needs to roll 
> back to Drill 1.10. Doing so will cause queries to fail due to 
> seemingly-corrupt metadata files.
> * A user tries to do a rolling upgrade: running 1.11 on some servers, 1.10 on 
> others. Once a 1.11 server is installed, the metadata is updated ("corrupted" 
> from the perspective of 1.10) and queries fail.
> Standard practice in this scenario is to:
> * Bump the file version number when the file format changes, and
> * Software refuses to read files with a version newer than the software was 
> designed for.
> Of course, it is highly desirable that newer servers read old files, but that 
> is not the issue here.
> *Main technical points of how parquet metadata caching works now.*
> Only the process of reading the parquet metadata is changed (the writing 
> process is unchanged):
> +1. Metadata files are valid:+
> Metadata objects are created by deserializing the parquet metadata files in 
> the process of creating the ParquetGroupScan physical operator. 
> All supported versions are stored in the "MetadataVersion.Constants" class 
> and in the Jackson annotations for the Metadata.ParquetTableMetadataBase class.
> +2. The metadata file version isn't supported (created by a newer Drill 
> version); the Drill table has at least one metadata file of an unsupported 
> version:+
> A JsonMappingException is obtained and swallowed without creating a metadata 
> object. An error message is logged. The state is stored in the MetadataContext, 
> so there will be no further attempt to deserialize the metadata file in the 
> context of the current query. The physical plan will be created without using 
> parquet metadata caching. A warning message is logged for every further 
> "is metadata corrupted" check.
> +3. The Drill table has at least one corrupted metadata file, which can't be 
> deserialized:+
> A JsonParseException is obtained; then the same behaviour as for the 
> unsupported-version files.
> +4. The metadata file was removed by another process:+
> FileNotFound is obtained; then the same behaviour as for the 
> unsupported-version files.
> New versions of metadata should be added in the following manner:
> 1. Increase the metadata major version if the metadata structure is changed.
> 2. Increase the metadata minor version if only the metadata content is 
> changed but the metadata structure is the same.
> For the first case a new metadata structure (class) should be created 
> (possibly with an improvement to deserialize metadata files of any version 
> into one structure by using special converters).
> For the second case only the annotation for the latest metadata structure 
> needs to be updated.
> *Summary*
> 1. Drill will read and use metadata files if files are valid, all present and 
> supported. Under supported we 

[jira] [Commented] (DRILL-5691) multiple count distinct query planning error at physical phase

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118148#comment-16118148
 ] 

ASF GitHub Bot commented on DRILL-5691:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/889
  
@weijietong regarding the unit test, I have tried to reproduce the problem 
and written the following unit test:
```
  @Test
  public void test() throws Exception {
    FileSystem fs = null;
    try {
      fs = FileSystem.get(new Configuration());

      // create table with partition pruning
      test("use dfs_test.tmp");
      String tableName = "table_with_pruning";
      Path dataFile = new Path(TestTools.getWorkingPath(), "src/test/resources/parquet/alltypes_required.parquet");
      test("create table %s partition by (col_int) as select * from dfs.`%s`", tableName, dataFile);

      // generate metadata
      test("refresh table metadata `%s`", tableName);

      // execute query
      String query = String.format("select count(distinct col_int), count(distinct col_chr) from `%s` where col_int = 45436", tableName);
      test(query);

    } finally {
      if (fs != null) {
        fs.close();
      }
    }
  }
```
The `AbstractGroupScan.getScanStats` method returns one row, but the query 
does not fail. Can you please take a look?


> multiple count distinct query planning error at physical phase 
> ---
>
> Key: DRILL-5691
> URL: https://issues.apache.org/jira/browse/DRILL-5691
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.9.0, 1.10.0
>Reporter: weijie.tong
>
> I materialized the count distinct query result in a cache, and added a plugin 
> rule to translate (Aggregate, Aggregate, Project, Scan) or 
> (Aggregate, Aggregate, Scan) to (Project, Scan) at the PARTITION_PRUNING phase. 
> Then, once a user issues count distinct queries, they are translated to query 
> the cache to get the result.
> eg1: "select count(*), sum(a), count(distinct b) from t where dt=xx" 
> eg2: "select count(*), sum(a), count(distinct b), count(distinct c) from t 
> where dt=xxx"
> eg3: "select count(distinct b), count(distinct c) from t where dt=xxx"
> eg1 is correct and returns the result I expected, but eg2 goes wrong at the 
> physical phase. The error info is here: 
> https://gist.github.com/weijietong/1b8ed12db9490bf006e8b3fe0ee52269. 
> eg3 will also get a similar error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5691) multiple count distinct query planning error at physical phase

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118139#comment-16118139
 ] 

ASF GitHub Bot commented on DRILL-5691:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/889#discussion_r131871360
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java ---
@@ -231,6 +236,12 @@ public static boolean isScalarSubquery(RelNode root) {
 return true;
   }
 }
+if(!hasMoreInputs && currentRel!=null){
+  double rowSize = RelMetadataQuery.instance().getMaxRowCount(currentRel);
+  if(rowSize==1){
--- End diff --

 Please add spaces: `if (rowSize == 1) {`


> multiple count distinct query planning error at physical phase 
> ---
>
> Key: DRILL-5691
> URL: https://issues.apache.org/jira/browse/DRILL-5691
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.9.0, 1.10.0
>Reporter: weijie.tong
>
> I materialized the count distinct query result in a cache, and added a plugin 
> rule to translate (Aggregate, Aggregate, Project, Scan) or 
> (Aggregate, Aggregate, Scan) to (Project, Scan) at the PARTITION_PRUNING phase. 
> Then, once a user issues count distinct queries, they are translated to query 
> the cache to get the result.
> eg1: "select count(*), sum(a), count(distinct b) from t where dt=xx" 
> eg2: "select count(*), sum(a), count(distinct b), count(distinct c) from t 
> where dt=xxx"
> eg3: "select count(distinct b), count(distinct c) from t where dt=xxx"
> eg1 is correct and returns the result I expected, but eg2 goes wrong at the 
> physical phase. The error info is here: 
> https://gist.github.com/weijietong/1b8ed12db9490bf006e8b3fe0ee52269. 
> eg3 will also get a similar error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5691) multiple count distinct query planning error at physical phase

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118142#comment-16118142
 ] 

ASF GitHub Bot commented on DRILL-5691:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/889#discussion_r131871578
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java ---
@@ -325,4 +325,5 @@ public void testNlJoinWithLargeRightInputSuccess() throws Exception {
   test(RESET_JOIN_OPTIMIZATION);
 }
   }
+
--- End diff --

Please revert changes in this file.


> multiple count distinct query planning error at physical phase 
> ---
>
> Key: DRILL-5691
> URL: https://issues.apache.org/jira/browse/DRILL-5691
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.9.0, 1.10.0
>Reporter: weijie.tong
>
> I materialized the count distinct query result in a cache, and added a plugin 
> rule to translate (Aggregate, Aggregate, Project, Scan) or 
> (Aggregate, Aggregate, Scan) to (Project, Scan) at the PARTITION_PRUNING phase. 
> Then, once a user issues count distinct queries, they are translated to query 
> the cache to get the result.
> eg1: "select count(*), sum(a), count(distinct b) from t where dt=xx" 
> eg2: "select count(*), sum(a), count(distinct b), count(distinct c) from t 
> where dt=xxx"
> eg3: "select count(distinct b), count(distinct c) from t where dt=xxx"
> eg1 is correct and returns the result I expected, but eg2 goes wrong at the 
> physical phase. The error info is here: 
> https://gist.github.com/weijietong/1b8ed12db9490bf006e8b3fe0ee52269. 
> eg3 will also get a similar error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5691) multiple count distinct query planning error at physical phase

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118141#comment-16118141
 ] 

ASF GitHub Bot commented on DRILL-5691:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/889#discussion_r131870970
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java ---
@@ -204,24 +205,28 @@ public static void addLeastRestrictiveCasts(LogicalExpression[] leftExpressions,
 
   /**
* Utility method to check if a subquery (represented by its root RelNode) is provably scalar. Currently
-   * only aggregates with no group-by are considered scalar. In the future, this method should be generalized
-   * to include more cases and reconciled with Calcite's notion of scalar.
+   * only aggregates with no group-by and sub input rel with one row are considered scalar. In the future,
+   * this method should be generalized to include more cases and reconciled with Calcite's notion of scalar.
* @param root The root RelNode to be examined
* @return True if the root rel or its descendant is scalar, False otherwise
*/
   public static boolean isScalarSubquery(RelNode root) {
 DrillAggregateRel agg = null;
-RelNode currentrel = root;
-while (agg == null && currentrel != null) {
-  if (currentrel instanceof DrillAggregateRel) {
-agg = (DrillAggregateRel)currentrel;
-  } else if (currentrel instanceof RelSubset) {
-currentrel = ((RelSubset)currentrel).getBest() ;
-  } else if (currentrel.getInputs().size() == 1) {
+RelNode currentRel = root;
+boolean hasMoreInputs = false;
+while (agg == null && currentRel != null) {
+  if (currentRel instanceof DrillAggregateRel) {
+agg = (DrillAggregateRel)currentRel;
+  } else if (currentRel instanceof RelSubset) {
+currentRel = ((RelSubset)currentRel).getBest() ;
--- End diff --

`currentRel = ((RelSubset) currentRel).getBest() ;`


> multiple count distinct query planning error at physical phase 
> ---
>
> Key: DRILL-5691
> URL: https://issues.apache.org/jira/browse/DRILL-5691
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.9.0, 1.10.0
>Reporter: weijie.tong
>
> I materialized the count distinct query result in a cache, and added a plugin 
> rule to translate (Aggregate, Aggregate, Project, Scan) or 
> (Aggregate, Aggregate, Scan) to (Project, Scan) at the PARTITION_PRUNING phase. 
> Then, once a user issues count distinct queries, they are translated to query 
> the cache to get the result.
> eg1: "select count(*), sum(a), count(distinct b) from t where dt=xx" 
> eg2: "select count(*), sum(a), count(distinct b), count(distinct c) from t 
> where dt=xxx"
> eg3: "select count(distinct b), count(distinct c) from t where dt=xxx"
> eg1 is correct and returns the result I expected, but eg2 goes wrong at the 
> physical phase. The error info is here: 
> https://gist.github.com/weijietong/1b8ed12db9490bf006e8b3fe0ee52269. 
> eg3 will also get a similar error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5691) multiple count distinct query planning error at physical phase

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118138#comment-16118138
 ] 

ASF GitHub Bot commented on DRILL-5691:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/889#discussion_r131871224
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java ---
@@ -204,24 +205,28 @@ public static void addLeastRestrictiveCasts(LogicalExpression[] leftExpressions,
 
   /**
* Utility method to check if a subquery (represented by its root RelNode) is provably scalar. Currently
-   * only aggregates with no group-by are considered scalar. In the future, this method should be generalized
-   * to include more cases and reconciled with Calcite's notion of scalar.
+   * only aggregates with no group-by and sub input rel with one row are considered scalar. In the future,
+   * this method should be generalized to include more cases and reconciled with Calcite's notion of scalar.
* @param root The root RelNode to be examined
* @return True if the root rel or its descendant is scalar, False otherwise
*/
   public static boolean isScalarSubquery(RelNode root) {
 DrillAggregateRel agg = null;
-RelNode currentrel = root;
-while (agg == null && currentrel != null) {
-  if (currentrel instanceof DrillAggregateRel) {
-agg = (DrillAggregateRel)currentrel;
-  } else if (currentrel instanceof RelSubset) {
-currentrel = ((RelSubset)currentrel).getBest() ;
-  } else if (currentrel.getInputs().size() == 1) {
+RelNode currentRel = root;
+boolean hasMoreInputs = false;
+while (agg == null && currentRel != null) {
+  if (currentRel instanceof DrillAggregateRel) {
+agg = (DrillAggregateRel)currentRel;
+  } else if (currentRel instanceof RelSubset) {
+currentRel = ((RelSubset)currentRel).getBest() ;
+  } else if (currentRel.getInputs().size() == 1) {
 // If the rel is not an aggregate or RelSubset, but is a single-input rel (could be Project,
 // Filter, Sort etc.), check its input
-currentrel = currentrel.getInput(0);
+currentRel = currentRel.getInput(0);
   } else {
+if(currentRel.getInputs().size()>1){
+  hasMoreInputs=true;
--- End diff --

Please add spaces: `hasMoreInputs = true;`


> multiple count distinct query planning error at physical phase 
> ---
>
> Key: DRILL-5691
> URL: https://issues.apache.org/jira/browse/DRILL-5691
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.9.0, 1.10.0
>Reporter: weijie.tong
>
> I materialized the count distinct query result in a cache, and added a plugin 
> rule to translate (Aggregate, Aggregate, Project, Scan) or 
> (Aggregate, Aggregate, Scan) to (Project, Scan) at the PARTITION_PRUNING phase. 
> Then, once a user issues count distinct queries, they are translated to query 
> the cache to get the result.
> eg1: "select count(*), sum(a), count(distinct b) from t where dt=xx" 
> eg2: "select count(*), sum(a), count(distinct b), count(distinct c) from t 
> where dt=xxx"
> eg3: "select count(distinct b), count(distinct c) from t where dt=xxx"
> eg1 is correct and returns the result I expected, but eg2 goes wrong at the 
> physical phase. The error info is here: 
> https://gist.github.com/weijietong/1b8ed12db9490bf006e8b3fe0ee52269. 
> eg3 will also get a similar error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5691) multiple count distinct query planning error at physical phase

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118143#comment-16118143
 ] 

ASF GitHub Bot commented on DRILL-5691:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/889#discussion_r131871292
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java ---
@@ -231,6 +236,12 @@ public static boolean isScalarSubquery(RelNode root) {
 return true;
   }
 }
+if(!hasMoreInputs && currentRel!=null){
--- End diff --

Please add spaces: `if (!hasMoreInputs && currentRel != null) {`


> multiple count distinct query planning error at physical phase 
> ---
>
> Key: DRILL-5691
> URL: https://issues.apache.org/jira/browse/DRILL-5691
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.9.0, 1.10.0
>Reporter: weijie.tong
>
> I materialized the count distinct query result in a cache, and added a plugin 
> rule to translate (Aggregate, Aggregate, Project, Scan) or 
> (Aggregate, Aggregate, Scan) to (Project, Scan) at the PARTITION_PRUNING phase. 
> Then, once a user issues count distinct queries, they are translated to query 
> the cache to get the result.
> eg1: "select count(*), sum(a), count(distinct b) from t where dt=xx" 
> eg2: "select count(*), sum(a), count(distinct b), count(distinct c) from t 
> where dt=xxx"
> eg3: "select count(distinct b), count(distinct c) from t where dt=xxx"
> eg1 is correct and returns the result I expected, but eg2 goes wrong at the 
> physical phase. The error info is here: 
> https://gist.github.com/weijietong/1b8ed12db9490bf006e8b3fe0ee52269. 
> eg3 will also get a similar error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5691) multiple count distinct query planning error at physical phase

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118137#comment-16118137
 ] 

ASF GitHub Bot commented on DRILL-5691:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/889#discussion_r131870863
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java ---
@@ -204,24 +205,28 @@ public static void addLeastRestrictiveCasts(LogicalExpression[] leftExpressions,
 
   /**
* Utility method to check if a subquery (represented by its root RelNode) is provably scalar. Currently
-   * only aggregates with no group-by are considered scalar. In the future, this method should be generalized
-   * to include more cases and reconciled with Calcite's notion of scalar.
+   * only aggregates with no group-by and sub input rel with one row are considered scalar. In the future,
+   * this method should be generalized to include more cases and reconciled with Calcite's notion of scalar.
* @param root The root RelNode to be examined
* @return True if the root rel or its descendant is scalar, False otherwise
*/
   public static boolean isScalarSubquery(RelNode root) {
 DrillAggregateRel agg = null;
-RelNode currentrel = root;
-while (agg == null && currentrel != null) {
-  if (currentrel instanceof DrillAggregateRel) {
-agg = (DrillAggregateRel)currentrel;
-  } else if (currentrel instanceof RelSubset) {
-currentrel = ((RelSubset)currentrel).getBest() ;
-  } else if (currentrel.getInputs().size() == 1) {
+RelNode currentRel = root;
+boolean hasMoreInputs = false;
+while (agg == null && currentRel != null) {
+  if (currentRel instanceof DrillAggregateRel) {
+agg = (DrillAggregateRel)currentRel;
--- End diff --

` agg = (DrillAggregateRel) currentRel;`


> multiple count distinct query planning error at physical phase 
> ---
>
> Key: DRILL-5691
> URL: https://issues.apache.org/jira/browse/DRILL-5691
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.9.0, 1.10.0
>Reporter: weijie.tong
>
> I materialized the count distinct query result in a cache, and added a plugin 
> rule to translate (Aggregate, Aggregate, Project, Scan) or 
> (Aggregate, Aggregate, Scan) to (Project, Scan) at the PARTITION_PRUNING phase. 
> Then, once a user issues count distinct queries, they are translated to query 
> the cache to get the result.
> eg1: "select count(*), sum(a), count(distinct b) from t where dt=xx" 
> eg2: "select count(*), sum(a), count(distinct b), count(distinct c) from t 
> where dt=xxx"
> eg3: "select count(distinct b), count(distinct c) from t where dt=xxx"
> eg1 is correct and returns the result I expected, but eg2 goes wrong at the 
> physical phase. The error info is here: 
> https://gist.github.com/weijietong/1b8ed12db9490bf006e8b3fe0ee52269. 
> eg3 will also get a similar error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5691) multiple count distinct query planning error at physical phase

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118140#comment-16118140
 ] 

ASF GitHub Bot commented on DRILL-5691:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/889#discussion_r131871049
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java ---
@@ -204,24 +205,28 @@ public static void addLeastRestrictiveCasts(LogicalExpression[] leftExpressions,
 
   /**
* Utility method to check if a subquery (represented by its root RelNode) is provably scalar. Currently
-   * only aggregates with no group-by are considered scalar. In the future, this method should be generalized
-   * to include more cases and reconciled with Calcite's notion of scalar.
+   * only aggregates with no group-by and sub input rel with one row are considered scalar. In the future,
+   * this method should be generalized to include more cases and reconciled with Calcite's notion of scalar.
* @param root The root RelNode to be examined
* @return True if the root rel or its descendant is scalar, False otherwise
*/
   public static boolean isScalarSubquery(RelNode root) {
 DrillAggregateRel agg = null;
-RelNode currentrel = root;
-while (agg == null && currentrel != null) {
-  if (currentrel instanceof DrillAggregateRel) {
-agg = (DrillAggregateRel)currentrel;
-  } else if (currentrel instanceof RelSubset) {
-currentrel = ((RelSubset)currentrel).getBest() ;
-  } else if (currentrel.getInputs().size() == 1) {
+RelNode currentRel = root;
+boolean hasMoreInputs = false;
+while (agg == null && currentRel != null) {
+  if (currentRel instanceof DrillAggregateRel) {
+agg = (DrillAggregateRel)currentRel;
+  } else if (currentRel instanceof RelSubset) {
+currentRel = ((RelSubset)currentRel).getBest() ;
+  } else if (currentRel.getInputs().size() == 1) {
 // If the rel is not an aggregate or RelSubset, but is a single-input rel (could be Project,
 // Filter, Sort etc.), check its input
-currentrel = currentrel.getInput(0);
+currentRel = currentRel.getInput(0);
   } else {
+if(currentRel.getInputs().size()>1){
--- End diff --

Please add spaces: `if (currentRel.getInputs().size() > 1) {`


> multiple count distinct query planning error at physical phase 
> ---
>
> Key: DRILL-5691
> URL: https://issues.apache.org/jira/browse/DRILL-5691
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.9.0, 1.10.0
>Reporter: weijie.tong
>
> I materialized the count distinct query result in a cache, and added a plugin 
> rule to translate (Aggregate, Aggregate, Project, Scan) or 
> (Aggregate, Aggregate, Scan) to (Project, Scan) at the PARTITION_PRUNING phase. 
> Then, once a user issues count distinct queries, they are translated to query 
> the cache to get the result.
> eg1: "select count(*), sum(a), count(distinct b) from t where dt=xx" 
> eg2: "select count(*), sum(a), count(distinct b), count(distinct c) from t 
> where dt=xxx"
> eg3: "select count(distinct b), count(distinct c) from t where dt=xxx"
> eg1 is correct and returns the result I expected, but eg2 goes wrong at the 
> physical phase. The error info is here: 
> https://gist.github.com/weijietong/1b8ed12db9490bf006e8b3fe0ee52269. 
> eg3 will also get a similar error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5686) Warning for sasl.max_wrapped_size contain incorrect syntax

2017-08-08 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5686:

Fix Version/s: 1.12.0

> Warning for sasl.max_wrapped_size contain incorrect syntax
> --
>
> Key: DRILL-5686
> URL: https://issues.apache.org/jira/browse/DRILL-5686
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>  Labels: Security, ready-to-commit
> Fix For: 1.12.0
>
>
> Reported by [~knguyen]
> In drill-override.conf, set security.user.encryption.sasl.max_wrapped_size: 
> 33554430, higher than the recommended max value of 16777215.
> The server logs a warning in drillbit.log as expected. However, the warning 
> contains a formatting error:
> "2017-07-17 10:55:53,668 [main] WARN o.a.d.e.r.user.UserConnectionConfig - 
> The configured value of user.encryption.sasl.max_wrapped_size is too big. 
> This may cause higher memory pressure. [Details: Recommended max value is %s]"
> The "%s" should be the recommended value of 16777215



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5699) Drill Web UI Page Source Has Links To External Sites

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118057#comment-16118057
 ] 

ASF GitHub Bot commented on DRILL-5699:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/891
  
@parthchandra could you please do final review?


> Drill Web UI Page Source Has Links To External Sites
> 
>
> Key: DRILL-5699
> URL: https://issues.apache.org/jira/browse/DRILL-5699
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Reporter: Sindhuri Ramanarayan Rayavaram
>Assignee: Sindhuri Ramanarayan Rayavaram
>Priority: Minor
> Fix For: 1.12.0
>
>
> Drill uses an external CDN for JavaScript and CSS files on the result page. When 
> there is no internet connection, this page fails to load. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5699) Drill Web UI Page Source Has Links To External Sites

2017-08-08 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5699:

Reviewer: Parth Chandra  (was: Arina Ielchiieva)

> Drill Web UI Page Source Has Links To External Sites
> 
>
> Key: DRILL-5699
> URL: https://issues.apache.org/jira/browse/DRILL-5699
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Reporter: Sindhuri Ramanarayan Rayavaram
>Assignee: Sindhuri Ramanarayan Rayavaram
>Priority: Minor
> Fix For: 1.12.0
>
>
> Drill uses an external CDN for JavaScript and CSS files on the result page. When 
> there is no internet connection, this page fails to load. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5708) Add DNS decode function for PCAP storage

2017-08-08 Thread Takeo Ogawara (JIRA)
Takeo Ogawara created DRILL-5708:


 Summary: Add DNS decode function for PCAP storage
 Key: DRILL-5708
 URL: https://issues.apache.org/jira/browse/DRILL-5708
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Reporter: Takeo Ogawara
Priority: Minor


As described in DRILL-5432, it is very useful to analyze packet contents and 
application-layer protocols. To improve the PCAP analysis function, it would be 
good to add a function to decode DNS queries and responses. This would make it 
possible to classify packets by FQDN and to display user access trends.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)