[jira] [Commented] (DRILL-3958) Improve error message when JDBC driver not found

2017-12-19 Thread Kunal Khatua (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297982#comment-16297982
 ] 

Kunal Khatua commented on DRILL-3958:
-

[~amansinha100] can you review?

> Improve error message when JDBC driver not found
> 
>
> Key: DRILL-3958
> URL: https://issues.apache.org/jira/browse/DRILL-3958
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - HTTP
>Affects Versions: 1.2.0
>Reporter: Uwe Geercken
>Assignee: Aman Sinha
>Priority: Critical
> Fix For: 1.13.0
>
>
> When setting up a storage definition for JDBC in the Drill web UI, the 
> appropriate driver has to be available in the 3rdparty folder before defining 
> the storage, otherwise an error is displayed.
> The error message refers to a JSON mapping error which is completely 
> inappropriate in this case, because the error is the missing JDBC driver in 
> the 3rdparty folder and not the JSON mapping.
> I request changing the error message to something appropriate, indicating 
> that the referenced class/driver could not be found (for example: 
> com.mysql.jdbc.Driver)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (DRILL-3958) Improve error message when JDBC driver not found

2017-12-19 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua resolved DRILL-3958.
-
Resolution: Done
  Reviewer: Aman Sinha

> Improve error message when JDBC driver not found
> 
>
> Key: DRILL-3958
> URL: https://issues.apache.org/jira/browse/DRILL-3958
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - HTTP
>Affects Versions: 1.2.0
>Reporter: Uwe Geercken
>Assignee: Aman Sinha
>Priority: Critical
> Fix For: 1.13.0
>
>





[jira] [Assigned] (DRILL-3958) Improve error message when JDBC driver not found

2017-12-19 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua reassigned DRILL-3958:
---

 Assignee: Aman Sinha
Fix Version/s: 1.13.0

> Improve error message when JDBC driver not found
> 
>
> Key: DRILL-3958
> URL: https://issues.apache.org/jira/browse/DRILL-3958
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - HTTP
>Affects Versions: 1.2.0
>Reporter: Uwe Geercken
>Assignee: Aman Sinha
>Priority: Critical
> Fix For: 1.13.0
>
>





[jira] [Reopened] (DRILL-3958) Improve error message when JDBC driver not found

2017-12-19 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua reopened DRILL-3958:
-

> Improve error message when JDBC driver not found
> 
>
> Key: DRILL-3958
> URL: https://issues.apache.org/jira/browse/DRILL-3958
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - HTTP
>Affects Versions: 1.2.0
>Reporter: Uwe Geercken
>Priority: Critical
>





[jira] [Resolved] (DRILL-3958) Improve error message when JDBC driver not found

2017-12-19 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua resolved DRILL-3958.
-
Resolution: Done

> Improve error message when JDBC driver not found
> 
>
> Key: DRILL-3958
> URL: https://issues.apache.org/jira/browse/DRILL-3958
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - HTTP
>Affects Versions: 1.2.0
>Reporter: Uwe Geercken
>Priority: Critical
>





[jira] [Comment Edited] (DRILL-6035) Specify Drill's JSON behavior

2017-12-19 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296230#comment-16296230
 ] 

Paul Rogers edited comment on DRILL-6035 at 12/20/17 6:41 AM:
--

h4. Lists

_*NOTE:* This section describes the Drill {{LIST}} type which turns out to be 
broken and not supported. The following is based on a prototype using the 
{{LIST}} type created after fixing some, but not all, {{LIST}} bugs._

JSON supports arrays of the form:

{code}
{a: ["I'm", "an", "array"] }
{code}

Drill has two very different ways to represent arrays:

1. As a {{REPEATED}} cardinality for most data types. This gives rise to a 
{{RepeatedFooVector}} for some type {{Foo}}.
2. As a {{LIST}} type with the {{ListVector}} implementation.

Here, Arrow has done a nice job. Arrow unified the {{REPEATED}} cardinality and 
the {{LIST}} vector type into a single concept. Drill, however, still has two 
systems.

h4. Repeated Cardinality

Drill's "go to" way to handle arrays is with the {{REPEATED}} cardinality (AKA 
"repeated data mode.") Most readers that handle arrays use the {{REPEATED}} 
form. To help understand the {{LIST}} type, we review {{REPEATED}} support here.

When working with a {{REPEATED}} column, the rules for nulls are:

* Arrays may not contain nulls. (Drill does not support nulls as array 
elements.)
* A null (or missing) array field is treated the same as an empty array.

If JSON were to use the {{REPEATED}} vectors, the following would be invalid:

{code}
[10, null, 20]
{code}

The following are all valid with {{REPEATED}} vectors:

{code}
{id: 1}
{id: 2, a: null}
{id: 3, a: []}
{id: 4, a: [10, 20, 30]}
{code}
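The rules above can be sketched with a small, hypothetical helper (plain Java, not Drill's actual reader code): a null or missing array field normalizes to an empty array, while a null element inside the array is rejected.

```java
import java.util.ArrayList;
import java.util.List;

public class RepeatedModeRules {
  // REPEATED-mode normalization: a null (or missing) array field is
  // treated the same as an empty array, while a null *element* inside
  // the array is rejected outright.
  static List<Long> normalize(List<Long> parsed) {
    if (parsed == null) {
      return new ArrayList<>();   // null/missing field -> empty array
    }
    for (Long v : parsed) {
      if (v == null) {
        throw new IllegalArgumentException("null array elements not supported");
      }
    }
    return parsed;
  }

  public static void main(String[] args) {
    System.out.println(normalize(null));              // prints []
    System.out.println(normalize(List.of(10L, 20L))); // prints [10, 20]
  }
}
```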

h4. Properties of Lists

The key properties of a List relative to a repeated type are:

* Each list can be a list of nothing, a list of a single type, or a list of a 
union of multiple types.
* Each list value can be null.
* Each list entry can be null (for primitive types.)

h4. Null Support

As explained below, lists support three kinds of nullability:

* The type itself can be null (a list of nulls)
* The list column value for a row can be null. (This is in contrast to repeated 
types in which an array can be empty, but the entire array value cannot be 
null.)
* When the list is of a primitive type, entries can be null. (The list is 
defined as list of nullable items, such as nullable BIGINT.)
* When the list is of maps, the map entries *cannot* be null. (Instead, the map 
columns are nullable and all columns for the "null" map are set to null.)
* When the list is of other lists, the list entry *cannot* be null. (Instead, 
the nested list is empty.)
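A toy model (hypothetical names, not Drill's actual {{ListVector}} implementation) can illustrate the first two nullability levels for a list of primitives: an offsets vector plus a list-level validity vector, and a per-entry validity vector for the elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a list vector: offsets + a list-level validity vector,
// plus per-entry validity for primitive (here BIGINT-like) elements.
public class ToyListVector {
  private final List<Integer> offsets = new ArrayList<>(List.of(0));
  private final List<Boolean> listIsNull = new ArrayList<>();  // list-level nulls
  private final List<Long> data = new ArrayList<>();
  private final List<Boolean> entryIsNull = new ArrayList<>(); // entry-level nulls

  void addRow(List<Long> values) {         // values == null models {a: null}
    listIsNull.add(values == null);
    int start = offsets.get(offsets.size() - 1);
    offsets.add(start + (values == null ? 0 : values.size()));
    if (values != null) {
      for (Long v : values) {
        entryIsNull.add(v == null);        // primitives allow null entries
        data.add(v == null ? 0L : v);      // placeholder under a null entry
      }
    }
  }

  List<Long> getRow(int row) {
    if (listIsNull.get(row)) return null;  // a null list, not an empty one
    List<Long> out = new ArrayList<>();
    for (int i = offsets.get(row); i < offsets.get(row + 1); i++) {
      out.add(entryIsNull.get(i) ? null : data.get(i));
    }
    return out;
  }

  public static void main(String[] args) {
    ToyListVector v = new ToyListVector();
    v.addRow(null);                           // {a: null}
    v.addRow(new ArrayList<>());              // {a: []}
    v.addRow(Arrays.asList(null, 10L, null)); // {a: [null, 10, null]}
    System.out.println(v.getRow(0));          // prints null
    System.out.println(v.getRow(1));          // prints []
    System.out.println(v.getRow(2));          // prints [null, 10, null]
  }
}
```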

The semantics are a bit confusing when seen from the outside. They make 
slightly more sense based on the implementation choices made in the code. 
(Though, generally we want the code to match our requirements, not the other 
way around.)

h4. Lists are Obscure

The {{LIST}} type appears to be used only for JSON, and it is unclear how well 
supported it is in the rest of Drill. For example, it is not clear that 
functions that work with arrays correctly handle null entries. (This needs to 
be tested.) 

JDBC supports array columns, but it is not clear if the Drill JDBC driver has 
implemented them. ODBC doesn't support arrays at all, so whether it supports 
arrays with nulls is a moot point.

h4. Lists in JSON

The {{LIST}} type appears to be used only for JSON where it is a better fit for 
JSON semantics than Drill's normal {{REPEATED}} cardinality. The list type 
allows list members to be null. All of the following are legal using lists:

{code}
{a: null}
{a: []}
{a: [null, null]}
{a: [null, 10, null]}
{a: [10, "foo"]}
{code}

We'll look at each of these in detail.

h4. Degenerate Lists

Consider the simplest possible list in JSON: a file that contains only an empty 
list:

{noformat}
{a: []}
{noformat}

What is the type of the list? In JSON, lists have no type; they are just lists. 
Drill, however, requires a type when working with a {{REPEATED}} cardinality: 
the column must be an array of something.

Lists, however, can be a list of only nulls using the obscure {{LATE}} data 
type. That is, the list exists, but has no type. ({{LATE}} seems to suggest 
that the type will be assigned later.)

Next, consider another degenerate array:

{noformat}
{a: [null, null]}
{noformat}

Here we have an array of nulls. Again, we don't know what type the nulls are. 
A {{LIST}} allows the JSON reader to produce a row with a single column 
{{`a`}} of type {{LIST}} that contains only the "dummy" {{LATE}} type. 
The list will indicate that we have two entries, both of which are null.

It is unclear, however, if the rest of Drill supports this concept. (DRILL-5970 
discusses a case in which an empty array, with a List of {{LATE}}, is exported 
to Parquet, producing results different than one might naively expect.)

h4. Single-type Lists

The 

[jira] [Comment Edited] (DRILL-6035) Specify Drill's JSON behavior

2017-12-19 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16293316#comment-16293316
 ] 

Paul Rogers edited comment on DRILL-6035 at 12/20/17 6:39 AM:
--

h4. JSON Arrays

Drill supports simple arrays in JSON using the following rules:

* Arrays must contain homogeneous elements: any one of the scalars described 
above, or JSON objects.
* Single-dimensional arrays cannot contain null entries.
* Two-dimensional arrays can contain nulls at the outer level but not the inner 
level.

(See a later comment for nested arrays.)

For example, the following are scalar arrays:
{code}
[10, 20]
[10.30, 10.45]
["foo", "bar"]
[true, false]
{code}

h4. Schema Change in Arrays

The following will trigger errors:

{code}
{a: [10, "foo"]}  // Mixed types
{a: [10]} {a: ["foo"]} // Schema change
{a: [10, 12.5]} // Conflicting types: integer and float
{code}

h4. Nulls in Arrays

h4. Missing {{LIST}} Support

JSON arrays can contain nulls. Drill provides a (partially completed, 
inoperable) {{LIST}} type as described below that handles nulls. But, this 
vector is not used in Drill 1.12 or earlier. Instead, Drill uses repeated types 
which cannot handle nulls. (The {{LIST}} type is described in a separate note 
below.)

Using array types, the following rules apply to nulls:

* An array cannot contain nulls.
* An empty array at the start of the file has an unknown type. (Do we select 
Nullable {{INT}}?)
* An entire array can be null, which is represented as an empty array. (That 
is, an empty array and a {{null}} value are considered the same.)

h4. Late Type Identification

As described earlier, Drill 1.13 will defer picking an array type if it sees 
null values. For example:

{code}
{id: 1}
{id: 2, a: null}
{id: 3, a: []}
{id: 4, a: [10, 20, 30]}
{code}

In the above example, for id=2, Drill sees column `a` but does not pick a type. 
For id=3, Drill identifies that `a` is an array, but does not know the type. 
Finally, for id=4, Drill identifies the array as {{BIGINT}}.
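The deferral logic can be sketched as follows (a hypothetical illustration, not Drill's actual JSON reader code): the inferred column type stays unknown ({{LATE}}-like) until the first concrete, non-null value arrives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of late type identification: the column type stays
// unknown (LATE-like) until the first concrete, non-null value arrives.
public class LateTypeInference {
  enum ColType { LATE, BIGINT, VARCHAR }

  static ColType infer(List<List<Object>> arrays) {
    for (List<Object> array : arrays) {
      if (array == null) continue;        // a: null -> no type information
      for (Object v : array) {
        if (v == null) continue;          // null entry -> still unknown
        return (v instanceof Long) ? ColType.BIGINT : ColType.VARCHAR;
      }
    }
    return ColType.LATE;                  // only nulls/empty arrays seen
  }

  public static void main(String[] args) {
    List<List<Object>> rows = new ArrayList<>();
    rows.add(null);                       // {id: 2, a: null}
    rows.add(new ArrayList<>());          // {id: 3, a: []}
    System.out.println(infer(rows));      // prints LATE
    rows.add(Arrays.asList((Object) 10L, 20L, 30L)); // {id: 4, a: [10, 20, 30]}
    System.out.println(infer(rows));      // prints BIGINT
  }
}
```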

h4. Null-Only Arrays

A special case occurs if a JSON file contains only empty arrays or arrays of 
nulls (such as a file that contains only the first three records above.)

In Drill 1.12 and earlier, the result is a list of {{LATE}} elements (See the 
List section below.) It seems that {{SqlLine}} will correctly show the null 
values.

An interesting case occurs when Drill reads two files: one with an array with 
only nulls, another with real values. For example:

{noformat}
File A: {a: [null, null] }
File B: {a: [10, 20] }
{noformat}

The above condition can occur only if JSON uses the broken {{LIST}} type; it 
cannot occur in Drill 1.12. In Drill 1.12, the equivalent condition is if File A 
contains:

{noformat}
{a: []}
{noformat}

Drill is distributed: one fragment will read File A, another will read File B. 
At some point, the two arrays will come together. One fragment will have 
created a list of {{LATE}}, another a list of {{BIGINT}}. Most operators will 
trigger a schema change error in this case.

Interestingly, however, if the query is a simple {{SELECT *}}, then the lists 
are compatible and {{SqlLine}} will display the correct results.

In Drill 1.13, if the first batch contains only nulls and/or empty arrays, 
Drill guesses that the type is an array of {{VARCHAR}}. Since this is only a 
guess, a schema change will result if the guess is wrong.


was (Author: paul.rogers):
h4. JSON Arrays

Drill supports simple arrays in JSON using the following rules:

* Arrays must contain homogeneous elements: any one of the scalars described 
above, or JSON objects.

(See a later comment for nested arrays.)

For example, the following are scalar arrays:
{code}
[10, 20]
[10.30, 10.45]
["foo", "bar"]
[true, false]
{code}

h4. Schema Change in Arrays

The following will trigger errors:

{code}
{a: [10, "foo"]}  // Mixed types
{a: [10]} {a: ["foo"]} // Schema change
{a: [10, 12.5]} // Conflicting types: integer and float
{code}

h4. Nulls in Arrays

Drill handles nulls in arrays using the {{LIST}} type, described in a separate 
note below.

h4. Late Type Identification

As described earlier, Drill will defer picking an array type if it sees null 
values. For example:

{code}
{id: 1}
{id: 2, a: null}
{id: 3, a: []}
{id: 4, a: [10, 20, 30]}
{code}

In the above example, for id=2, Drill sees column `a` but does not pick a type. 
For id=3, Drill identifies that `a` is an array, but does not know the type. 
Finally, for id=4, Drill identifies the array as {{BIGINT}}.

h4. Null-Only Arrays

A special case occurs if a JSON file contains only empty arrays or arrays of 
nulls (such as a file that contains only the first three records above.)

In Drill 1.12 and earlier, the result is a list of {{LATE}} elements (See the 
List section below.) It seems that {{SqlLine}} will correctly show the null 
values.

An interesting case occurs when Drill reads two 

[jira] [Created] (DRILL-6048) ListVector is incomplete and broken, RepeatedListVector works

2017-12-19 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-6048:
--

 Summary: ListVector is incomplete and broken, RepeatedListVector 
works
 Key: DRILL-6048
 URL: https://issues.apache.org/jira/browse/DRILL-6048
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Paul Rogers


Drill provides two kinds of "list vectors": {{ListVector}} and 
{{RepeatedListVector}}. I attempted to use the {{ListVector}} to implement 
lists in JSON. While some parts work, others are broken and JIRA tickets were 
filed.

Once things worked well enough to run a query, it turned out that the Project 
operator failed. Digging into the cause, it appears that the {{ListVector}} is 
incomplete and not used. Its implementation of {{makeTransferPair()}} was 
clearly never tested. A list has contents, but when this method attempts to 
create the contents of the target vector, it fails to create the list contents.

Elsewhere, we saw that the constructor did not correctly create the vector, and 
that {{promoteToUnion()}} had holes. The sheer number of bugs leads to the 
conclusion that this class is not, in fact, used or usable.

Looking more carefully at the JSON and older writer code, it appears that the 
ListVector was *not* used for JSON, and that JSON has the limitations of a 
repeated vector (it cannot support lists with null elements.)

This implies that the JSON reader itself is broken: it does not fully support 
JSON semantics because it does not use the {{ListVector}} that was intended for 
this purpose.

So, the conclusion is that JSON uses:

* Repeated vectors for single-dimensional arrays (without null support)
* {{RepeatedListVector}} for two-dimensional arrays

This triggers the question: what do we do for three-dimensional arrays?





[jira] [Commented] (DRILL-6046) Define semantics of vector metadata

2017-12-19 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297684#comment-16297684
 ] 

Paul Rogers commented on DRILL-6046:


Suggested improvements. First, to ensure that the metadata tree remains 
consistent:

* The materialized field passed to the constructor is the one used for the 
vector.
* The materialized field created for the vector is final: it can change but 
cannot be replaced.

To ensure consistent vector creation:

* Every vector constructor should build itself as defined by the passed-in 
materialized field.

To avoid clutter:

* Every vector includes its internal fields and public child fields in the list 
of children.
* Add a field to mark a materialized field as private. Private fields are not 
compared when checking if two fields are equal. Private fields are ignored when 
building a new vector.
* Provide a method on materialized field to get the public schema (without 
internal vectors).
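The private-field proposal can be sketched as follows (hypothetical names, not Drill's actual {{MaterializedField}} API): children flagged private are internal implementation vectors, excluded from the public schema and from schema-equality checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the "private field" proposal (hypothetical names, not
// Drill's MaterializedField API): children flagged private are internal
// implementation vectors, excluded from the public schema and from
// schema-equality checks.
public class ToyField {
  final String name;
  final boolean isPrivate;
  final List<ToyField> children = new ArrayList<>();

  ToyField(String name, boolean isPrivate) {
    this.name = name;
    this.isPrivate = isPrivate;
  }

  // Public schema: the child list with private (internal) fields removed.
  List<ToyField> publicChildren() {
    return children.stream().filter(c -> !c.isPrivate).collect(Collectors.toList());
  }

  // Equality for schema comparison ignores private fields entirely.
  boolean schemaEquals(ToyField other) {
    List<ToyField> a = publicChildren(), b = other.publicChildren();
    if (!name.equals(other.name) || a.size() != b.size()) return false;
    for (int i = 0; i < a.size(); i++) {
      if (!a.get(i).schemaEquals(b.get(i))) return false;
    }
    return true;
  }
}
```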

> Define semantics of vector metadata
> ---
>
> Key: DRILL-6046
> URL: https://issues.apache.org/jira/browse/DRILL-6046
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Vectors provide metadata in the form of the {{MaterializedField}}. This class 
> has evolved in an ad-hoc fashion over time, resulting in inconsistent 
> behavior across vectors. The inconsistent behavior causes bugs and slow 
> development because each vector follows different rules. Consistent behavior 
> would, by contrast, lead to faster development and fewer bugs by reducing the 
> number of variations that code must handle.
> Issues include:
> * Map vectors, but not lists, can create contents given a list of children in 
> the {{MaterializedField}} passed to the constructor.
> * {{MaterializedField}} appears to want to be immutable, but it does allow 
> changing of children. Unions also want to change the list of subtypes, but 
> that is in the immutable {{MajorType}}, causing unions to rebuild and replace 
> its {{MaterializedField}} on addition of a new type. By contrast, maps do not 
> replace the field, they just add children.
> * Container vectors (maps, unions, lists) hold references to child 
> {{MaterializedFields}}. But, because unions replace their fields, parents 
> become out of sync since they point to the old, version before the update, 
> causing inconsistent metadata, so that code cannot trust the metadata.
> * Lists and maps, but not unions, list their children in the field.
> * Nullable types, but not repeated types, include internal vectors in their 
> list of children. 
> * When creating a map, as discussed above, the map creates children based on 
> the field. But, the constructor clones the field so that the actual field in 
> the map is not the one passed in. As a result, a parent vector, which holds a 
> child map, points to the original map field, not the cloned one, leading to 
> inconsistency if the child map later adds more fields.





[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch

2017-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297542#comment-16297542
 ] 

ASF GitHub Bot commented on DRILL-5993:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r157894540
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/Copier.java
 ---
@@ -19,13 +19,15 @@
 
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.VectorContainer;
 
 public interface Copier {
-  public static TemplateClassDefinition<Copier> TEMPLATE_DEFINITION2 = new TemplateClassDefinition<Copier>(Copier.class, CopierTemplate2.class);
-  public static TemplateClassDefinition<Copier> TEMPLATE_DEFINITION4 = new TemplateClassDefinition<Copier>(Copier.class, CopierTemplate4.class);
+  TemplateClassDefinition<Copier> TEMPLATE_DEFINITION2 = new TemplateClassDefinition<Copier>(Copier.class, CopierTemplate2.class);
+  TemplateClassDefinition<Copier> TEMPLATE_DEFINITION4 = new TemplateClassDefinition<Copier>(Copier.class, CopierTemplate4.class);
--- End diff --

I will create a separate PR to do that since that change is unrelated to 
this PR


> Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
> 
>
> Key: DRILL-5993
> URL: https://issues.apache.org/jira/browse/DRILL-5993
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> Currently the copier can only copy records from an incoming batch to the 
> beginning of an outgoing batch. We need to be able to copy a record and 
> append it to the end of the outgoing batch.





[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch

2017-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297541#comment-16297541
 ] 

ASF GitHub Bot commented on DRILL-5993:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r157894423
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/CopierTemplate2.java
 ---
@@ -53,17 +51,32 @@ public int copyRecords(int index, int recordCount) throws SchemaChangeException
   }
 }
 
-int outgoingPosition = 0;
+return insertRecords(0, index, recordCount);
+  }
+
+  @Override
+  public int appendRecord(int index) throws SchemaChangeException {
+return appendRecords(index, 1);
+  }
+
+  @Override
+  public int appendRecords(int index, int recordCount) throws SchemaChangeException {
+return insertRecords(outgoing.getRecordCount(), index, recordCount);
+  }
+
+  private int insertRecords(int outgoingPosition, int index, int recordCount) throws SchemaChangeException {
+final int endIndex = index + recordCount;
 
-for(int svIndex = index; svIndex < index + recordCount; svIndex++, outgoingPosition++){
+for(int svIndex = index; svIndex < endIndex; svIndex++, outgoingPosition++){
   doEval(sv2.getIndex(svIndex), outgoingPosition);
 }
+
+outgoing.setRecordCount(outgoingPosition);
 return outgoingPosition;
   }
 
-  public abstract void doSetup(@Named("context") FragmentContext context,
-   @Named("incoming") RecordBatch incoming,
-   @Named("outgoing") RecordBatch outgoing)
+  public abstract void doSetup(@Named("incoming") RecordBatch incoming,
+   @Named("outgoing") VectorContainer outgoing)
--- End diff --

The copiers are only used in the SVRemover and TopN operator. I have 
replaced the code generated copiers in both now to use the GenericCopiers.
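The shape of the refactoring in the diff can be illustrated with a toy copier (hypothetical and heavily simplified, not the generated Drill code): copyRecords() and appendRecords() share one insertRecords() and differ only in the position at which writing starts.

```java
import java.util.ArrayList;
import java.util.List;

// Toy copier (hypothetical, heavily simplified from the diff above):
// copyRecords() and appendRecords() share one insertRecords() and differ
// only in the position at which writing starts.
public class ToyCopier {
  final List<Integer> incoming;
  final List<Integer> outgoing = new ArrayList<>();

  ToyCopier(List<Integer> incoming) { this.incoming = incoming; }

  int copyRecords(int index, int count) {     // write from position 0
    outgoing.clear();
    return insertRecords(0, index, count);
  }

  int appendRecords(int index, int count) {   // write from the current end
    return insertRecords(outgoing.size(), index, count);
  }

  private int insertRecords(int outPos, int index, int count) {
    for (int i = index; i < index + count; i++, outPos++) {
      if (outPos < outgoing.size()) outgoing.set(outPos, incoming.get(i));
      else outgoing.add(incoming.get(i));
    }
    return outPos;                            // new record count
  }

  public static void main(String[] args) {
    ToyCopier c = new ToyCopier(List.of(1, 2, 3, 4));
    c.copyRecords(0, 2);
    c.appendRecords(2, 2);
    System.out.println(c.outgoing);           // prints [1, 2, 3, 4]
  }
}
```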


> Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
> 
>
> Key: DRILL-5993
> URL: https://issues.apache.org/jira/browse/DRILL-5993
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>





[jira] [Created] (DRILL-6047) Update doc to include instructions for libpam4j

2017-12-19 Thread Bridget Bevens (JIRA)
Bridget Bevens created DRILL-6047:
-

 Summary: Update doc to include instructions for libpam4j
 Key: DRILL-6047
 URL: https://issues.apache.org/jira/browse/DRILL-6047
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 1.12.0
Reporter: Bridget Bevens
Assignee: Bridget Bevens
Priority: Minor
 Fix For: 1.12.0


Update Apache Drill docs to include JPAM and libpam4j PAM authenticator 
instructions.





[jira] [Updated] (DRILL-6046) Define semantics of vector metadata

2017-12-19 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-6046:
---
Description: 
Vectors provide metadata in the form of the {{MaterializedField}}. This class 
has evolved in an ad-hoc fashion over time, resulting in inconsistent behavior 
across vectors. The inconsistent behavior causes bugs and slow development 
because each vector follows different rules. Consistent behavior would, by 
contrast, lead to faster development and fewer bugs by reducing the number of 
variations that code must handle.

Issues include:

* Map vectors, but not lists, can create contents given a list of children in 
the {{MaterializedField}} passed to the constructor.
* {{MaterializedField}} appears to want to be immutable, but it does allow 
changing of children. Unions also want to change the list of subtypes, but that 
is in the immutable {{MajorType}}, causing unions to rebuild and replace its 
{{MaterializedField}} on addition of a new type. By contrast, maps do not 
replace the field, they just add children.
* Container vectors (maps, unions, lists) hold references to child 
{{MaterializedFields}}. But, because unions replace their fields, parents 
become out of sync since they point to the old, version before the update, 
causing inconsistent metadata, so that code cannot trust the metadata.
* Lists and maps, but not unions, list their children in the field.
* Nullable types, but not repeated types, include internal vectors in their 
list of children. 
* When creating a map, as discussed above, the map creates children based on 
the field. But, the constructor clones the field so that the actual field in 
the map is not the one passed in. As a result, a parent vector, which holds a 
child map, points to the original map field, not the cloned one, leading to 
inconsistency if the child map later adds more fields.

  was:
Vectors provide metadata in the form of the {{MaterializedField}}. This class 
has evolved in an ad-hoc fashion over time, resulting in inconsistent behavior 
across vectors. The inconsistent behavior causes bugs and slow development 
because each vector follows different rules. Consistent behavior would, by 
contrast, lead to faster development and fewer bugs by reducing the number of 
variations that code must handle.

Issues include:

* Map vectors, but not lists, can create contents given a list of children in 
the {{MaterializedField}} passed to the constructor.
* {{MaterializedField}} appears to want to be immutable, but it does allow 
changing of children. Unions also want to change the list of subtypes, but that 
is in the immutable {{MajorType}}, causing unions to rebuild and replace its 
{{MaterializedField}} on addition of a new type. By contrast, maps do not 
replace the field, they just add children.
* Container vectors (maps, unions, lists) hold references to child 
{{MaterializedFields}}. But, because unions replace their fields, parents 
become out of sync since they point to the old, version before the update, 
causing inconsistent metadata, so that code cannot trust the metadata.
* Lists and maps, but not unions, list their children in the field.
* Nullable types, but not repeated types, include internal vectors in their 
list of children. 


> Define semantics of vector metadata
> ---
>
> Key: DRILL-6046
> URL: https://issues.apache.org/jira/browse/DRILL-6046
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Priority: Minor
>

[jira] [Created] (DRILL-6046) Define semantics of vector metadata

2017-12-19 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-6046:
--

 Summary: Define semantics of vector metadata
 Key: DRILL-6046
 URL: https://issues.apache.org/jira/browse/DRILL-6046
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.10.0
Reporter: Paul Rogers
Priority: Minor


Vectors provide metadata in the form of the {{MaterializedField}}. This class 
has evolved in an ad-hoc fashion over time, resulting in inconsistent behavior 
across vectors. The inconsistent behavior causes bugs and slow development 
because each vector follows different rules. Consistent behavior would, by 
contrast, lead to faster development and fewer bugs by reducing the number of 
variations that code must handle.

Issues include:

* Map vectors, but not lists, can create contents given a list of children in 
the {{MaterializedField}} passed to the constructor.
* {{MaterializedField}} appears to want to be immutable, but it does allow 
changing of children. Unions also want to change the list of subtypes, but that 
is in the immutable {{MajorType}}, causing unions to rebuild and replace its 
{{MaterializedField}} on addition of a new type. By contrast, maps do not 
replace the field, they just add children.
* Container vectors (maps, unions, lists) hold references to child 
{{MaterializedFields}}. But, because unions replace their fields, parents 
become out of sync: they point to the old version from before the update, 
leaving inconsistent metadata that code cannot trust.
* Lists and maps, but not unions, list their children in the field.
* Nullable types, but not repeated types, include internal vectors in their 
list of children. 
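The stale-reference problem described above can be sketched with simplified, hypothetical classes (this is not Drill's actual {{MaterializedField}} API): a map-style mutate-in-place keeps parents consistent, while a union-style rebuild-and-replace leaves the parent holding the old metadata object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins (not Drill's MaterializedField API)
// illustrating why rebuild-and-replace leaves a parent's reference stale.
public class MetadataSketch {
    static class Field {
        final String name;
        final List<Field> children = new ArrayList<>();
        Field(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        Field unionField = new Field("u");
        List<Field> parentView = new ArrayList<>();
        parentView.add(unionField);            // parent keeps a reference

        // Map-style update: mutate children in place; the parent stays in sync.
        unionField.children.add(new Field("a"));

        // Union-style update: rebuild and replace the field object entirely.
        Field rebuilt = new Field("u");
        rebuilt.children.addAll(unionField.children);
        rebuilt.children.add(new Field("b"));
        unionField = rebuilt;                  // parent still holds the old field

        // The parent now reports stale metadata: 1 child instead of 2.
        System.out.println(parentView.get(0).children.size());
    }
}
```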



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-6045) Doc new parameter

2017-12-19 Thread Bridget Bevens (JIRA)
Bridget Bevens created DRILL-6045:
-

 Summary: Doc new parameter
 Key: DRILL-6045
 URL: https://issues.apache.org/jira/browse/DRILL-6045
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 1.12.0
Reporter: Bridget Bevens
Assignee: Bridget Bevens


Document the new parameter listed in DRILL-5815





[jira] [Commented] (DRILL-6030) Managed sort should minimize number of batches in a k-way merge

2017-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297509#comment-16297509
 ] 

ASF GitHub Bot commented on DRILL-6030:
---

Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1075
  
The scenario in which all batches can be merged in memory is covered by the 
`if (canUseMemoryMerge())` check in `SortImpl.java:399`. The affected code path 
applies only to cases where a merge between spilled and in-memory batches is 
necessary. Note that this is a short-term fix to improve managed sort 
performance; in the long run, we need the ability to merge all batches in 
memory (using an SV4) without spilling, and then merge that result with the 
spilled data.


> Managed sort should minimize number of batches in a k-way merge
> ---
>
> Key: DRILL-6030
> URL: https://issues.apache.org/jira/browse/DRILL-6030
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>
> The time complexity of the algorithm is O(n*k*log(k)) where k is a number of 
> batches to merge and n is a number of records in each batch (assuming equal 
> size batches). As n*k is the total number of record to merge and it can be 
> quite large, minimizing k should give better results.
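The O(n*k*log(k)) cost comes from pushing every one of the n*k records through a heap of size k. A generic illustration of such a k-way merge over sorted runs (not Drill's implementation) looks like this:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative k-way merge over sorted runs (not Drill's code): each of the
// n*k records incurs an O(log k) heap operation, O(n*k*log(k)) in total.
public class KWayMerge {
    static List<Integer> merge(List<List<Integer>> runs) {
        // heap entries: {value, runIndex, positionInRun}
        PriorityQueue<int[]> heap =
            new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[]{runs.get(r).get(0), r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] e = heap.poll();            // O(log k) per record
            out.add(e[0]);
            int next = e[2] + 1;
            if (next < runs.get(e[1]).size()) {
                heap.add(new int[]{runs.get(e[1]).get(next), e[1], next});
            }
        }
        return out;
    }
}
```

Since every record pays log(k), shrinking k (merging fewer, larger batches) directly reduces total work.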





[jira] [Commented] (DRILL-6030) Managed sort should minimize number of batches in a k-way merge

2017-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297490#comment-16297490
 ] 

ASF GitHub Bot commented on DRILL-6030:
---

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1075#discussion_r157885846
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SortConfig.java
 ---
@@ -84,7 +85,7 @@ public SortConfig(DrillConfig config) {
 if (limit > 0) {
   mergeLimit = Math.max(limit, MIN_MERGE_LIMIT);
 } else {
-  mergeLimit = Integer.MAX_VALUE;
+  mergeLimit = DEFAULT_MERGE_LIMIT;
--- End diff --

IMO, it is better to change the default to avoid upgrade problems. In an 
upgrade scenario, users may simply overwrite `drill-override.conf` with the 
copy from their prior installation and forget to set the merge limit. Is there 
a reason not to change the default merge limit?
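The resolution logic being debated in the diff above can be sketched as follows (the constant names and the floor value of 2 are assumptions mirroring the diff, not the actual SortConfig source):

```java
// Sketch of merge-limit resolution (assumed names, not actual SortConfig).
public class MergeLimitSketch {
    static final int MIN_MERGE_LIMIT = 2;       // assumed floor value
    static final int DEFAULT_MERGE_LIMIT = 128; // the value proposed in review

    static int resolveMergeLimit(int configuredLimit) {
        // A positive configured value wins, clamped to the floor;
        // otherwise fall back to the built-in default.
        return configuredLimit > 0
            ? Math.max(configuredLimit, MIN_MERGE_LIMIT)
            : DEFAULT_MERGE_LIMIT;
    }
}
```

The review question is simply whether the fallback branch should live in code (as above) or in the shipped `drill-override.conf` default.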


> Managed sort should minimize number of batches in a k-way merge
> ---
>
> Key: DRILL-6030
> URL: https://issues.apache.org/jira/browse/DRILL-6030
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>
> The time complexity of the algorithm is O(n*k*log(k)) where k is a number of 
> batches to merge and n is a number of records in each batch (assuming equal 
> size batches). As n*k is the total number of record to merge and it can be 
> quite large, minimizing k should give better results.





[jira] [Commented] (DRILL-5993) Allow Copier to Copy a Record and Append to the End of an Outgoing Batch

2017-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297299#comment-16297299
 ] 

ASF GitHub Bot commented on DRILL-5993:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1057#discussion_r157853697
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/svremover/CopierTemplate4.java
 ---
@@ -54,17 +52,33 @@ public int copyRecords(int index, int recordCount) 
throws SchemaChangeException
   }
 }
 
-int outgoingPosition = 0;
-for(int svIndex = index; svIndex < index + recordCount; svIndex++, 
outgoingPosition++){
+return insertRecords(0, index, recordCount);
+  }
+
+  @Override
+  public int appendRecord(int index) throws SchemaChangeException {
+return appendRecords(index, 1);
+  }
--- End diff --

Updated the code and added an implementation of appendRecord that doesn't 
use a for loop.


> Allow Copier to Copy a Record and Append to the End of an Outgoing Batch
> 
>
> Key: DRILL-5993
> URL: https://issues.apache.org/jira/browse/DRILL-5993
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> Currently the copier can only copy record from an incoming batch to the 
> beginning of an outgoing batch. We need to be able to copy a record and 
> append it to the end of the outgoing batch.
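The distinction between the two copy modes can be illustrated with a toy sketch (hypothetical method names over plain lists, not Drill's Copier API): copy always starts writing at position 0 of the outgoing batch, while append preserves what is already there.

```java
import java.util.List;

// Toy illustration of the two copier modes (hypothetical API, not Drill's
// Copier): copyRecords overwrites from position 0, appendRecords adds to
// whatever the outgoing batch already holds.
public class CopierSketch {
    static <T> void copyRecords(List<T> in, List<T> out) {
        out.clear();                 // copy always starts at the beginning
        out.addAll(in);
    }

    static <T> void appendRecords(List<T> in, List<T> out) {
        out.addAll(in);              // append preserves existing records
    }
}
```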





[jira] [Commented] (DRILL-5967) Memory leak by HashPartitionSender

2017-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297291#comment-16297291
 ] 

ASF GitHub Bot commented on DRILL-5967:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1073
  
Removed the unnecessary creation of a list in OrderedPartitionSenderCreator, 
as discussed. @paul-rogers, please take a look.


> Memory leak by HashPartitionSender
> --
>
> Key: DRILL-5967
> URL: https://issues.apache.org/jira/browse/DRILL-5967
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The error found by [~cch...@maprtech.com] and [~dechanggu]
> {code}
> 2017-10-25 15:43:28,658 [260eec84-7de3-03ec-300f-7fdbc111fb7c:frag:2:9] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
> Memory was leaked by query. Memory leaked: (9216)
> Allocator(op:2:9:0:HashPartitionSender) 100/9216/12831744/100 
> (res/actual/peak/limit)
> Fragment 2:9
> [Error Id: 7eae6c2a-868c-49f8-aad8-b690243ffe9b on mperf113.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Memory was leaked by query. Memory leaked: (9216)
> Allocator(op:2:9:0:HashPartitionSender) 100/9216/12831744/100 
> (res/actual/peak/limit)
> Fragment 2:9
> [Error Id: 7eae6c2a-868c-49f8-aad8-b690243ffe9b on mperf113.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
>  ~[drill-common-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:301)
>  [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267)
>  [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_121]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> Caused by: java.lang.IllegalStateException: Memory was leaked by query. 
> Memory leaked: (9216)
> Allocator(op:2:9:0:HashPartitionSender) 100/9216/12831744/100 
> (res/actual/peak/limit)
> {code}





[jira] [Updated] (DRILL-6044) Shutdown button does not work from WebUI

2017-12-19 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal updated DRILL-6044:
---
Attachment: Screen Shot 2017-12-19 at 10.51.16 AM.png

> Shutdown button does not work from WebUI
> 
>
> Key: DRILL-6044
> URL: https://issues.apache.org/jira/browse/DRILL-6044
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.13.0
>Reporter: Krystal
> Attachments: Screen Shot 2017-12-19 at 10.51.16 AM.png
>
>
> git.commit.id.abbrev=eb0c403
> Nothing happens when clicking the SHUTDOWN button in the WebUI. The 
> browser's debugger showed that the request failed due to access control 
> checks (see the attached screenshot).





[jira] [Created] (DRILL-6044) Shutdown button does not work from WebUI

2017-12-19 Thread Krystal (JIRA)
Krystal created DRILL-6044:
--

 Summary: Shutdown button does not work from WebUI
 Key: DRILL-6044
 URL: https://issues.apache.org/jira/browse/DRILL-6044
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - HTTP
Affects Versions: 1.13.0
Reporter: Krystal


git.commit.id.abbrev=eb0c403

Nothing happens when clicking the SHUTDOWN button in the WebUI. The browser's 
debugger showed that the request failed due to access control checks (see the 
attached screenshot).





[jira] [Commented] (DRILL-6030) Managed sort should minimize number of batches in a k-way merge

2017-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297218#comment-16297218
 ] 

ASF GitHub Bot commented on DRILL-6030:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1075
  
One additional thought. This bug was found when sorting 18 GB of data in 8 
GB of memory. That is, a case in which the sort must spill.

What happens when the same 18 GB of data is sorted in, say, 20 GB of 
memory (an in-memory sort)? We don't want the merge limit to force a spill 
in that case; that would defeat the purpose of an in-memory sort.

So:

1. Does the limit affect the in-memory sort? If so, we need to revise the 
solution.
2. Does the in-memory sort suffer from a similar performance issue? If so, 
we need to revise the in-memory sort.

One possible solution is to:

1. Defer sorting of individual batches until necessary.
2. Sort batches just before spilling.
3. If all batches fit in memory, do a single, combined sort (using an SV4).
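The deferral proposed in the three steps above can be sketched as a control flow (hypothetical names and a list-of-integers stand-in for batches; not Drill's code, and the merge step after spilling is omitted):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical control flow for the deferred-sort proposal (not Drill's
// code): batches stay unsorted until a spill forces sorting; if everything
// fits in memory, one combined sort replaces per-batch sorts plus a merge.
public class DeferredSortSketch {
    static List<Integer> sortAll(List<List<Integer>> batches, boolean fitsInMemory) {
        if (fitsInMemory) {
            // Single combined sort (stand-in for the SV4-based sort).
            List<Integer> all = new ArrayList<>();
            batches.forEach(all::addAll);
            Collections.sort(all);
            return all;
        }
        // Otherwise sort each batch only at spill time (merge step omitted).
        List<Integer> spilled = new ArrayList<>();
        for (List<Integer> b : batches) {
            List<Integer> copy = new ArrayList<>(b);
            Collections.sort(copy);           // deferred until spill
            spilled.addAll(copy);
        }
        return spilled;
    }
}
```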


> Managed sort should minimize number of batches in a k-way merge
> ---
>
> Key: DRILL-6030
> URL: https://issues.apache.org/jira/browse/DRILL-6030
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>
> The time complexity of the algorithm is O(n*k*log(k)) where k is a number of 
> batches to merge and n is a number of records in each batch (assuming equal 
> size batches). As n*k is the total number of record to merge and it can be 
> quite large, minimizing k should give better results.





[jira] [Commented] (DRILL-6030) Managed sort should minimize number of batches in a k-way merge

2017-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297169#comment-16297169
 ] 

ASF GitHub Bot commented on DRILL-6030:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1075#discussion_r157830252
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SortConfig.java
 ---
@@ -84,7 +85,7 @@ public SortConfig(DrillConfig config) {
 if (limit > 0) {
   mergeLimit = Math.max(limit, MIN_MERGE_LIMIT);
 } else {
-  mergeLimit = Integer.MAX_VALUE;
+  mergeLimit = DEFAULT_MERGE_LIMIT;
--- End diff --

The merge limit is already a config option. (I'd forgotten about that.) The 
comment on the config option says "Limit on the number of spilled batches that 
can be merged in a single pass." So, let's just set that default (in 
`drill-override.conf`) to your new value of 128 and leave the code here 
unchanged.


> Managed sort should minimize number of batches in a k-way merge
> ---
>
> Key: DRILL-6030
> URL: https://issues.apache.org/jira/browse/DRILL-6030
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>
> The time complexity of the algorithm is O(n*k*log(k)) where k is a number of 
> batches to merge and n is a number of records in each batch (assuming equal 
> size batches). As n*k is the total number of record to merge and it can be 
> quite large, minimizing k should give better results.





[jira] [Assigned] (DRILL-6020) NullPointerException with Union setting on when querying JSON untyped path

2017-12-19 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-6020:


Assignee: Mitchel Labonte

> NullPointerException with Union setting on when querying JSON untyped path
> --
>
> Key: DRILL-6020
> URL: https://issues.apache.org/jira/browse/DRILL-6020
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Mitchel Labonte
>Assignee: Mitchel Labonte
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> h1. Steps to reproduce
> alter session set `exec.enable_union_type`=true;
> select tb.level1.dta from dfs.`file.json` tb;
> *Content of file.json:*
> {noformat}
> {"level1":{"dta":{"test":"test"}}}
> {"level1":{"dta":"test"}}
> {noformat}
> h1. Stack trace
> Error: SYSTEM ERROR: NullPointerException
> Fragment 0:0
> [Error Id: fe267584-32f3-413c-a77c-fc5b5c1ba513 on localhost:31010]
>   (java.lang.NullPointerException) null
> 
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatchesUnion():34
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatches():135
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatches():130
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldId():201
> org.apache.drill.exec.record.SimpleVectorWrapper.getFieldIdIfMatches():102
> org.apache.drill.exec.record.VectorContainer.getValueVectorId():298
> org.apache.drill.exec.physical.impl.ScanBatch.getValueVectorId():313
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitSchemaPath():289
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitSchemaPath():272
> org.apache.drill.common.expression.SchemaPath.accept():150
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$AbstractMaterializeVisitor.visitFunctionCall():399
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$AbstractMaterializeVisitor.visitFunctionCall():331
> org.apache.drill.common.expression.FunctionCall.accept():60
> org.apache.drill.exec.expr.ExpressionTreeMaterializer.materialize():169
> org.apache.drill.exec.expr.ExpressionTreeMaterializer.materialize():147
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema():421
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1657
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)





[jira] [Commented] (DRILL-5967) Memory leak by HashPartitionSender

2017-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297050#comment-16297050
 ] 

ASF GitHub Bot commented on DRILL-5967:
---

Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1073
  
LGTM


> Memory leak by HashPartitionSender
> --
>
> Key: DRILL-5967
> URL: https://issues.apache.org/jira/browse/DRILL-5967
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The error found by [~cch...@maprtech.com] and [~dechanggu]
> {code}
> 2017-10-25 15:43:28,658 [260eec84-7de3-03ec-300f-7fdbc111fb7c:frag:2:9] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
> Memory was leaked by query. Memory leaked: (9216)
> Allocator(op:2:9:0:HashPartitionSender) 100/9216/12831744/100 
> (res/actual/peak/limit)
> Fragment 2:9
> [Error Id: 7eae6c2a-868c-49f8-aad8-b690243ffe9b on mperf113.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Memory was leaked by query. Memory leaked: (9216)
> Allocator(op:2:9:0:HashPartitionSender) 100/9216/12831744/100 
> (res/actual/peak/limit)
> Fragment 2:9
> [Error Id: 7eae6c2a-868c-49f8-aad8-b690243ffe9b on mperf113.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
>  ~[drill-common-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:301)
>  [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267)
>  [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_121]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> Caused by: java.lang.IllegalStateException: Memory was leaked by query. 
> Memory leaked: (9216)
> Allocator(op:2:9:0:HashPartitionSender) 100/9216/12831744/100 
> (res/actual/peak/limit)
> {code}





[jira] [Updated] (DRILL-5919) Add non-numeric support for JSON processing

2017-12-19 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-5919:
-
Fix Version/s: (was: Future)
   1.13.0

> Add non-numeric support for JSON processing
> ---
>
> Key: DRILL-5919
> URL: https://issues.apache.org/jira/browse/DRILL-5919
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON
>Affects Versions: 1.11.0
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Add session options to allow Drill to work with non-standard JSON number 
> literals such as NaN, Infinity, and -Infinity. By default these options are 
> switched off; the user can toggle them during a working session.
> *For documentation*
> 1. Added two session options, {{store.json.reader.non_numeric_numbers}} and 
> {{store.json.writer.non_numeric_numbers}}, that allow reading/writing NaN and 
> Infinity as numbers. By default these options are set to false.
> 2. Extended the signatures of the {{convert_toJSON}} and {{convert_fromJSON}} 
> functions by adding a second optional parameter that enables reading/writing 
> NaN and Infinity.
> For example:
> {noformat}
> select convert_fromJSON('{"key": NaN}') from (values(1)); will result in a 
> JsonParseException, but
> select convert_fromJSON('{"key": NaN}', true) from (values(1)); will parse 
> NaN as a number.
> {noformat}
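The toggle behavior described above can be sketched with a tiny standalone helper (hypothetical, not Drill's JSON reader; it relies on the fact that Java's `Double.parseDouble` accepts the spellings "NaN", "Infinity", and "-Infinity"):

```java
// Hypothetical helper mirroring the option's behavior (not Drill's parser):
// accept the non-standard JSON literals as doubles only when allowed.
public class NonNumericLiterals {
    static double parseNumber(String token, boolean allowNonNumeric) {
        switch (token) {
            case "NaN":
            case "Infinity":
            case "-Infinity":
                if (!allowNonNumeric) {
                    throw new IllegalArgumentException(
                        "Non-numeric number literal not allowed: " + token);
                }
                return Double.parseDouble(token); // Java accepts these spellings
            default:
                return Double.parseDouble(token); // ordinary numeric literal
        }
    }
}
```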





[jira] [Updated] (DRILL-6020) NullPointerException with Union setting on when querying JSON untyped path

2017-12-19 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6020:

Labels: ready-to-commit  (was: )

> NullPointerException with Union setting on when querying JSON untyped path
> --
>
> Key: DRILL-6020
> URL: https://issues.apache.org/jira/browse/DRILL-6020
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Mitchel Labonte
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> h1. Steps to reproduce
> alter session set `exec.enable_union_type`=true;
> select tb.level1.dta from dfs.`file.json` tb;
> *Content of file.json:*
> {noformat}
> {"level1":{"dta":{"test":"test"}}}
> {"level1":{"dta":"test"}}
> {noformat}
> h1. Stack trace
> Error: SYSTEM ERROR: NullPointerException
> Fragment 0:0
> [Error Id: fe267584-32f3-413c-a77c-fc5b5c1ba513 on localhost:31010]
>   (java.lang.NullPointerException) null
> 
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatchesUnion():34
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatches():135
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatches():130
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldId():201
> org.apache.drill.exec.record.SimpleVectorWrapper.getFieldIdIfMatches():102
> org.apache.drill.exec.record.VectorContainer.getValueVectorId():298
> org.apache.drill.exec.physical.impl.ScanBatch.getValueVectorId():313
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitSchemaPath():289
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitSchemaPath():272
> org.apache.drill.common.expression.SchemaPath.accept():150
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$AbstractMaterializeVisitor.visitFunctionCall():399
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$AbstractMaterializeVisitor.visitFunctionCall():331
> org.apache.drill.common.expression.FunctionCall.accept():60
> org.apache.drill.exec.expr.ExpressionTreeMaterializer.materialize():169
> org.apache.drill.exec.expr.ExpressionTreeMaterializer.materialize():147
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema():421
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1657
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)





[jira] [Commented] (DRILL-6020) NullPointerException with Union setting on when querying JSON untyped path

2017-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296859#comment-16296859
 ] 

ASF GitHub Bot commented on DRILL-6020:
---

Github user vvysotskyi commented on the issue:

https://github.com/apache/drill/pull/1068
  
@MitchelLabonte, thanks for the pull request, +1


> NullPointerException with Union setting on when querying JSON untyped path
> --
>
> Key: DRILL-6020
> URL: https://issues.apache.org/jira/browse/DRILL-6020
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Mitchel Labonte
> Fix For: 1.13.0
>
>
> h1. Steps to reproduce
> alter session set `exec.enable_union_type`=true;
> select tb.level1.dta from dfs.`file.json` tb;
> *Content of file.json:*
> {noformat}
> {"level1":{"dta":{"test":"test"}}}
> {"level1":{"dta":"test"}}
> {noformat}
> h1. Stack trace
> Error: SYSTEM ERROR: NullPointerException
> Fragment 0:0
> [Error Id: fe267584-32f3-413c-a77c-fc5b5c1ba513 on localhost:31010]
>   (java.lang.NullPointerException) null
> 
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatchesUnion():34
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatches():135
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatches():130
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldId():201
> org.apache.drill.exec.record.SimpleVectorWrapper.getFieldIdIfMatches():102
> org.apache.drill.exec.record.VectorContainer.getValueVectorId():298
> org.apache.drill.exec.physical.impl.ScanBatch.getValueVectorId():313
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitSchemaPath():289
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitSchemaPath():272
> org.apache.drill.common.expression.SchemaPath.accept():150
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$AbstractMaterializeVisitor.visitFunctionCall():399
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$AbstractMaterializeVisitor.visitFunctionCall():331
> org.apache.drill.common.expression.FunctionCall.accept():60
> org.apache.drill.exec.expr.ExpressionTreeMaterializer.materialize():169
> org.apache.drill.exec.expr.ExpressionTreeMaterializer.materialize():147
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema():421
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1657
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)





[jira] [Commented] (DRILL-6020) NullPointerException with Union setting on when querying JSON untyped path

2017-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296817#comment-16296817
 ] 

ASF GitHub Bot commented on DRILL-6020:
---

Github user MitchelLabonte commented on the issue:

https://github.com/apache/drill/pull/1068
  
@vvysotskyi No problem, yes all tests pass now. 


> NullPointerException with Union setting on when querying JSON untyped path
> --
>
> Key: DRILL-6020
> URL: https://issues.apache.org/jira/browse/DRILL-6020
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Mitchel Labonte
> Fix For: 1.13.0
>
>
> h1. Steps to reproduce
> alter session set `exec.enable_union_type`=true;
> select tb.level1.dta from dfs.`file.json` tb;
> *Content of file.json:*
> {noformat}
> {"level1":{"dta":{"test":"test"}}}
> {"level1":{"dta":"test"}}
> {noformat}
> h1. Stack trace
> Error: SYSTEM ERROR: NullPointerException
> Fragment 0:0
> [Error Id: fe267584-32f3-413c-a77c-fc5b5c1ba513 on localhost:31010]
>   (java.lang.NullPointerException) null
> 
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatchesUnion():34
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatches():135
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatches():130
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldId():201
> org.apache.drill.exec.record.SimpleVectorWrapper.getFieldIdIfMatches():102
> org.apache.drill.exec.record.VectorContainer.getValueVectorId():298
> org.apache.drill.exec.physical.impl.ScanBatch.getValueVectorId():313
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitSchemaPath():289
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitSchemaPath():272
> org.apache.drill.common.expression.SchemaPath.accept():150
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$AbstractMaterializeVisitor.visitFunctionCall():399
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$AbstractMaterializeVisitor.visitFunctionCall():331
> org.apache.drill.common.expression.FunctionCall.accept():60
> org.apache.drill.exec.expr.ExpressionTreeMaterializer.materialize():169
> org.apache.drill.exec.expr.ExpressionTreeMaterializer.materialize():147
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema():421
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1657
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-6020) NullPointerException with Union setting on when querying JSON untyped path

2017-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296808#comment-16296808
 ] 

ASF GitHub Bot commented on DRILL-6020:
---

Github user vvysotskyi commented on the issue:

https://github.com/apache/drill/pull/1068
  
Sorry, I read your removed comment and assumed that this test 
fails. 
Did you run all the unit tests to confirm that this change does not break anything?


> NullPointerException with Union setting on when querying JSON untyped path
> --
>
> Key: DRILL-6020
> URL: https://issues.apache.org/jira/browse/DRILL-6020
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Mitchel Labonte
> Fix For: 1.13.0
>
>
> h1. Steps to reproduce
> alter session set `exec.enable_union_type`=true;
> select tb.level1.dta from dfs.`file.json` tb;
> *Content of file.json:*
> {noformat}
> {"level1":{"dta":{"test":"test"}}}
> {"level1":{"dta":"test"}}
> {noformat}
> h1. Stack trace
> Error: SYSTEM ERROR: NullPointerException
> Fragment 0:0
> [Error Id: fe267584-32f3-413c-a77c-fc5b5c1ba513 on localhost:31010]
>   (java.lang.NullPointerException) null
> 
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatchesUnion():34
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatches():135
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldIdIfMatches():130
> org.apache.drill.exec.vector.complex.FieldIdUtil.getFieldId():201
> org.apache.drill.exec.record.SimpleVectorWrapper.getFieldIdIfMatches():102
> org.apache.drill.exec.record.VectorContainer.getValueVectorId():298
> org.apache.drill.exec.physical.impl.ScanBatch.getValueVectorId():313
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitSchemaPath():289
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitSchemaPath():272
> org.apache.drill.common.expression.SchemaPath.accept():150
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$AbstractMaterializeVisitor.visitFunctionCall():399
> 
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$AbstractMaterializeVisitor.visitFunctionCall():331
> org.apache.drill.common.expression.FunctionCall.accept():60
> org.apache.drill.exec.expr.ExpressionTreeMaterializer.materialize():169
> org.apache.drill.exec.expr.ExpressionTreeMaterializer.materialize():147
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema():421
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1657
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)





[jira] [Commented] (DRILL-6020) NullPointerException with Union setting on when querying JSON untyped path

2017-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296776#comment-16296776
 ] 

ASF GitHub Bot commented on DRILL-6020:
---

Github user MitchelLabonte commented on the issue:

https://github.com/apache/drill/pull/1068
  
@vvysotskyi
This is happening because the type is cached as a JSON object from the 
previous row. The fix mirrors the getFieldIdIfMatches() method, so this 
looks like intended behaviour.
As you can see in the unit test, the results after the fix are what is 
expected. I am not sure what else could be done. 
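The failure mode under discussion can be sketched outside Drill. The snippet below is an illustrative Python model, not Drill's actual FieldIdUtil code: a union-typed column holds a map in one row and a plain string in the next, and resolving a child path against the scalar shape needs a guard to avoid the NPE-style crash.

```python
# Illustrative Python model of the failure discussed above -- NOT Drill's
# FieldIdUtil code. A UNION column holds a MAP in one row and a VARCHAR in
# the next; resolving a child path on the scalar shape must not blow up.

def resolve(schema, path):
    """Walk `path` (a list of names) through a nested dict schema.

    Returns the leaf type, or None when the path cannot be resolved
    (e.g. a scalar is reached while path segments remain)."""
    node = schema
    for name in path:
        if not isinstance(node, dict):
            # A scalar such as VARCHAR has no children. Without this guard
            # the lookup below would fail -- the analogue of the NPE.
            return None
        node = node.get(name)
        if node is None:
            return None
    return node

map_shaped_row = {"level1": {"dta": {"test": "VARCHAR"}}}
scalar_shaped_row = {"level1": {"dta": "VARCHAR"}}
```

With this guard, `resolve(map_shaped_row, ["level1", "dta", "test"])` yields `"VARCHAR"`, while the same call on `scalar_shaped_row` returns `None` instead of raising.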


> NullPointerException with Union setting on when querying JSON untyped path
> --
>
> Key: DRILL-6020
> URL: https://issues.apache.org/jira/browse/DRILL-6020
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Mitchel Labonte
> Fix For: 1.13.0
>
>





[jira] [Commented] (DRILL-6020) NullPointerException with Union setting on when querying JSON untyped path

2017-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296610#comment-16296610
 ] 

ASF GitHub Bot commented on DRILL-6020:
---

Github user vvysotskyi commented on the issue:

https://github.com/apache/drill/pull/1068
  
@MitchelLabonte, I think this NPE is just a consequence of the underlying 
bug that should be fixed. Please investigate why Drill tries to use a child 
`PathSegment` when the value has VarChar type.


> NullPointerException with Union setting on when querying JSON untyped path
> --
>
> Key: DRILL-6020
> URL: https://issues.apache.org/jira/browse/DRILL-6020
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Mitchel Labonte
> Fix For: 1.13.0
>
>





[jira] [Commented] (DRILL-6043) Nullable vector, but not List vector, adds its internal vectors to child list

2017-12-19 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296436#comment-16296436
 ] 

Paul Rogers commented on DRILL-6043:


Once the list passes through project, it does pick up the internal vectors:

{noformat}
`flat`(LIST:OPTIONAL) [`[DEFAULT]`(LATE:OPTIONAL),
`$data$`(LIST:OPTIONAL) ...]]]
{noformat}

But the way this information was added is broken. Project added a vector for 
the data portion of the list, but it did not remove the original "dummy" type 
created when the list was created. This leaves the list with two children when 
it should have one.

We must have multiple code paths that manipulate list internals, leading to 
these errors.
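The two-children state can be shown with a toy model. The child names below ("[DEFAULT]", "$data$") mirror the schema dump earlier in this comment, but the helper itself is hypothetical, not Drill's API: it illustrates that the data child must replace the placeholder rather than be appended next to it.

```python
# Toy model of the metadata fix-up discussed above. The child names mirror
# the schema dump; the helper itself is hypothetical, not Drill's API.

def add_data_child(children, data_child, placeholder="[DEFAULT]"):
    """Replace the placeholder child with the materialized data child."""
    return [c for c in children if c != placeholder] + [data_child]

# Appending without removing the placeholder leaves two children (the bug):
buggy_children = ["[DEFAULT]", "$data$"]
# Replacing the placeholder leaves exactly one child (the expected state):
fixed_children = add_data_child(["[DEFAULT]"], "$data$")
```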

> Nullable vector, but not List vector, adds its internal vectors to child list
> -
>
> Key: DRILL-6043
> URL: https://issues.apache.org/jira/browse/DRILL-6043
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Each Drill vector has associated metadata in the form of a 
> {{MaterializedField}} instance. The {{MaterializedField}} contains a list of 
> children. For a Map vector, the list of children names the vectors that make 
> up the map.
> Nullable vectors use the list of children to identify the hidden vectors that 
> make up the nullable vector: {{$bits$}} and {{$values$}}.
> Repeated vectors (including lists) also have hidden internal 
> vectors: offsets and values. However, the metadata for repeated types and 
> lists does not include these in the vector metadata.
> We should decide if we need metadata for the implied internal vectors. 
> (Having it does cause problems since a newly-created schema for a nullable 
> vector is not equal to the actual schema created by the vector itself.)
> If we don't need the internal vector metadata, remove it from the nullable 
> vectors.
> But, if we do need it, add it to the repeated vectors and to lists.





[jira] [Created] (DRILL-6043) Nullable vector, but not List vector, adds its internal vectors to child list

2017-12-19 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-6043:
--

 Summary: Nullable vector, but not List vector, adds its internal 
vectors to child list
 Key: DRILL-6043
 URL: https://issues.apache.org/jira/browse/DRILL-6043
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Paul Rogers
Priority: Minor


Each Drill vector has associated metadata in the form of a {{MaterializedField}} 
instance. The {{MaterializedField}} contains a list of children. For a Map 
vector, the list of children names the vectors that make up the map.

Nullable vectors use the list of children to identify the hidden vectors that 
make up the nullable vector: {{$bits$}} and {{$values$}}.

Repeated vectors (including lists) also have hidden internal vectors: 
offsets and values. However, the metadata for repeated types and lists does not 
include these in the vector metadata.

We should decide if we need metadata for the implied internal vectors. (Having 
it does cause problems since a newly-created schema for a nullable vector is 
not equal to the actual schema created by the vector itself.)

If we don't need the internal vector metadata, remove it from the nullable 
vectors.

But, if we do need it, add it to the repeated vectors and to lists.
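The schema-equality problem mentioned in the description can be sketched as follows. This is a minimal Python model with an invented Field class standing in for Drill's metadata (it is not the real MaterializedField API): a nullable column declared without children compares unequal to the same column once the vector reports its hidden {{$bits$}}/{{$values$}} children.

```python
# Minimal model of the schema-equality problem from the description. The
# Field class is invented here; it stands in for Drill's metadata and is
# not the real MaterializedField API.

class Field:
    def __init__(self, name, type_, children=()):
        self.name = name
        self.type_ = type_
        self.children = list(children)

    def __eq__(self, other):
        # Equality includes children, so implied internal vectors matter.
        return (self.name, self.type_, self.children) == \
               (other.name, other.type_, other.children)

# Schema as declared: no hidden children.
declared = Field("col", "INT:OPTIONAL")
# Schema as reported by the nullable vector: hidden $bits$/$values$ children.
actual = Field("col", "INT:OPTIONAL",
               [Field("$bits$", "UINT1:REQUIRED"),
                Field("$values$", "INT:REQUIRED")])
```

Because equality includes the children list, `declared` and `actual` compare unequal even though they describe the same column, which is the inconsistency the description calls out.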


