Build failed in Jenkins: drill-scm #876

2017-09-16 Thread Apache Jenkins Server
See 

Changes:

[progers] DRILL-5723: Added System Internal Options That can be Modified at

--
[...truncated 132.63 KB...]
[INFO] Rat check: Summary over all files. Unapproved: 0, unknown: 0, generated: 
0, approved: 4 licenses.
[INFO] 
[INFO] --- git-commit-id-plugin:2.1.9:revision (for-jars) @ drill-jdbc-all ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ drill-jdbc-all 
---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
drill-jdbc-all ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.2:compile (default-compile) @ drill-jdbc-all 
---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
drill-jdbc-all ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.2:testCompile (default-testCompile) @ 
drill-jdbc-all ---
[INFO] Compiling 2 source files to 

[INFO] 
[INFO] --- maven-surefire-plugin:2.17:test (default-test) @ drill-jdbc-all ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ drill-jdbc-all ---
[INFO] Building jar: 

[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
drill-jdbc-all ---
[INFO] 
[INFO] --- maven-jar-plugin:2.4:test-jar (default) @ drill-jdbc-all ---
[INFO] Building jar: 

[INFO] 
[INFO] --- maven-shade-plugin:2.4.1:shade (default) @ drill-jdbc-all ---
[INFO] Including org.slf4j:slf4j-api:jar:1.7.6 in the shaded jar.
[INFO] Including org.apache.drill:drill-common:jar:1.12.0-SNAPSHOT in the 
shaded jar.
[INFO] Including org.apache.drill:drill-protocol:jar:1.12.0-SNAPSHOT in the 
shaded jar.
[INFO] Excluding com.dyuproject.protostuff:protostuff-core:jar:1.0.8 from the 
shaded jar.
[INFO] Excluding com.dyuproject.protostuff:protostuff-api:jar:1.0.8 from the 
shaded jar.
[INFO] Excluding com.dyuproject.protostuff:protostuff-json:jar:1.0.8 from the 
shaded jar.
[INFO] Excluding org.apache.calcite:calcite-core:jar:1.4.0-drill-r21 from the 
shaded jar.
[INFO] Excluding org.apache.calcite:calcite-linq4j:jar:1.4.0-drill-r21 from the 
shaded jar.
[INFO] Including commons-dbcp:commons-dbcp:jar:1.4 in the shaded jar.
[INFO] Including com.google.code.findbugs:jsr305:jar:1.3.9 in the shaded jar.
[INFO] Including net.hydromatic:eigenbase-properties:jar:1.1.5 in the shaded 
jar.
[INFO] Including org.codehaus.janino:janino:jar:2.7.6 in the shaded jar.
[INFO] Including org.codehaus.janino:commons-compiler:jar:2.7.6 in the shaded 
jar.
[INFO] Excluding org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde from 
the shaded jar.
[INFO] Including com.typesafe:config:jar:1.0.0 in the shaded jar.
[INFO] Including org.apache.commons:commons-lang3:jar:3.1 in the shaded jar.
[INFO] Excluding org.msgpack:msgpack:jar:0.6.6 from the shaded jar.
[INFO] Excluding com.googlecode.json-simple:json-simple:jar:1.1.1 from the 
shaded jar.
[INFO] Including org.javassist:javassist:jar:3.16.1-GA in the shaded jar.
[INFO] Including org.reflections:reflections:jar:0.9.8 in the shaded jar.
[INFO] Excluding dom4j:dom4j:jar:1.6.1 from the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-annotations:jar:2.7.8 in 
the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-databind:jar:2.7.8 in the 
shaded jar.
[INFO] Including com.codahale.metrics:metrics-core:jar:3.0.1 in the shaded jar.
[INFO] Including com.codahale.metrics:metrics-servlets:jar:3.0.1 in the shaded 
jar.
[INFO] Including com.codahale.metrics:metrics-healthchecks:jar:3.0.1 in the 
shaded jar.
[INFO] Including com.codahale.metrics:metrics-json:jar:3.0.1 in the shaded jar.
[INFO] Including com.codahale.metrics:metrics-jvm:jar:3.0.1 in the shaded jar.
[INFO] Including org.antlr:antlr-runtime:jar:3.4 in the shaded jar.
[INFO] Including org.antlr:stringtemplate:jar:3.2.1 in the shaded jar.
[INFO] Excluding antlr:antlr:jar:2.7.7 from the shaded jar.
[INFO] Including joda-time:joda-time:jar:2.9 in the shaded jar.
[INFO] Including org.apache.drill.exec:drill-java-exec:jar:1.12.0-SNAPSHOT in 
the shaded jar.
[INFO] Excluding org.ow2.asm:asm-debug-all:jar:5.0.3 from the shaded jar.
[INFO] Including org.apache.commons:commons-pool2:jar:2.1 in the shaded jar.
[INFO] Excluding com.univocity:univocity-parsers:jar:1.3.0 from the shaded jar.
[INFO] Including org.apache.commons:commons-math:jar:2.2 in the shaded jar.
[INFO] Including 

[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...

2017-09-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/923


---


[GitHub] drill issue #914: DRILL-5657: Size-aware vector writer structure

2017-09-16 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/914
  
The second group of five commits refines the result set loader mechanism.

### Model Layer Revision

Drill is unique among query engines in that it handles structured types as
first-class data types. For example, Drill supports maps (called “structs”
by Hive and Impala), but also supports arrays of maps. Drill supports simple
scalars, as well as arrays of scalars. Put these together and we have maps that
contain arrays of maps that contain arrays of scalars. The result is a tree 
structure described in the Drill documentation as modeled on JSON.
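To make the tree structure concrete, here is a toy sketch using plain Java collections (the field names are invented, and Drill of course stores such data in value vectors, not boxed objects):

```java
import java.util.*;

public class NestedRowDemo {
  // Build one row: a scalar column, plus a map-array column whose maps
  // each hold an array of scalars -- the JSON-like tree described above.
  static Map<String, Object> sampleRow() {
    Map<String, Object> event1 = new LinkedHashMap<>();
    event1.put("scores", Arrays.asList(10, 20));       // array of scalars
    Map<String, Object> event2 = new LinkedHashMap<>();
    event2.put("scores", Arrays.asList(30));
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("name", "alice");                          // simple scalar
    row.put("events", Arrays.asList(event1, event2));  // array of maps
    return row;
  }

  public static void main(String[] args) {
    System.out.println(sampleRow());
  }
}
```

The nesting can continue to any depth, which is why the writers must track positions per level rather than per batch.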

The prior version used a “model” layer to construct internal structures 
that model the tree structure of Drill’s data types. The model “reified” 
the structure into a set of objects. While this worked well, it added more 
complexity than necessary, especially when dynamically evolving a schema or 
working out how to handle scan-time projection.

This version retains the model layer, but as a series of algorithms that 
walk the vector container structure rather than as a separate data structure. 
The model layer still provides tools to build readers for “single” and 
“hyper” vectors, to extract a metadata schema from a set of vectors, to 
create writers for a “single” vector container, and so on.

Replacing the object structure with algorithms required changes to both the 
row set abstractions and the result set loader.

The key loss in this change is the set of “visitors” from the previous 
revision. The reified model allowed all visitors to use a common structure. The 
new solution still has visitors, but now they are ad-hoc, walking the container 
tree in different ways depending on whether the code can work with columns 
generically, or needs to deal with individual (single) vectors or hyper vectors.
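A rough sketch of the ad-hoc style, using invented toy node classes rather than Drill's actual container and model types:

```java
import java.util.*;

public class SchemaWalkDemo {
  // Toy container tree: maps hold child columns; scalars are leaves.
  static abstract class ColumnNode {
    final String name;
    ColumnNode(String name) { this.name = name; }
  }
  static class ScalarNode extends ColumnNode {
    ScalarNode(String name) { super(name); }
  }
  static class MapNode extends ColumnNode {
    final List<ColumnNode> children = new ArrayList<>();
    MapNode(String name) { super(name); }
    MapNode add(ColumnNode c) { children.add(c); return this; }
  }

  // An "ad-hoc visitor": plain recursion over the tree itself, with no
  // separate reified model objects to keep in sync with the vectors.
  static void collectPaths(ColumnNode node, String prefix, List<String> out) {
    String path = prefix.isEmpty() ? node.name : prefix + "." + node.name;
    if (node instanceof MapNode) {
      for (ColumnNode child : ((MapNode) node).children) {
        collectPaths(child, path, out);
      }
    } else {
      out.add(path);
    }
  }

  public static void main(String[] args) {
    MapNode root = new MapNode("row")
        .add(new ScalarNode("name"))
        .add(new MapNode("events").add(new ScalarNode("scores")));
    List<String> paths = new ArrayList<>();
    collectPaths(root, "", paths);
    System.out.println(paths);   // leaf column paths
  }
}
```

Each walk can specialize its recursion, which is what makes the new visitors ad-hoc rather than instances of one common structure.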

### Revised Result Set Loader Column and Vector State

To understand the need for many of the changes in this commit, it helps to 
take a step back and remember what we’re trying to do. We want to write to 
vectors, but control the resulting batch size.

#### Background

Writing to vectors is easy if we deal only with flat rows and don’t worry 
about batch size:

* Vectors provide `Mutator` classes that write to single vectors.
* A set of “legacy” vector writers are available, and are used by some 
readers.
* Generated code uses `Mutator` and `Accessor` classes to work with vectors.
* The “Row Set” classes, used for testing, provide a refined column 
writer to populate batches.

The above are the easy parts. Some challenges include:

* None of the above limits batch memory; they limit only row count.
* Writing directly to vectors requires that the client deal with the 
complexity of tracking a common position across vectors.
* Drill’s tree structure makes everything more complex as positions must 
be tracked across multiple repetition levels (see below).
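To illustrate the common-position problem, here is a toy column writer (invented classes, not Drill's writer machinery) that tracks a per-column last write position and back-fills empties so all columns stay aligned on one row position:

```java
import java.util.*;

// Toy column writer: tracks its own last write position and back-fills
// empty slots so every column stays aligned on a common row position.
class IntColumnWriter {
  final List<Integer> values = new ArrayList<>();
  int lastWritePos = -1;

  void setInt(int rowPos, int v) {
    fillEmpties(rowPos - 1);   // pad rows the client skipped
    values.add(v);
    lastWritePos = rowPos;
  }

  void fillEmpties(int upTo) {
    while (lastWritePos < upTo) {
      values.add(0);           // toy "empty" value
      lastWritePos++;
    }
  }
}

public class RowWriterDemo {
  public static void main(String[] args) {
    IntColumnWriter a = new IntColumnWriter();
    IntColumnWriter b = new IntColumnWriter();
    int row = 0;
    a.setInt(row, 10); b.setInt(row, 100); row++;  // row 0: both written
    a.setInt(row, 20); row++;                      // row 1: b skipped
    a.setInt(row, 30); b.setInt(row, 300);         // row 2: both written
    // Before handing the batch downstream, align every column to the
    // final row position.
    a.fillEmpties(row);
    b.fillEmpties(row);
    System.out.println(a.values + " " + b.values);
  }
}
```

The real writers do this per repetition level; the toy shows only the flat-row case.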

The “result set loader” (along with the column writers) provides the next 
level of completeness by tackling the vector memory size problem for the 
entire set of Drill structured types.

#### Overflow and Rollover

The key trick is to handle vector “overflow” by seamlessly shifting 
writes, mid-row, from a full batch to a new “look-ahead” batch. The 
process of shifting data is called “rollover.”

To implement rollover, we need to work with two sets of vectors:

* The “active” set: the vectors “under” the writers and returned 
downstream.
* The “backup” set: holds the buffers not currently in use.

During an overflow event, buffers are shifted between the active and backup 
vectors:

* On overflow, the full buffers reside in the active vectors.
* After rollover, the full buffers reside in the backup vectors, and new 
buffers, now holding the in-flight row (called the “look-ahead” row), 
reside in the active vectors.
* When “harvesting” the full batch, the full buffers and look-ahead 
vectors are exchanged, so the full buffers are back in the active vectors.
* When starting the next batch for writing, the look-ahead row is shifted 
from the backup vectors into the active vectors and the cycle repeats.
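The rollover cycle above can be sketched with a toy writer (a single int "vector" per set, with invented names; the real mechanism swaps vector buffers, not Java lists, and the harvest/next-batch exchange is simplified here):

```java
import java.util.*;

public class RolloverDemo {
  static final int LIMIT = 4;                 // toy capacity, in values

  List<Integer> active = new ArrayList<>();   // under the writer, returned downstream
  List<Integer> backup = new ArrayList<>();   // buffers not currently in use
  int rowStart = 0;                           // where the in-flight row begins

  void write(int v) {
    if (active.size() >= LIMIT) {
      rollover();                             // overflow: shift mid-row
    }
    active.add(v);
  }

  void endRow() { rowStart = active.size(); }

  // Rollover: full buffers move to the backup set; the in-flight
  // ("look-ahead") row restarts at position 0 of fresh active buffers.
  void rollover() {
    List<Integer> inFlight = new ArrayList<>(active.subList(rowStart, active.size()));
    active.subList(rowStart, active.size()).clear();
    List<Integer> full = active;
    active = backup;
    active.clear();
    active.addAll(inFlight);
    backup = full;
    rowStart = 0;
  }

  // Harvest: hand the full buffers downstream, leaving the look-ahead
  // row in place to start the next batch.
  List<Integer> harvestFullBatch() {
    List<Integer> full = backup;
    backup = new ArrayList<>();
    return full;
  }

  public static void main(String[] args) {
    RolloverDemo w = new RolloverDemo();
    for (int[] row : new int[][] {{1, 2, 3}, {4, 5, 6}}) {
      for (int v : row) w.write(v);
      w.endRow();
    }
    System.out.println(w.harvestFullBatch()); // the full batch: [1, 2, 3]
    System.out.println(w.active);             // look-ahead row: [4, 5, 6]
  }
}
```

The second row overflows mid-write, so its first value rolls over into the look-ahead buffers while the full batch waits to be harvested.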

#### Column and Vector State

When writing without overflow, we need only one set of vectors, and so the 
usual Drill vector container is sufficient. With overflow, we have two sets 
of vectors and must perform operations on them, so we need a place to store 
both sets. This is the purpose of the “column state” and “vector state” 
classes.
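A minimal sketch of the idea, with invented toy classes standing in for the real column and vector state:

```java
public class ColumnStateDemo {
  // Toy "vector state": one set of buffers for a single column.
  static class VectorState {
    int[] buffer;
    VectorState(int capacity) { buffer = new int[capacity]; }
  }

  // Toy "column state": a home for a column's two vector states, so
  // overflow operations know where both buffer sets live.
  static class ColumnState {
    VectorState active = new VectorState(1024);  // under the writers
    VectorState backup = new VectorState(1024);  // look-ahead buffers
    void exchange() {                            // swap roles at rollover/harvest
      VectorState t = active;
      active = backup;
      backup = t;
    }
  }

  public static void main(String[] args) {
    ColumnState col = new ColumnState();
    VectorState before = col.active;
    col.exchange();
    // After an exchange, the old active buffers sit in the backup slot.
    System.out.println(col.backup == before);  // prints "true"
  }
}
```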

Think of the overall result set loader structure as having three key 
parts:

* Result set loader: manages the entire batch
* Column writers: accept writes to vectors

[GitHub] drill pull request #914: DRILL-5657: Size-aware vector writer structure

2017-09-16 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/914#discussion_r139296614
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/package-info.java
 ---
@@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Handles the details of the result set loader implementation.
+ * 
+ * The primary purpose of this loader, and the most complex to understand 
and
+ * maintain, is overflow handling.
+ *
+ * Detailed Use Cases
+ *
+ * Let's examine it by considering a number of
+ * use cases.
+ * 
+ * 
+ * Row | a | b | c | d | e | f | g | h
+ * ----+---+---+---+---+---+---+---+---
+ * n-2 | X | X | X | X |   | X | - | -
+ * n-1 | X | X | X | X |   |   | - | -
+ * n   | X | ! | O |   | O |   | O |
+ * 
+ * Here:
+ * 
+ * n-2, n-1, and n are rows. n is the overflow row.
+ * X indicates a value was written before overflow.
+ * Blank indicates no value was written in that row.
+ * ! indicates the value that triggered overflow.
+ * - indicates a column that did not exist prior to overflow.
+ * O indicates a value added after overflow.
+ * 
+ * Column a is written before overflow occurs, b causes overflow, and all other
+ * columns either are not written or are written after overflow.
+ * 
+ * The scenarios, identified by column names above, are:
+ * 
+ * a
+ * a contains values for all three rows.
+ * 
+ * Two values were written in the "main" batch, while a third was 
written to
+ * what becomes the overflow row.
+ * When overflow occurs, the last write position is at n. It must be 
moved
+ * back to n-1.
+ * Since data was written to the overflow row, it is copied to the 
look-
+ * ahead batch.
+ * The last write position in the lookahead batch is 0 (since data was
+ * copied into the 0th row).
+ * When harvesting, no empty-filling is needed.
+ * When starting the next batch, the last write position must be set 
to 0 to
+ * reflect the presence of the value for row n.
+ * 
+ * 
+ * b
+ * b contains values for all three rows. The value for row n triggers
+ * overflow.
+ * 
+ * The last write position is at n-1, which is kept for the "main"
+ * vector.
+ * A new overflow vector is created and starts empty, with the last 
write
+ * position at -1.
+ * Once created, b is immediately written to the overflow vector, 
advancing
+ * the last write position to 0.
+ * Harvesting, and starting the next for column b works the same as 
column
+ * a.
+ * 
+ * 
+ * c
+ * Column c has values for all rows.
+ * 
+ * The value for row n is written after overflow.
+ * At overflow, the last write position is at n-1.
+ * At overflow, a new lookahead vector is created with the last write
+ * position at -1.
+ * The value of c is written to the lookahead vector, advancing the last
+ * write position to 0.
+ * Harvesting, and starting the next for column c works the same as 
column
+ * a.
+ * 
+ * 
+ * d
+ * Column d writes values to the last two rows before overflow, but 
not to
+ * the overflow row.
+ * 
+ * The last write position for the main batch is at n-1.
+ * The last write position in the lookahead batch remains at -1.
+ * Harvesting for column d requires filling an empty value for row 
n-1.
+ * When starting the next batch, the last write position must be set 
to -1,
+ * indicating no data yet written.
+ * 
+ * 
+ * f
+ * Column f has no data in the last position of the main batch, and no 
data
+ * in the overflow row.
+ * 
+ * The last write position is at n-2.
+ * An empty value must be written into position n-1 during 
harvest.
+ * On start of the next batch, the last write position starts at 
-1.
+ * 
+ * 
+ * g
+ * Column g is added after overflow, and has a value written to the 
overflow
+ * row.
+ * 
+ * On harvest, column g is simply skipped.
+ * On start of the next row, the last write position can be left 
unchanged
+ * since no "exchange" was done.
+ * 
+ * 
+ 

[GitHub] drill pull request #914: DRILL-5657: Size-aware vector writer structure

2017-09-16 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/914#discussion_r139296587
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/package-info.java
 ---
@@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Handles the details of the result set loader implementation.
+ * 
+ * The primary purpose of this loader, and the most complex to understand 
and
+ * maintain, is overflow handling.
+ *
+ * Detailed Use Cases
+ *
+ * Let's examine it by considering a number of
+ * use cases.
+ * 
+ * 
+ * Row | a | b | c | d | e | f | g | h
+ * ----+---+---+---+---+---+---+---+---
+ * n-2 | X | X | X | X |   | X | - | -
+ * n-1 | X | X | X | X |   |   | - | -
+ * n   | X | ! | O |   | O |   | O |
+ * 
+ * Here:
+ * 
+ * n-2, n-1, and n are rows. n is the overflow row.
+ * X indicates a value was written before overflow.
+ * Blank indicates no value was written in that row.
+ * ! indicates the value that triggered overflow.
+ * - indicates a column that did not exist prior to overflow.
--- End diff --

Values added after overflow.
Fixed in latest commit.


---