Build failed in Jenkins: drill-scm #876

2017-09-16 Thread Apache Jenkins Server
See 

Changes:

[progers] DRILL-5723: Added System Internal Options That can be Modified at

--
[...truncated 132.63 KB...]
[INFO] Rat check: Summary over all files. Unapproved: 0, unknown: 0, generated: 
0, approved: 4 licenses.
[INFO] 
[INFO] --- git-commit-id-plugin:2.1.9:revision (for-jars) @ drill-jdbc-all ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ drill-jdbc-all 
---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
drill-jdbc-all ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.2:compile (default-compile) @ drill-jdbc-all 
---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
drill-jdbc-all ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.2:testCompile (default-testCompile) @ 
drill-jdbc-all ---
[INFO] Compiling 2 source files to 

[INFO] 
[INFO] --- maven-surefire-plugin:2.17:test (default-test) @ drill-jdbc-all ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ drill-jdbc-all ---
[INFO] Building jar: 

[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
drill-jdbc-all ---
[INFO] 
[INFO] --- maven-jar-plugin:2.4:test-jar (default) @ drill-jdbc-all ---
[INFO] Building jar: 

[INFO] 
[INFO] --- maven-shade-plugin:2.4.1:shade (default) @ drill-jdbc-all ---
[INFO] Including org.slf4j:slf4j-api:jar:1.7.6 in the shaded jar.
[INFO] Including org.apache.drill:drill-common:jar:1.12.0-SNAPSHOT in the 
shaded jar.
[INFO] Including org.apache.drill:drill-protocol:jar:1.12.0-SNAPSHOT in the 
shaded jar.
[INFO] Excluding com.dyuproject.protostuff:protostuff-core:jar:1.0.8 from the 
shaded jar.
[INFO] Excluding com.dyuproject.protostuff:protostuff-api:jar:1.0.8 from the 
shaded jar.
[INFO] Excluding com.dyuproject.protostuff:protostuff-json:jar:1.0.8 from the 
shaded jar.
[INFO] Excluding org.apache.calcite:calcite-core:jar:1.4.0-drill-r21 from the 
shaded jar.
[INFO] Excluding org.apache.calcite:calcite-linq4j:jar:1.4.0-drill-r21 from the 
shaded jar.
[INFO] Including commons-dbcp:commons-dbcp:jar:1.4 in the shaded jar.
[INFO] Including com.google.code.findbugs:jsr305:jar:1.3.9 in the shaded jar.
[INFO] Including net.hydromatic:eigenbase-properties:jar:1.1.5 in the shaded 
jar.
[INFO] Including org.codehaus.janino:janino:jar:2.7.6 in the shaded jar.
[INFO] Including org.codehaus.janino:commons-compiler:jar:2.7.6 in the shaded 
jar.
[INFO] Excluding org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde from 
the shaded jar.
[INFO] Including com.typesafe:config:jar:1.0.0 in the shaded jar.
[INFO] Including org.apache.commons:commons-lang3:jar:3.1 in the shaded jar.
[INFO] Excluding org.msgpack:msgpack:jar:0.6.6 from the shaded jar.
[INFO] Excluding com.googlecode.json-simple:json-simple:jar:1.1.1 from the 
shaded jar.
[INFO] Including org.javassist:javassist:jar:3.16.1-GA in the shaded jar.
[INFO] Including org.reflections:reflections:jar:0.9.8 in the shaded jar.
[INFO] Excluding dom4j:dom4j:jar:1.6.1 from the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-annotations:jar:2.7.8 in 
the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-databind:jar:2.7.8 in the 
shaded jar.
[INFO] Including com.codahale.metrics:metrics-core:jar:3.0.1 in the shaded jar.
[INFO] Including com.codahale.metrics:metrics-servlets:jar:3.0.1 in the shaded 
jar.
[INFO] Including com.codahale.metrics:metrics-healthchecks:jar:3.0.1 in the 
shaded jar.
[INFO] Including com.codahale.metrics:metrics-json:jar:3.0.1 in the shaded jar.
[INFO] Including com.codahale.metrics:metrics-jvm:jar:3.0.1 in the shaded jar.
[INFO] Including org.antlr:antlr-runtime:jar:3.4 in the shaded jar.
[INFO] Including org.antlr:stringtemplate:jar:3.2.1 in the shaded jar.
[INFO] Excluding antlr:antlr:jar:2.7.7 from the shaded jar.
[INFO] Including joda-time:joda-time:jar:2.9 in the shaded jar.
[INFO] Including org.apache.drill.exec:drill-java-exec:jar:1.12.0-SNAPSHOT in 
the shaded jar.
[INFO] Excluding org.ow2.asm:asm-debug-all:jar:5.0.3 from the shaded jar.
[INFO] Including org.apache.commons:commons-pool2:jar:2.1 in the shaded jar.
[INFO] Excluding com.univocity:univocity-parsers:jar:1.3.0 from the shaded jar.
[INFO] Including org.apache.commons:commons-math:jar:2.2 in the shaded jar.
[INFO] Including 

[GitHub] drill pull request #923: DRILL-5723: Added System Internal Options That can ...

2017-09-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/923


---


[GitHub] drill issue #914: DRILL-5657: Size-aware vector writer structure

2017-09-16 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/914
  
The second group of five commits refines the result set loader mechanism.

### Model Layer Revision

Drill is unique among query engines in that it handles structured types as
first-class data types. For example, Drill supports maps (called “structs”
by Hive and Impala), but also supports arrays of maps. Drill supports simple
scalars, as well as arrays of scalars. Put these together and we have maps that
contain arrays of maps that contain arrays of scalars. The result is a tree 
structure described in the Drill documentation as modeled on JSON.
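To make the tree structure concrete, here is a toy sketch using plain Java collections (the field names are invented, and Drill of course stores such data in value vectors, not boxed objects):

```java
import java.util.*;

public class NestedRowDemo {
  // Build one row: a scalar column, plus a map-array column whose maps
  // each hold an array of scalars -- the JSON-like tree described above.
  static Map<String, Object> sampleRow() {
    Map<String, Object> event1 = new LinkedHashMap<>();
    event1.put("scores", Arrays.asList(10, 20));       // array of scalars
    Map<String, Object> event2 = new LinkedHashMap<>();
    event2.put("scores", Arrays.asList(30));
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("name", "alice");                          // simple scalar
    row.put("events", Arrays.asList(event1, event2));  // array of maps
    return row;
  }

  public static void main(String[] args) {
    System.out.println(sampleRow());
  }
}
```

The nesting can continue to any depth, which is why the writers must track positions per level rather than per batch.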

The prior version used a “model” layer to construct internal structures 
that model the tree structure of Drill’s data types. The model “reified” 
the structure into a set of objects. While this worked well, it added more 
complexity than necessary, especially when dynamically evolving a schema or 
working out how to handle scan-time projection.

This version retains the model layer, but as a series of algorithms that 
walk the vector container structure rather than as a separate data structure. 
The model layer still provides tools to build readers for “single” and 
“hyper” vectors, to extract a metadata schema from a set of vectors, to 
create writers for a “single” vector container, and so on.

Replacing the object structure with algorithms required changes to both the 
row set abstractions and the result set loader.

The key loss in this change is the set of “visitors” from the previous 
revision. The reified model allowed all visitors to use a common structure. The 
new solution still has visitors, but now they are ad-hoc, walking the container 
tree in different ways depending on whether the code can work with columns 
generically, or needs to deal with individual (single) vectors or hyper vectors.
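A rough sketch of the ad-hoc style, using invented toy node classes rather than Drill's actual container and model types:

```java
import java.util.*;

public class SchemaWalkDemo {
  // Toy container tree: maps hold child columns; scalars are leaves.
  static abstract class ColumnNode {
    final String name;
    ColumnNode(String name) { this.name = name; }
  }
  static class ScalarNode extends ColumnNode {
    ScalarNode(String name) { super(name); }
  }
  static class MapNode extends ColumnNode {
    final List<ColumnNode> children = new ArrayList<>();
    MapNode(String name) { super(name); }
    MapNode add(ColumnNode c) { children.add(c); return this; }
  }

  // An "ad-hoc visitor": plain recursion over the tree itself, with no
  // separate reified model objects to keep in sync with the vectors.
  static void collectPaths(ColumnNode node, String prefix, List<String> out) {
    String path = prefix.isEmpty() ? node.name : prefix + "." + node.name;
    if (node instanceof MapNode) {
      for (ColumnNode child : ((MapNode) node).children) {
        collectPaths(child, path, out);
      }
    } else {
      out.add(path);
    }
  }

  public static void main(String[] args) {
    MapNode root = new MapNode("row")
        .add(new ScalarNode("name"))
        .add(new MapNode("events").add(new ScalarNode("scores")));
    List<String> paths = new ArrayList<>();
    collectPaths(root, "", paths);
    System.out.println(paths);   // leaf column paths
  }
}
```

Each walk can specialize its recursion, which is what makes the new visitors ad-hoc rather than instances of one common structure.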

### Revised Result Set Loader Column and Vector State

To understand the need for many of the changes in this commit, it helps to 
take a step back and remember what we’re trying to do. We want to write to 
vectors, but control the resulting batch size.

#### Background

Writing to vectors is easy if we deal only with flat rows and don’t worry 
about batch size:

* Vectors provide `Mutator` classes that write to single vectors.
* A set of “legacy” vector writers are available, and are used by some 
readers.
* Generated code uses `Mutator` and `Accessor` classes to work with vectors.
* The “Row Set” classes, used for testing, provide a refined column 
writer to populate batches.

The above are the easy parts. Some challenges include:

* None of the above limits batch memory; they limit only row count.
* Writing directly to vectors requires that the client deal with the 
complexity of tracking a common position across vectors.
* Drill’s tree structure makes everything more complex as positions must 
be tracked across multiple repetition levels (see below).
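To illustrate the common-position problem, here is a toy column writer (invented classes, not Drill's writer machinery) that tracks a per-column last write position and back-fills empties so all columns stay aligned on one row position:

```java
import java.util.*;

// Toy column writer: tracks its own last write position and back-fills
// empty slots so every column stays aligned on a common row position.
class IntColumnWriter {
  final List<Integer> values = new ArrayList<>();
  int lastWritePos = -1;

  void setInt(int rowPos, int v) {
    fillEmpties(rowPos - 1);   // pad rows the client skipped
    values.add(v);
    lastWritePos = rowPos;
  }

  void fillEmpties(int upTo) {
    while (lastWritePos < upTo) {
      values.add(0);           // toy "empty" value
      lastWritePos++;
    }
  }
}

public class RowWriterDemo {
  public static void main(String[] args) {
    IntColumnWriter a = new IntColumnWriter();
    IntColumnWriter b = new IntColumnWriter();
    int row = 0;
    a.setInt(row, 10); b.setInt(row, 100); row++;  // row 0: both written
    a.setInt(row, 20); row++;                      // row 1: b skipped
    a.setInt(row, 30); b.setInt(row, 300);         // row 2: both written
    // Before handing the batch downstream, align every column to the
    // final row position.
    a.fillEmpties(row);
    b.fillEmpties(row);
    System.out.println(a.values + " " + b.values);
  }
}
```

The real writers do this per repetition level; the toy shows only the flat-row case.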

The “result set loader” (along with the column writers) provides the next 
level of completeness by tackling the vector memory size problem for the 
entire set of Drill structured types.

#### Overflow and Rollover

The key trick is to handle vector “overflow” by seamlessly shifting 
writes, mid-row, from a full batch to a new “look-ahead” batch. The 
process of shifting data is called “rollover.”

To implement rollover, we need to work with two sets of vectors:

* The “active” set: the vectors “under” the writers and returned 
downstream.
* The “backup” set: holds the buffers not currently in use.

During an overflow event, buffers are shifted between the active and backup 
vectors:

* On overflow, the full buffers reside in the active vectors.
* After rollover, the full buffers reside in the backup vectors, and new 
buffers, now holding the in-flight row (called the “look-ahead” row), 
reside in the active vectors.
* When “harvesting” the full batch, the full buffers and look-ahead 
vectors are exchanged, so the full buffers are back in the active vectors.
* When starting the next batch for writing, the look-ahead row is shifted 
from the backup vectors into the active vectors and the cycle repeats.
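The rollover cycle above can be sketched with a toy writer (a single int "vector" per set, with invented names; the real mechanism swaps vector buffers, not Java lists, and the harvest/next-batch exchange is simplified here):

```java
import java.util.*;

public class RolloverDemo {
  static final int LIMIT = 4;                 // toy capacity, in values

  List<Integer> active = new ArrayList<>();   // under the writer, returned downstream
  List<Integer> backup = new ArrayList<>();   // buffers not currently in use
  int rowStart = 0;                           // where the in-flight row begins

  void write(int v) {
    if (active.size() >= LIMIT) {
      rollover();                             // overflow: shift mid-row
    }
    active.add(v);
  }

  void endRow() { rowStart = active.size(); }

  // Rollover: full buffers move to the backup set; the in-flight
  // ("look-ahead") row restarts at position 0 of fresh active buffers.
  void rollover() {
    List<Integer> inFlight = new ArrayList<>(active.subList(rowStart, active.size()));
    active.subList(rowStart, active.size()).clear();
    List<Integer> full = active;
    active = backup;
    active.clear();
    active.addAll(inFlight);
    backup = full;
    rowStart = 0;
  }

  // Harvest: hand the full buffers downstream, leaving the look-ahead
  // row in place to start the next batch.
  List<Integer> harvestFullBatch() {
    List<Integer> full = backup;
    backup = new ArrayList<>();
    return full;
  }

  public static void main(String[] args) {
    RolloverDemo w = new RolloverDemo();
    for (int[] row : new int[][] {{1, 2, 3}, {4, 5, 6}}) {
      for (int v : row) w.write(v);
      w.endRow();
    }
    System.out.println(w.harvestFullBatch()); // the full batch: [1, 2, 3]
    System.out.println(w.active);             // look-ahead row: [4, 5, 6]
  }
}
```

The second row overflows mid-write, so its first value rolls over into the look-ahead buffers while the full batch waits to be harvested.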

#### Column and Vector State

When writing without overflow, we need only one set of vectors, and so the 
usual Drill vector container is sufficient. With overflow, we have two sets 
of vectors and must perform operations on them, so we need a place to store 
both sets. This is the purpose of the “column state” and “vector state” 
classes.
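A minimal sketch of the idea, with invented toy classes standing in for the real column and vector state:

```java
public class ColumnStateDemo {
  // Toy "vector state": one set of buffers for a single column.
  static class VectorState {
    int[] buffer;
    VectorState(int capacity) { buffer = new int[capacity]; }
  }

  // Toy "column state": a home for a column's two vector states, so
  // overflow operations know where both buffer sets live.
  static class ColumnState {
    VectorState active = new VectorState(1024);  // under the writers
    VectorState backup = new VectorState(1024);  // look-ahead buffers
    void exchange() {                            // swap roles at rollover/harvest
      VectorState t = active;
      active = backup;
      backup = t;
    }
  }

  public static void main(String[] args) {
    ColumnState col = new ColumnState();
    VectorState before = col.active;
    col.exchange();
    // After an exchange, the old active buffers sit in the backup slot.
    System.out.println(col.backup == before);  // prints "true"
  }
}
```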

Think of the overall result set loader structure as having three key 
parts:

* Result set loader: manages the entire batch
* Column writers: accept writes to vectors

[GitHub] drill pull request #914: DRILL-5657: Size-aware vector writer structure

2017-09-16 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/914#discussion_r139296614
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/package-info.java
 ---
@@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Handles the details of the result set loader implementation.
+ * 
+ * The primary purpose of this loader, and the most complex to understand 
and
+ * maintain, is overflow handling.
+ *
+ * Detailed Use Cases
+ *
+ * Let's examine it by considering a number of
+ * use cases.
+ * 
+ * 
+ * Row | a | b | c | d | e | f | g | h
+ * ----+---+---+---+---+---+---+---+---
+ * n-2 | X | X | X | X |   | X | - | -
+ * n-1 | X | X | X | X |   |   | - | -
+ * n   | X | ! | O |   | O |   | O |
+ * 
+ * Here:
+ * 
+ * n-2, n-1, and n are rows. n is the overflow row.
+ * X indicates a value was written before overflow.
+ * Blank indicates no value was written in that row.
+ * ! indicates the value that triggered overflow.
+ * - indicates a column that did not exist prior to overflow.
+ * O indicates a value added after overflow.
+ * 
+ * Column a is written before overflow occurs, b causes overflow, and all other
+ * columns either are not written or are written after overflow.
+ * 
+ * The scenarios, identified by column names above, are:
+ * 
+ * a
+ * a contains values for all three rows.
+ * 
+ * Two values were written in the "main" batch, while a third was 
written to
+ * what becomes the overflow row.
+ * When overflow occurs, the last write position is at n. It must be 
moved
+ * back to n-1.
+ * Since data was written to the overflow row, it is copied to the 
look-
+ * ahead batch.
+ * The last write position in the lookahead batch is 0 (since data was
+ * copied into the 0th row).
+ * When harvesting, no empty-filling is needed.
+ * When starting the next batch, the last write position must be set 
to 0 to
+ * reflect the presence of the value for row n.
+ * 
+ * 
+ * b
+ * b contains values for all three rows. The value for row n triggers
+ * overflow.
+ * 
+ * The last write position is at n-1, which is kept for the "main"
+ * vector.
+ * A new overflow vector is created and starts empty, with the last 
write
+ * position at -1.
+ * Once created, b is immediately written to the overflow vector, 
advancing
+ * the last write position to 0.
+ * Harvesting, and starting the next for column b works the same as 
column
+ * a.
+ * 
+ * 
+ * c
+ * Column c has values for all rows.
+ * 
+ * The value for row n is written after overflow.
+ * At overflow, the last write position is at n-1.
+ * At overflow, a new lookahead vector is created with the last write
+ * position at -1.
+ * The value of c is written to the lookahead vector, advancing the last
+ * write position to 0.
+ * Harvesting, and starting the next for column c works the same as 
column
+ * a.
+ * 
+ * 
+ * d
+ * Column d writes values to the last two rows before overflow, but 
not to
+ * the overflow row.
+ * 
+ * The last write position for the main batch is at n-1.
+ * The last write position in the lookahead batch remains at -1.
+ * Harvesting for column d requires filling an empty value for row 
n-1.
+ * When starting the next batch, the last write position must be set 
to -1,
+ * indicating no data yet written.
+ * 
+ * 
+ * f
+ * Column f has no data in the last position of the main batch, and no 
data
+ * in the overflow row.
+ * 
+ * The last write position is at n-2.
+ * An empty value must be written into position n-1 during 
harvest.
+ * On start of the next batch, the last write position starts at 
-1.
+ * 
+ * 
+ * g
+ * Column g is added after overflow, and has a value written to the 
overflow
+ * row.
+ * 
+ * On harvest, column g is simply skipped.
+ * On start of the next row, the last write position can be left 
unchanged
+ * since no "exchange" was done.
+ * 
+ * 
+ 

[GitHub] drill pull request #914: DRILL-5657: Size-aware vector writer structure

2017-09-16 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/914#discussion_r139296587
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/package-info.java
 ---
@@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Handles the details of the result set loader implementation.
+ * 
+ * The primary purpose of this loader, and the most complex to understand 
and
+ * maintain, is overflow handling.
+ *
+ * Detailed Use Cases
+ *
+ * Let's examine it by considering a number of
+ * use cases.
+ * 
+ * 
+ * Row | a | b | c | d | e | f | g | h
+ * ----+---+---+---+---+---+---+---+---
+ * n-2 | X | X | X | X |   | X | - | -
+ * n-1 | X | X | X | X |   |   | - | -
+ * n   | X | ! | O |   | O |   | O |
+ * 
+ * Here:
+ * 
+ * n-2, n-1, and n are rows. n is the overflow row.
+ * X indicates a value was written before overflow.
+ * Blank indicates no value was written in that row.
+ * ! indicates the value that triggered overflow.
+ * - indicates a column that did not exist prior to overflow.
--- End diff --

Values added after overflow.
Fixed in latest commit.


---