[GitHub] drill pull request #1162: DRILL-6005: Fixed TestGracefulShutdown tests to sk...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1162#discussion_r174035415 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java --- @@ -640,4 +640,6 @@ public static String bootDefaultFor(String name) { public static final String DRILL_PORT_HUNT = "drill.exec.port_hunt"; + public static final String ALLOW_LOOPBACK_ADDRESS_BINDING = "drill.exec.enable_loopback_address_binding"; --- End diff -- Please rename to `drill.exec.allow_loopback_address_binding`. ---
[jira] [Resolved] (DRILL-4333) tests in Drill2489CallsAfterCloseThrowExceptionsTest fail in Java 8
[ https://issues.apache.org/jira/browse/DRILL-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva resolved DRILL-4333. - Resolution: Fixed Fix Version/s: (was: Future) 1.13.0 > tests in Drill2489CallsAfterCloseThrowExceptionsTest fail in Java 8 > --- > > Key: DRILL-4333 > URL: https://issues.apache.org/jira/browse/DRILL-4333 > Project: Apache Drill > Issue Type: Sub-task > Components: Tools, Build Test >Affects Versions: 1.5.0 >Reporter: Deneche A. Hakim >Priority: Major > Fix For: 1.13.0 > > > The following tests fail in Java 8: > {noformat} > Drill2489CallsAfterCloseThrowExceptionsTest.testClosedPlainStatementMethodsThrowRight > Drill2489CallsAfterCloseThrowExceptionsTest.testclosedPreparedStmtOfOpenConnMethodsThrowRight > Drill2489CallsAfterCloseThrowExceptionsTest.testClosedResultSetMethodsThrowRight1 > Drill2489CallsAfterCloseThrowExceptionsTest.testClosedResultSetMethodsThrowRight2 > Drill2489CallsAfterCloseThrowExceptionsTest.testClosedDatabaseMetaDataMethodsThrowRight > Drill2769UnsupportedReportsUseSqlExceptionTest.testPreparedStatementMethodsThrowRight > Drill2769UnsupportedReportsUseSqlExceptionTest.testPlainStatementMethodsThrowRight > {noformat} > Drill has special implementations of Statement, PreparedStatement, ResultSet > and DatabaseMetadata that overrides all parent methods to make sure they > throw a proper exception if the statement has already been closed. > These tests use reflection to make sure all methods behave correctly, but > Java 8 added more methods that need to be properly overridden. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[VOTE] Apache Drill release 1.13.0 - RC0
Hi all, I'd like to propose the first release candidate (RC0) of Apache Drill, version 1.13.0. The release candidate covers a total of 113 resolved JIRAs [1]. Thanks to everyone who contributed to this release. The tarball artifacts are hosted at [2] and the maven artifacts are hosted at [3]. This release candidate is based on commit cac2882d5a9e22fbc251e4caf622fe30242ad557 located at [4]. Please download and try out the release. The vote ends at 1:00 PM UTC (5:00 AM PDT, 2:00 PM EET, 5:30 PM IST), Mar 16th, 2018. [ ] +1 [ ] +0 [ ] -1 Here's my vote: +1 [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12342096 [2] http://home.apache.org/~parthc/drill/releases/1.13.0/rc0/ [3] https://repository.apache.org/content/repositories/orgapachedrill-1046 [4] https://github.com/parthchandra/drill/tree/drill-1.13.0
[jira] [Resolved] (DRILL-6163) Switch Travis To Java 8
[ https://issues.apache.org/jira/browse/DRILL-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva resolved DRILL-6163. - Resolution: Fixed Fixed in dbc95a659c3325e263340f3ec2b913a048163671. > Switch Travis To Java 8 > --- > > Key: DRILL-6163 > URL: https://issues.apache.org/jira/browse/DRILL-6163 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.12.0 >Reporter: Timothy Farkas >Assignee: Volodymyr Tkach >Priority: Major > Fix For: 1.13.0 > > > Drill is preparing to move to Java 8 for the next release. So we should make > Travis test with Java 8 as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (DRILL-4329) 13 Unit tests are failing with JDK 8
[ https://issues.apache.org/jira/browse/DRILL-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva resolved DRILL-4329. - Resolution: Fixed Fix Version/s: (was: Future) 1.13.0 > 13 Unit tests are failing with JDK 8 > > > Key: DRILL-4329 > URL: https://issues.apache.org/jira/browse/DRILL-4329 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build Test > Environment: Mac OS > JDK 1.8.0_65 >Reporter: Deneche A. Hakim >Priority: Major > Fix For: 1.13.0 > > > The following unit tests are failing when building Drill with JDK 1.8.0_65: > {noformat} > TestFlattenPlanning.testFlattenPlanningAvoidUnnecessaryProject > TestFrameworkTest { > testRepeatedColumnMatching > testCSVVerificationOfOrder_checkFailure > } > Drill2489CallsAfterCloseThrowExceptionsTest { > testClosedDatabaseMetaDataMethodsThrowRight > testClosedPlainStatementMethodsThrowRight > testclosedPreparedStmtOfOpenConnMethodsThrowRight > testClosedResultSetMethodsThrowRight1 > testClosedResultSetMethodsThrowRight2 > } > Drill2769UnsupportedReportsUseSqlExceptionTest { > testPreparedStatementMethodsThrowRight > testPlainStatementMethodsThrowRight > } > TestMongoFilterPushDown { > testFilterPushDownIsEqual > testFilterPushDownGreaterThanWithSingleField > testFilterPushDownLessThanWithSingleField > } > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] drill pull request #1158: DRILL-6145: Implement Hive MapR-DB JSON handler
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/1158#discussion_r174125821 --- Diff: distribution/pom.xml --- @@ -324,6 +324,14 @@ org.apache.hbase hbase-protocol + --- End diff -- I did it that way the first time, but jars are usually downloaded only for dependencies in the common part of the pom file. It looks like, to add these dependencies into the mapr profile of the drill-hive module, it is necessary either to add the maven-assembly-plugin to the mapr profile of the storage-hive module or to edit the distribution pom file somehow to leverage dependencies from the profiles section. I will investigate it. ---
[jira] [Created] (DRILL-6238) Batch sizing for operators
Padma Penumarthy created DRILL-6238: --- Summary: Batch sizing for operators Key: DRILL-6238 URL: https://issues.apache.org/jira/browse/DRILL-6238 Project: Apache Drill Issue Type: New Feature Reporter: Padma Penumarthy Assignee: Padma Penumarthy

*Batch Sizing For Operators*

This document describes the approach we are taking to limit batch sizes for operators other than scan.

*Motivation*

The main goals are:
# Improve concurrency
# Reduce query failures caused by out-of-memory errors

To accomplish these goals, we need queries to execute within a specified memory budget. To enforce a per-query memory limit, we need to be able to enforce per-fragment and per-operator memory limits. Controlling individual operators' batch sizes is the first step towards all this.

*Background*

In Drill, different operators have different limits with respect to outgoing batches. Some use hard-coded row counts, some use hard-coded memory, and some have no limits at all. Depending on the input data size and what the operator is doing, the memory used by the outgoing batch can vary widely because no limits are imposed. Queries fail because we are not able to allocate the memory needed. Some operators produce very large batches, causing blocking operators like sort and hash aggregate, which have to work under tight memory constraints, to fail. The size of batches should be a function of available memory rather than of the input data size and/or what the operator does. Please refer to the table at the end of this document for details on what each operator does today.

*Design*

The goal is to have all operators behave the same way, i.e. produce batches with size less than or equal to the configured outgoing batch size, with a minimum of 1 row per batch and a maximum of 64K rows per batch. A new system option 'drill.exec.memory.operator.output_batch_size' is added, with a default value of 16 MB.

The basic idea is to limit the size of the outgoing batch by deciding how many rows it can hold, based on the average entry size of each outgoing column, taking into account the actual data size and the metadata vector overhead we add on top for tracking variable length, mode (repeated, optional, required), etc. This calculation will be different for each operator, based on what the operator is doing, the incoming data, and what is being projected out. With this adaptive approach based on actual data sizes, operators that previously limited batch size to something less than 64K rows can potentially fit many more rows (up to 64K) in a batch, as long as memory stays within the budget. This should help improve performance.

Also, to improve performance and utilize memory more efficiently, we will:
# Allocate memory for value vectors upfront. Since we know the number of rows and the sizing information for each column in the outgoing batch, we will use that information to allocate memory for value vectors upfront. This improves performance by avoiding the memory copies and the zeroing of the new half that we currently do every time we double a vector.
# Make the number of rows in the outgoing batch a power of two. Since memory is allocated in powers of two, this helps pack the value vectors densely, reducing the amount of memory wasted by the doubling effect.

To summarize, the benefits are improved memory utilization, better performance, higher concurrency, and fewer queries failing because of out-of-memory errors. So, what are the cons?
* Since this is based on averages, strict enforcement is not possible. There could be pathological cases where, because of uneven data distribution, we exceed the configured output batch size, potentially causing OOM errors and problems in downstream operators.

Other issues that will be addressed:
* We are adding extra processing for each batch in each operator to figure out the sizing information. This overhead can be reduced by passing the information along with the batch between operators.
* For some operators, it will be complex to figure out the average size of outgoing columns, especially if we have to evaluate complex expression trees and UDFs to figure out the transformation on incoming batches. We will use approximations as appropriate.

The following table summarizes the limits we have today for each operator. flatten, merge join, and external sort have already been changed to adhere to the batch size limits described in this document as of Drill release 1.13.

|*Operator*|*Limit (Rows, Memory)*|*Notes*|
|Flatten|4K, 512MB|Flatten can produce very large batches based on the average cardinality of the flatten column.|
|Merge Receiver|32K|No memory limit.|
|Hash Aggregate|64K|No memory limit.|
|Streaming Aggregate|32K|No memory limit.|
|Broadcast Sender|None|No limits.|
|Filter, Limit|None|No limits.|
|Hash Join|4K|No memory limit.|
|Merge Join|4K|No memory limit.|
|Nested
batch sizing for operators
I have written a small design document explaining the batch sizing work we are doing for operators (other than scan). https://issues.apache.org/jira/browse/DRILL-6238 Some of this work has already been done in 1.13. flatten, merge join and external sort are changed to adhere to batch size limits. The work will continue in future releases. I would like to get feedback on the approach being used. All comments are welcome. Please update the JIRA or respond to this email with your comments. Thanks Padma
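The row-count arithmetic the design document describes — divide the configured batch-size budget by the average row width, clamp to between 1 and 64K rows, and round down to a power of two so vectors pack densely — can be sketched as follows. This is an illustrative sketch only, not Drill's actual code; the class and method names here are hypothetical.

```java
// Hypothetical sketch of the DRILL-6238 row-count calculation.
// BatchSizer and rowsPerBatch are illustrative names, not Drill's API.
public class BatchSizer {
  static final int MAX_ROWS = 64 * 1024;                    // 64K row ceiling per batch
  static final int DEFAULT_BATCH_BYTES = 16 * 1024 * 1024;  // 16 MB default budget

  static int rowsPerBatch(long avgRowBytes, long batchBytes) {
    // Fit as many average-sized rows as the byte budget allows, at least one.
    long rows = Math.max(1, batchBytes / Math.max(1, avgRowBytes));
    // Never exceed the 64K value-count limit of a record batch.
    rows = Math.min(rows, MAX_ROWS);
    // Round down to the nearest power of two for dense vector allocation.
    return Integer.highestOneBit((int) rows);
  }

  public static void main(String[] args) {
    System.out.println(rowsPerBatch(100, DEFAULT_BATCH_BYTES));        // 65536
    System.out.println(rowsPerBatch(10_000_000L, DEFAULT_BATCH_BYTES)); // 1 (huge rows)
  }
}
```

Note how a very wide row (10 MB average) still yields a one-row batch: the minimum of one row per batch is why strict enforcement of the byte budget is impossible in pathological cases.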
[GitHub] drill pull request #1163: DRILL-6053 & DRILL-6237
GitHub user vrozov opened a pull request: https://github.com/apache/drill/pull/1163 DRILL-6053 & DRILL-6237 - Avoid excessive locking in LocalPersistentStore - Upgrade checkstyle version to 5.9 or above @arina-ielchiieva Please review You can merge this pull request into a Git repository by running: $ git pull https://github.com/vrozov/drill DRILL-6237 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1163.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1163 commit adc5aea6ef217c3a3548e630b466f9b56adfb282 Author: Vlad Rozov Date: 2018-03-13T17:56:52Z DRILL-6053: Avoid excessive locking in LocalPersistentStore commit c1def24c62ccb729e720b5416abdb41c47a4869f Author: Vlad Rozov Date: 2018-03-13T19:24:48Z DRILL-6237: Upgrade checkstyle version to 5.9 or above commit 6bde4215e8eb2bc19a1e981c3e444b43b08237ee Author: Vlad Rozov Date: 2018-03-13T20:48:35Z DRILL-6053: Avoid excessive locking in LocalPersistentStore ---
[GitHub] drill pull request #1158: DRILL-6145: Implement Hive MapR-DB JSON handler
Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1158#discussion_r174285398 --- Diff: distribution/pom.xml --- @@ -324,6 +324,14 @@ org.apache.hbase hbase-protocol + --- End diff -- Maven dependency handling is indeed somewhat inconsistent with regard to profiles. Check Maven 3.5 or above; AFAIK it (the Maven reactor) handles profile dependencies better than prior versions. ---
[GitHub] drill pull request #1164: DRILL-6234: Improved documentation for VariableWid...
Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174348633 --- Diff: exec/vector/src/test/java/org/apache/drill/exec/vector/VariableLengthVectorTest.java --- @@ -0,0 +1,152 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.memory.RootAllocator; +import org.apache.drill.exec.record.MaterializedField; +import org.junit.Assert; +import org.junit.Test; + +/** + * This test uses {@link VarCharVector} to test the template code in VariableLengthVector. + */ +public class VariableLengthVectorTest +{ + /** + * If the vector contains 1000 records, setting a value count of 1000 should work. 
+ */ + @Test + public void testSettingSameValueCount() + { +try (RootAllocator allocator = new RootAllocator(10_000_000)) { + final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR)); + final VarCharVector vector = new VarCharVector(field, allocator); + + vector.allocateNew(); + + try { +final int size = 1000; +final VarCharVector.Mutator mutator = vector.getMutator(); +final VarCharVector.Accessor accessor = vector.getAccessor(); + +setSafeIndexStrings("", 0, size, mutator); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); +checkIndexStrings("", 0, size, accessor); + } finally { +vector.clear(); + } +} + } + + /** + * Test trunicating data. If you have 1 records, reduce the vector to 1000 records. + */ + @Test + public void testTrunicateVectorSetValueCount() + { +try (RootAllocator allocator = new RootAllocator(10_000_000)) { + final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR)); + final VarCharVector vector = new VarCharVector(field, allocator); + + vector.allocateNew(); + + try { +final int size = 1000; +final int fluffSize = 1; +final VarCharVector.Mutator mutator = vector.getMutator(); +final VarCharVector.Accessor accessor = vector.getAccessor(); + +setSafeIndexStrings("", 0, size, mutator); +setSafeIndexStrings("first cut ", size, fluffSize, mutator); + +mutator.setValueCount(fluffSize); +Assert.assertEquals(fluffSize, accessor.getValueCount()); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); +setSafeIndexStrings("redone cut ", size, fluffSize, mutator); +mutator.setValueCount(fluffSize); +Assert.assertEquals(fluffSize, accessor.getValueCount()); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); + +checkIndexStrings("", 0, size, accessor); + + } finally { +vector.clear(); + } +} + } + + /** + * Set 1 values. 
Then go back and set new values starting at the 1001 the record. --- End diff -- I agree the vector writers should be used. The reason I was looking into this is that I saw strange behavior in the legacy HashTable, where setValueCount was being called with a larger valueCount than there was data in a VarCharVector. I did an ugly (and, I now think, incorrect) workaround for the issue and set about making setValueCount support setting a valueCount larger than the number of elements in the VarCharVector. Now I realize the issue is with the HashTableTemplate, and I need to look into why it is behaving incorrectly. Your review has been very
[GitHub] drill pull request #1164: DRILL-6234: Improved documentation for VariableWid...
Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174349801 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract + * + * Variable length vectors do not support random writes. All set methods must be called with a monotonically increasing, consecutive sequence of indexes. + * It is possible to trim the vector by setting the value count to be less than the number of values currently contained in the vector with {@link #setValueCount(int)}; then + * the process of setting values starts with the index after the last index. + * + * + * It is also possible to backtrack and set the value at an index earlier than the current index; however, the caller must assume that all data contained after the last --- End diff -- Updated ---
[GitHub] drill pull request #1164: DRILL-6234: Improved documentation for VariableWid...
Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174349826 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract + * + * Variable length vectors do not support random writes. All set methods must be called with a monotonically increasing, consecutive sequence of indexes. + * It is possible to trim the vector by setting the value count to be less than the number of values currently contained in the vector with {@link #setValueCount(int)}; then + * the process of setting values starts with the index after the last index. + * + * + * It is also possible to backtrack and set the value at an index earlier than the current index; however, the caller must assume that all data contained after the last + * set index is corrupted. --- End diff -- Added ---
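The contract in the quoted javadoc — writes are sequential only, and setValueCount(n) trims the vector so that writing resumes at index n, with anything past the last set index considered corrupted — can be illustrated with a toy model. This is not Drill's VarCharVector implementation; it is a hypothetical offsets-plus-data sketch of how a variable-width vector behaves under that contract.

```java
import java.util.Arrays;

// Hypothetical model of a variable-width vector: an offsets array plus a data
// buffer. Illustrates the sequential-write and trim contract only; it is not
// Drill's implementation and does no real memory management.
public class ToyVarWidthVector {
  private int[] offsets = new int[] {0}; // offsets[i]..offsets[i+1] bounds value i
  private byte[] data = new byte[0];
  private int valueCount = 0;

  public void setSafe(int index, byte[] bytes) {
    // Sequential contract: index must be the next unwritten slot.
    if (index != valueCount) throw new IllegalStateException("non-sequential write");
    int start = offsets[valueCount];
    data = Arrays.copyOf(data, start + bytes.length); // "safe": grow on demand
    System.arraycopy(bytes, 0, data, start, bytes.length);
    offsets = Arrays.copyOf(offsets, valueCount + 2);
    offsets[valueCount + 1] = start + bytes.length;
    valueCount++;
  }

  public byte[] get(int index) {
    return Arrays.copyOfRange(data, offsets[index], offsets[index + 1]);
  }

  /** Trim: values at index n and beyond become unreadable; writes resume at n. */
  public void setValueCount(int n) {
    valueCount = n;
    offsets = Arrays.copyOf(offsets, n + 1);
  }

  public int getValueCount() { return valueCount; }

  public static void main(String[] args) {
    ToyVarWidthVector v = new ToyVarWidthVector();
    v.setSafe(0, "ab".getBytes());
    v.setSafe(1, "cde".getBytes());
    v.setValueCount(1);            // trim: value 1 is discarded
    v.setSafe(1, "xy".getBytes()); // writing resumes at index 1
    System.out.println(new String(v.get(1))); // xy
  }
}
```

The trim-then-rewrite sequence in main is exactly the back-tracking case the javadoc warns about: after setValueCount(1), the old value at index 1 must be treated as gone, and only the newly written value is valid.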
[GitHub] drill pull request #1164: DRILL-6234: Improved documentation for VariableWid...
Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174343151 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract + * + * Variable length vectors do not support random writes. All set methods must be called with a monotonically increasing, consecutive sequence of indexes. --- End diff -- Thanks for bringing this up. I'm sharing a design doc on the dev list tomorrow or the day after about how I plan to refactor HashAgg. It will cover how to facilitate unit tests and how to change the memory handling to use a deterministic calculator like the SortMemoryManager and the soon-to-be-introduced HashJoinMemoryCalculator (instead of catching OOMs). Perhaps you could comment on the doc about how to set ourselves up to fix this case. ---
[GitHub] drill pull request #1164: DRILL-6234: Improved documentation for VariableWid...
Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174349707 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -506,6 +506,8 @@ public boolean isNull(int index){ } /** + * Overview --- End diff -- Fixed ---
[GitHub] drill pull request #1164: DRILL-6234: Improved documentation for VariableWid...
Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174350127 --- Diff: exec/vector/src/test/java/org/apache/drill/exec/vector/VariableLengthVectorTest.java --- @@ -0,0 +1,152 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.memory.RootAllocator; +import org.apache.drill.exec.record.MaterializedField; +import org.junit.Assert; +import org.junit.Test; + +/** + * This test uses {@link VarCharVector} to test the template code in VariableLengthVector. + */ +public class VariableLengthVectorTest +{ + /** + * If the vector contains 1000 records, setting a value count of 1000 should work. 
+ */ + @Test + public void testSettingSameValueCount() + { +try (RootAllocator allocator = new RootAllocator(10_000_000)) { + final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR)); + final VarCharVector vector = new VarCharVector(field, allocator); + + vector.allocateNew(); + + try { +final int size = 1000; +final VarCharVector.Mutator mutator = vector.getMutator(); +final VarCharVector.Accessor accessor = vector.getAccessor(); + +setSafeIndexStrings("", 0, size, mutator); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); +checkIndexStrings("", 0, size, accessor); + } finally { +vector.clear(); + } +} + } + + /** + * Test trunicating data. If you have 1 records, reduce the vector to 1000 records. + */ + @Test + public void testTrunicateVectorSetValueCount() + { +try (RootAllocator allocator = new RootAllocator(10_000_000)) { + final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR)); + final VarCharVector vector = new VarCharVector(field, allocator); + + vector.allocateNew(); + + try { +final int size = 1000; +final int fluffSize = 1; +final VarCharVector.Mutator mutator = vector.getMutator(); +final VarCharVector.Accessor accessor = vector.getAccessor(); + +setSafeIndexStrings("", 0, size, mutator); +setSafeIndexStrings("first cut ", size, fluffSize, mutator); + +mutator.setValueCount(fluffSize); +Assert.assertEquals(fluffSize, accessor.getValueCount()); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); +setSafeIndexStrings("redone cut ", size, fluffSize, mutator); --- End diff -- Yikes! I didn't know this. Thanks for catching. ---
[GitHub] drill issue #1164: DRILL-6234: Improved documentation for VariableWidthVecto...
Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1164 @paul-rogers Applied review comments. ---
[GitHub] drill pull request #1164: DRILL-6234: Improved documentation for VariableWid...
Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174349888 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract + * + * Variable length vectors do not support random writes. All set methods must be called with a monotonically increasing, consecutive sequence of indexes. + * It is possible to trim the vector by setting the value count to be less than the number of values currently contained in the vector with {@link #setValueCount(int)}; then + * the process of setting values starts with the index after the last index. + * + * + * It is also possible to backtrack and set the value at an index earlier than the current index; however, the caller must assume that all data contained after the last + * set index is corrupted. + * + * Notes + * + * There is no guarantee the data buffer for the {@link VariableWidthVector} will have enough space to contain the data you set unless you use setSafe. If you + * use set you may get array index out of bounds exceptions. --- End diff -- I liked this refactored phrasing. ---
[GitHub] drill pull request #1164: DRILL-6234: Improved documentation for VariableWid...
Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174349986 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract --- End diff -- This is a good way to lay out the information. I switched the javadoc to follow this outline. ---
[jira] [Created] (DRILL-6240) Operator Overview should show estimated row counts
Kunal Khatua created DRILL-6240: --- Summary: Operator Overview should show estimated row counts Key: DRILL-6240 URL: https://issues.apache.org/jira/browse/DRILL-6240 Project: Apache Drill Issue Type: Improvement Components: Web Server Affects Versions: 1.12.0 Reporter: Kunal Khatua Fix For: 1.14.0 The Operator Profile Overview should show a comparison between estimated and actual row counts. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6237) Upgrade checkstyle version to 5.9 or above
Vlad Rozov created DRILL-6237: - Summary: Upgrade checkstyle version to 5.9 or above Key: DRILL-6237 URL: https://issues.apache.org/jira/browse/DRILL-6237 Project: Apache Drill Issue Type: Task Reporter: Vlad Rozov Assignee: Vlad Rozov Checkstyle versions prior to 5.9 do not support Java 8 syntax. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6239) Add Build and License Badges to README.md
Timothy Farkas created DRILL-6239: - Summary: Add Build and License Badges to README.md Key: DRILL-6239 URL: https://issues.apache.org/jira/browse/DRILL-6239 Project: Apache Drill Issue Type: Improvement Reporter: Timothy Farkas Assignee: Timothy Farkas Other projects have pretty badges showing the build status and license on the README.md page. We should have it too! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] drill pull request #1165: DRILL-6239: Add build and license badges to README...
GitHub user ilooner opened a pull request: https://github.com/apache/drill/pull/1165 DRILL-6239: Add build and license badges to README.md Add nice build and license badges that everyone else has. See a preview of what they look like here: https://github.com/ilooner/drill/tree/DRILL-6239 You can merge this pull request into a Git repository by running: $ git pull https://github.com/ilooner/drill DRILL-6239 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1165.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1165 commit c95030076a884b074821e169c4e58c84d89275c8 Author: Timothy Farkas Date: 2018-03-14T00:39:50Z DRILL-6239: Add build and license badges to README.md ---
[GitHub] drill issue #1165: DRILL-6239: Add build and license badges to README.md
Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1165 @arina-ielchiieva ---
[GitHub] drill pull request #1164: DRILL-6234: Improved documentation for VariableWid...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174328662 --- Diff: exec/vector/src/test/java/org/apache/drill/exec/vector/VariableLengthVectorTest.java --- @@ -0,0 +1,152 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.memory.RootAllocator; +import org.apache.drill.exec.record.MaterializedField; +import org.junit.Assert; +import org.junit.Test; + +/** + * This test uses {@link VarCharVector} to test the template code in VariableLengthVector. + */ +public class VariableLengthVectorTest +{ + /** + * If the vector contains 1000 records, setting a value count of 1000 should work. 
+ */ + @Test + public void testSettingSameValueCount() + { +try (RootAllocator allocator = new RootAllocator(10_000_000)) { + final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR)); + final VarCharVector vector = new VarCharVector(field, allocator); + + vector.allocateNew(); + + try { +final int size = 1000; +final VarCharVector.Mutator mutator = vector.getMutator(); +final VarCharVector.Accessor accessor = vector.getAccessor(); + +setSafeIndexStrings("", 0, size, mutator); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); +checkIndexStrings("", 0, size, accessor); + } finally { +vector.clear(); + } +} + } + + /** + * Test trunicating data. If you have 1 records, reduce the vector to 1000 records. + */ + @Test + public void testTrunicateVectorSetValueCount() + { +try (RootAllocator allocator = new RootAllocator(10_000_000)) { + final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR)); + final VarCharVector vector = new VarCharVector(field, allocator); + + vector.allocateNew(); + + try { +final int size = 1000; +final int fluffSize = 1; +final VarCharVector.Mutator mutator = vector.getMutator(); +final VarCharVector.Accessor accessor = vector.getAccessor(); + +setSafeIndexStrings("", 0, size, mutator); +setSafeIndexStrings("first cut ", size, fluffSize, mutator); + +mutator.setValueCount(fluffSize); +Assert.assertEquals(fluffSize, accessor.getValueCount()); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); +setSafeIndexStrings("redone cut ", size, fluffSize, mutator); +mutator.setValueCount(fluffSize); +Assert.assertEquals(fluffSize, accessor.getValueCount()); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); + +checkIndexStrings("", 0, size, accessor); + + } finally { +vector.clear(); + } +} + } + + /** + * Set 1 values. 
Then go back and set new values starting at the 1001st record. --- End diff -- Just FYI: the vector writers handle all this stuff for you. They allow overwriting the most recent value. They keep track of vector counts and data offsets. And so on. This is why I can offer such detailed comments: I learned how all this works when creating those classes. It would be wonderful to start reusing that work rather than creating multiple implementations of the same thing... (Here, however, I recognize that the random access pattern wanted in your project does not match the sequential approach taken by the writers. But, as you are seeing, we can't really
[GitHub] drill pull request #1164: DRILL-6234: Improved documentation for VariableWid...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174327601 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract + * + * Variable length vectors do not support random writes. All set methods must be called for with a monotonically increasing consecutive sequence of indexes. + * It is possible to trim the vector by setting the value count to be less than the number of values currently container in the vector with {@link #setValueCount(int)}, then + * the process of setting values starts with the index after the last index. + * + * + * It is also possible to back track and set the value at an index earlier than the current index, however, the caller must assume that all data container after the last + * set index is corrupted. + * + * Notes + * + * There is no gaurantee the data buffer for the {@link VariableWidthVector} will have enough space to contain the data you set unless you use setSafe. If you + * use set you may get array index out of bounds exceptions. --- End diff -- Said another way, either 1) be careful to manage your own memory, or 2) call `setSafe()`. That is, in fact, why `setSafe()` exists. ---
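The tradeoff described above (manage your own memory and call `set()`, or call `setSafe()` and let the vector extend itself) can be sketched with a toy offsets-plus-bytes model. `ToyVarCharVector` below is purely illustrative, not the Drill class; the real vectors store data in direct memory obtained from an allocator:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy model of a variable-width vector: one byte buffer plus an offsets array,
// where offsets[i + 1] - offsets[i] is the length of value i. Illustrative
// only; the real Drill vectors use direct memory from an allocator.
class ToyVarCharVector {
  private byte[] data;
  private int[] offsets;

  ToyVarCharVector(int dataBytes, int valueCapacity) {
    data = new byte[dataBytes];
    offsets = new int[valueCapacity + 1];
  }

  // Like set(): the caller is responsible for having allocated enough memory.
  void set(int index, byte[] value) {
    int start = offsets[index];
    if (start + value.length > data.length) {
      throw new IndexOutOfBoundsException("data buffer too small");
    }
    System.arraycopy(value, 0, data, start, value.length);
    offsets[index + 1] = start + value.length;
  }

  // Like setSafe(): doubles the buffers on demand, so it never overruns.
  void setSafe(int index, byte[] value) {
    if (index + 1 >= offsets.length) {
      offsets = Arrays.copyOf(offsets, Math.max(index + 2, offsets.length * 2));
    }
    while (offsets[index] + value.length > data.length) {
      data = Arrays.copyOf(data, data.length * 2);
    }
    set(index, value);
  }

  String get(int index) {
    return new String(data, offsets[index], offsets[index + 1] - offsets[index],
        StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    ToyVarCharVector v = new ToyVarCharVector(4, 1);
    v.setSafe(0, "hello".getBytes(StandardCharsets.UTF_8)); // grows data 4 -> 8
    System.out.println(v.get(0));                           // prints "hello"
  }
}
```

The grow-on-demand loop is why the "safe" variant is forgiving: it trades an occasional buffer copy for never overrunning memory, which is the choice the review comment describes.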
[GitHub] drill pull request #1164: DRILL-6234: Improved documentation for VariableWid...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174328361 --- Diff: exec/vector/src/test/java/org/apache/drill/exec/vector/VariableLengthVectorTest.java --- @@ -0,0 +1,152 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector; + +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.common.types.Types; +import org.apache.drill.exec.memory.RootAllocator; +import org.apache.drill.exec.record.MaterializedField; +import org.junit.Assert; +import org.junit.Test; + +/** + * This test uses {@link VarCharVector} to test the template code in VariableLengthVector. + */ +public class VariableLengthVectorTest +{ + /** + * If the vector contains 1000 records, setting a value count of 1000 should work. 
+ */ + @Test + public void testSettingSameValueCount() + { +try (RootAllocator allocator = new RootAllocator(10_000_000)) { + final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR)); + final VarCharVector vector = new VarCharVector(field, allocator); + + vector.allocateNew(); + + try { +final int size = 1000; +final VarCharVector.Mutator mutator = vector.getMutator(); +final VarCharVector.Accessor accessor = vector.getAccessor(); + +setSafeIndexStrings("", 0, size, mutator); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); +checkIndexStrings("", 0, size, accessor); + } finally { +vector.clear(); + } +} + } + + /** + * Test trunicating data. If you have 1 records, reduce the vector to 1000 records. + */ + @Test + public void testTrunicateVectorSetValueCount() + { +try (RootAllocator allocator = new RootAllocator(10_000_000)) { + final MaterializedField field = MaterializedField.create("stringCol", Types.required(TypeProtos.MinorType.VARCHAR)); + final VarCharVector vector = new VarCharVector(field, allocator); + + vector.allocateNew(); + + try { +final int size = 1000; +final int fluffSize = 1; +final VarCharVector.Mutator mutator = vector.getMutator(); +final VarCharVector.Accessor accessor = vector.getAccessor(); + +setSafeIndexStrings("", 0, size, mutator); +setSafeIndexStrings("first cut ", size, fluffSize, mutator); + +mutator.setValueCount(fluffSize); +Assert.assertEquals(fluffSize, accessor.getValueCount()); + +mutator.setValueCount(size); +Assert.assertEquals(size, accessor.getValueCount()); +setSafeIndexStrings("redone cut ", size, fluffSize, mutator); --- End diff -- While this works, we are actually violating the vector contract which is "once the value count is set, the vector becomes immutable." If the client is not done writing to the vector, the client should maintain the value count until it is finally done. ---
[GitHub] drill pull request #1164: DRILL-6234: Improved documentation for VariableWid...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174327511 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract + * + * Variable length vectors do not support random writes. All set methods must be called for with a monotonically increasing consecutive sequence of indexes. --- End diff -- This is very important to know. This is why spill-to-disk for hash agg will eventually cause a serious customer failure. Aggregate UDFs write to vectors to store intermediate group values. A "max" of strings can't be stored that way. Instead, it writes to a Java object. That object will be lost on spill and reread. This will result in losing prior max values and the aggregate starting over. So, this little note is not just a nuisance, it is the fatal flaw in how we handle the (albeit obscure) string aggregate values. ---
[GitHub] drill pull request #1164: DRILL-6234: Improved documentation for VariableWid...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174327132 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract + * + * Variable length vectors do not support random writes. All set methods must be called for with a monotonically increasing consecutive sequence of indexes. + * It is possible to trim the vector by setting the value count to be less than the number of values currently container in the vector with {@link #setValueCount(int)}, then + * the process of setting values starts with the index after the last index. + * + * + * It is also possible to back track and set the value at an index earlier than the current index, however, the caller must assume that all data container after the last --- End diff -- "all data container after" --> "all data after the updated index" ? ---
[GitHub] drill pull request #1164: DRILL-6234: Improved documentation for VariableWid...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174327182 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract + * + * Variable length vectors do not support random writes. All set methods must be called for with a monotonically increasing consecutive sequence of indexes. + * It is possible to trim the vector by setting the value count to be less than the number of values currently container in the vector with {@link #setValueCount(int)}, then + * the process of setting values starts with the index after the last index. + * + * + * It is also possible to back track and set the value at an index earlier than the current index, however, the caller must assume that all data container after the last + * set index is corrupted. --- End diff -- Maybe add "changing the index does not release memory after the index." ---
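The corruption, and the fact that no memory is released, both fall out of the underlying representation: a variable-width vector keeps one shared data buffer plus an offsets array, so rewriting an earlier index moves the offset that a later value was stored against. A toy demonstration with plain arrays (illustrative, not the real Drill buffers):

```java
import java.nio.charset.StandardCharsets;

// Why back-tracking corrupts later values: a variable-width vector keeps one
// shared data buffer plus an offsets array, so rewriting an earlier index
// moves the offset that a later value was stored against. Plain arrays here,
// not the real Drill buffers.
class BacktrackDemo {
  static byte[] data = new byte[32];
  static int[] offsets = new int[4];  // offsets[i + 1] ends value i

  static void set(int index, String s) {
    byte[] b = s.getBytes(StandardCharsets.UTF_8);
    System.arraycopy(b, 0, data, offsets[index], b.length);
    offsets[index + 1] = offsets[index] + b.length;
  }

  static String get(int index) {
    return new String(data, offsets[index], offsets[index + 1] - offsets[index],
        StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    set(0, "aa");
    set(1, "bbb");           // stored at bytes 2..4
    String before = get(1);  // "bbb"
    set(0, "cccc");          // back-track: overwrites bytes 0..3 and moves offsets[1]
    String after = get(1);   // no longer "bbb": its offset and bytes both changed
    System.out.println(before + " / " + after);
  }
}
```

Note also that, matching the wording suggested in the review, nothing is freed by the back-track: the old bytes remain in the buffer, just no longer reachable through consistent offsets.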
[GitHub] drill pull request #1164: DRILL-6234: Improved documentation for VariableWid...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174327027 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -506,6 +506,8 @@ public boolean isNull(int index){ } /** + * Overview --- End diff -- Nit, but I think that h4 is the usual heading level used in Java doc comments. The higher levels are used in the surrounding generated HTML. ---
[GitHub] drill pull request #1164: DRILL-6234: Improved documentation for VariableWid...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1164#discussion_r174328025 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -514,6 +516,22 @@ public boolean isNull(int index){ * The equivalent Java primitive is '${minor.javaType!type.javaType}' * * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + * + * Contract --- End diff -- Might be worth summarizing how to use this:
1) Write values sequentially. Fixed-width vectors allow random access, but special care is needed.
2) Keep track in client code of the total value count. Call `setValueCount()` once the vector is full to set the final value. (The vector does not know its count while a write is in progress.)
3) Either take responsibility for allocating enough memory, or call the `setSafe()` methods to automatically extend the vector.
4) Once vectors are written, they are immutable; no additional writes of any kind are allowed to that vector. ---
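The four usage rules listed in that comment can be made concrete with a small self-checking sketch. `ContractVector` is hypothetical; it encodes the contract itself, not the generated Drill vector code:

```java
import java.util.Arrays;

// The four usage rules, encoded in a small self-checking sketch.
// ContractVector is hypothetical; it models the contract, not the generated
// Drill vector code.
class ContractVector {
  private String[] values = new String[4];
  private int lastIndex = -1;   // last index written
  private int valueCount = -1;  // -1 until the client seals the vector

  void setSafe(int index, String value) {
    if (valueCount >= 0) {                        // rule 4: immutable once counted
      throw new IllegalStateException("immutable after setValueCount()");
    }
    if (index != lastIndex + 1) {                 // rule 1: sequential writes only
      throw new IllegalStateException("writes must be sequential");
    }
    if (index >= values.length) {                 // rule 3: setSafe() extends memory
      values = Arrays.copyOf(values, values.length * 2);
    }
    values[index] = value;
    lastIndex = index;
  }

  // Rule 2: the client calls this once, with the count it tracked itself.
  void setValueCount(int count) { valueCount = count; }

  String get(int index) {
    if (index >= valueCount) {  // also rejects reads before the vector is sealed
      throw new IndexOutOfBoundsException("beyond the value count");
    }
    return values[index];
  }

  public static void main(String[] args) {
    ContractVector v = new ContractVector();
    for (int i = 0; i < 6; i++) {
      v.setSafe(i, "row" + i);  // sequential writes; backing array auto-grows
    }
    v.setValueCount(6);
    System.out.println(v.get(5)); // prints "row5"
  }
}
```

Reads are rejected until `setValueCount()` is called, which mirrors the point that the vector does not know its own count while a write is in progress.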
[GitHub] drill issue #1164: DRILL-6234: Improved documentation for VariableWidthVecto...
Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1164 @paul-rogers ---
[GitHub] drill pull request #1164: DRILL-6234: Improved documentation for VariableWid...
GitHub user ilooner opened a pull request: https://github.com/apache/drill/pull/1164 DRILL-6234: Improved documentation for VariableWidthVector mutators I had some confusion about how setValueCount should behave for variable width vectors. I added some documentation and unit tests which explain its behavior so that others don't waste time in the future. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ilooner/drill DRILL-6234 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1164.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1164 commit 4f13fd2873a9b510a9d0105ad87f72792aa46314 Author: Timothy Farkas Date: 2018-03-14T00:24:28Z DRILL-6234: Improved documentation for VariableWidthVector mutators, and added simple unit tests demonstrating mutator behavior. ---