[GitHub] drill pull request #1082: DRILL-5741: Automatically manage memory allocation...
Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/1082#discussion_r166196043

--- Diff: distribution/src/resources/auto-setup.sh ---
@@ -0,0 +1,222 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# This file is invoked by drill-config.sh during a Drillbit startup and provides
+# default checks and autoconfiguration.
+# Distributions should not put anything in this file. Checks can be
+# specified in ${DRILL_HOME}/conf/distrib-setup.sh
+# Users should not put anything in this file. Additional checks can be defined
+# and put in ${DRILL_CONF_DIR}/drill-setup.sh instead.
+# To FAIL any check, return with a non-zero return code
+# e.g.
+# if [ $status == "FAILED" ]; then return 1; fi
+
+###==
+# FEATURES
+# 1. Provides checks and auto-configuration for memory settings
+###==
+
+# Convert Java memory value to MB
+function valueInMB() {
+  if [ -z "$1" ]; then echo ""; return; fi
+  local inputTxt=`echo $1 | tr '[A-Z]' '[a-z]'`
+  local inputValue=`echo ${inputTxt:0:${#inputTxt}-1}`;
+  # Extracting Numeric Value
+  if [[ "$inputTxt" == *g ]]; then
+    let valueInMB=$inputValue*1024
+  elif [[ "$inputTxt" == *k ]]; then
+    let valueInMB=$inputValue/1024
+  elif [[ "$inputTxt" == *m ]]; then
+    let valueInMB=$inputValue
+  elif [[ "$inputTxt" == *% ]]; then
+    # TotalRAM_inMB * percentage [Works on Linux]
+    let valueInMB=$inputValue*$totalRAM_inMB/100;
+  else
+    echo error;
+    return 1;
+  fi
+  echo "$valueInMB"
+  return
+}
+
+# Convert Java memory value to GB
+function valueInGB() {
+  if [ -z "$1" ]; then echo ""; return; fi
+  local inputTxt=`echo $1 | tr '[A-Z]' '[a-z]'`
+  local inputValue=`echo ${inputTxt:0:${#inputTxt}-1}`;
+  # Extracting Numeric Value
+  if [[ "$inputTxt" == *g ]]; then
+    let valueInGB=$inputValue
+  elif [[ "$inputTxt" == *k ]]; then
+    let valueInGB=$inputValue/1024/1024
+  elif [[ "$inputTxt" == *m ]]; then
+    let valueInGB=$inputValue/1024
+  elif [[ "$inputTxt" == *% ]]; then
+    # TotalRAM * percentage [Works on Linux]
+    let valueInGB=$inputValue*`cat /proc/meminfo | grep MemTotal | tr ' ' '\n' | grep '[0-9]'`/1024/1024/100;
+  else
+    echo error;
+    return 1;
+  fi
+  echo "$valueInGB"
+  return
+}
+
+# Estimates code cache based on total heap and direct
+function estCodeCacheInMB() {
+  local totalHeapAndDirect=$1
+  if [ $totalHeapAndDirect -le 4096 ]; then echo 512;
+  elif [ $totalHeapAndDirect -le 10240 ]; then echo 768;
+  else echo 1024;
+  fi
+}
+
+# Print Current Allocation
+function printCurrAllocation()
+{
+  if [ -n "$DRILLBIT_MAX_PROC_MEM" ]; then echo -e "\tDRILLBIT_MAX_PROC_MEM=$DRILLBIT_MAX_PROC_MEM"; fi
+  if [ -n "$DRILL_HEAP" ]; then echo -e "\tDRILL_HEAP=$DRILL_HEAP"; fi
+  if [ -n "$DRILL_MAX_DIRECT_MEMORY" ]; then echo -e "\tDRILL_MAX_DIRECT_MEMORY=$DRILL_MAX_DIRECT_MEMORY"; fi
+  if [ -n "$DRILLBIT_CODE_CACHE_SIZE" ]; then
+    echo -e "\tDRILLBIT_CODE_CACHE_SIZE=$DRILLBIT_CODE_CACHE_SIZE "
+    echo -e "\t*NOTE: It is recommended not to specify DRILLBIT_CODE_CACHE_SIZE, as this will be auto-computed based on the heap size and will not exceed 1GB"
+  fi
+}
+
+#
+# Check and auto-configuration for memory settings
+#
+# Default (Track status of this check: "" => Continue checking ; "PASSED" => no more check required)
+AutoMemConfigStatus=""
+
+# Computing existing system information
+# Tested on Linux (CentOS/RHEL/Ubuntu); Cygwin (Win10Pro-64bit)
+if [[ "$OSTYPE" == *linux* ]] || [[
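The unit-conversion helpers quoted in this diff are easy to sanity-check in isolation. Below is a runnable sketch of `valueInMB` under two stated assumptions: the `$DbitMaxProcMem` reference in the kilobyte branch is treated as a copy-paste slip for `$inputTxt` (the variable every other branch tests), and `totalRAM_inMB`, which auto-setup.sh derives from `/proc/meminfo`, is stubbed to a fixed value.

```shell
#!/usr/bin/env bash
# Sketch of the suffix-to-MB conversion from auto-setup.sh.
# Assumptions: the kilobyte branch tests $inputTxt (not $DbitMaxProcMem),
# and totalRAM_inMB is stubbed rather than read from /proc/meminfo.
totalRAM_inMB=8192   # stand-in for the value derived from /proc/meminfo

valueInMB() {
  [ -z "$1" ] && { echo ""; return; }
  local inputTxt
  inputTxt=$(echo "$1" | tr '[A-Z]' '[a-z]')
  local inputValue=${inputTxt:0:${#inputTxt}-1}   # strip the trailing unit
  local result
  if [[ "$inputTxt" == *g ]]; then
    result=$(( inputValue * 1024 ))
  elif [[ "$inputTxt" == *k ]]; then
    result=$(( inputValue / 1024 ))
  elif [[ "$inputTxt" == *m ]]; then
    result=$inputValue
  elif [[ "$inputTxt" == *% ]]; then
    result=$(( inputValue * totalRAM_inMB / 100 ))
  else
    echo error; return 1
  fi
  echo "$result"
}

valueInMB 4G     # 4096
valueInMB 512m   # 512
valueInMB 25%    # 2048 with the stubbed 8 GB total
```

With an 8 GB stub, `25%` resolves to 2048 MB; an unrecognized suffix echoes `error` and returns 1, which is what lets callers fail the startup check.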
[jira] [Created] (DRILL-6139) Travis CI hangs on TestVariableWidthWriter#testRestartRow
Boaz Ben-Zvi created DRILL-6139:
---
Summary: Travis CI hangs on TestVariableWidthWriter#testRestartRow
Key: DRILL-6139
URL: https://issues.apache.org/jira/browse/DRILL-6139
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.12.0
Reporter: Boaz Ben-Zvi

The Travis CI fails (probably hangs, then times out) in the following test:

{code:java}
Running org.apache.drill.test.rowSet.test.DummyWriterTest
Running org.apache.drill.test.rowSet.test.DummyWriterTest#testDummyScalar
Running org.apache.drill.test.rowSet.test.DummyWriterTest#testDummyMap
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.109 sec - in org.apache.drill.test.rowSet.test.DummyWriterTest
Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter
Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testSkipNulls
Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testWrite
Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testFillEmpties
Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testRollover
Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testSizeLimit
Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testRolloverWithEmpties
Running org.apache.drill.test.rowSet.test.TestVariableWidthWriter#testRestartRow
Killed

Results :

Tests run: 1554, Failures: 0, Errors: 0, Skipped: 66{code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] drill issue #1113: DRILL-5902: Regression: Queries encounter random failure ...
Github user vrozov commented on the issue: https://github.com/apache/drill/pull/1113 @arina-ielchiieva Please review ---
[GitHub] drill pull request #1113: DRILL-5902: Regression: Queries encounter random f...
GitHub user vrozov opened a pull request: https://github.com/apache/drill/pull/1113

DRILL-5902: Regression: Queries encounter random failure due to RPC connection timed out

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vrozov/drill DRILL-5902

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1113.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1113

commit fe329c2517710bb9fdec273de24321717fc954e6
Author: Vlad Rozov
Date: 2018-02-06T03:15:56Z

    DRILL-5902: Regression: Queries encounter random failure due to RPC connection timed out

---
[jira] [Created] (DRILL-6138) Move RecordBatchSizer to org.apache.drill.exec.record package
Padma Penumarthy created DRILL-6138: --- Summary: Move RecordBatchSizer to org.apache.drill.exec.record package Key: DRILL-6138 URL: https://issues.apache.org/jira/browse/DRILL-6138 Project: Apache Drill Issue Type: Task Components: Execution - Flow Affects Versions: 1.12.0 Reporter: Padma Penumarthy Assignee: Padma Penumarthy Fix For: 1.13.0 Move RecordBatchSizer from org.apache.drill.exec.physical.impl.spill package to org.apache.drill.exec.record package. Minor refactoring - change columnSizes from list to map. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] drill pull request #1112: DRILL-6114: Metadata revisions
GitHub user paul-rogers opened a pull request: https://github.com/apache/drill/pull/1112

DRILL-6114: Metadata revisions

This PR is part of the "batch handling upgrades" project (https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades). It completes the new internal metadata system by including support for the remaining vector types: unions, lists and repeated lists. The metadata code was refactored. Previously, it was small enough to fit into a single file (with nested classes). With the added complexity, the metadata classes were split out into separate classes and grouped into their own Java package. A few fixes were made here and there to ensure the unit tests pass. @ppadma or @bitblender, can one of you run the pre-commit tests and send me the details of any failures?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-6114B

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1112.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1112

commit e0f11f923ea48070f829a859395ee66b81b9ba18
Author: Paul Rogers
Date: 2018-02-06T04:18:18Z

    DRILL-6114: Metadata revisions

    Support for union vectors, list vectors, repeated list vectors. Refactored metadata classes.

---
Re: Apache Drill with Azure Data Lake Store
Hi Kamal,

My understanding was that the file system running on top of Azure Data Lake Store was still HDFS? Is that true? If that is the case, the DFS plugin should work. It is worth a test.

Thanks,
Saurabh

Sent from my iPhone

> On Feb 3, 2018, at 6:02 PM, Kamal Baig wrote:
>
> Hi
>
> I am looking for some help around connecting and processing data stored in
> Azure Data Lake Store (Not the Azure Blob)
>
> using Apache Drill
>
> Any help and suggestion would be highly appreciated. I am a beginner with
> Apache Drill so any docs or steps would be great to get started
>
> Thanks
Apache Drill with Azure Data Lake Store
Hi I am looking for some help around connecting and processing data stored in Azure Data lake store (Not the Azure Blob) using Apache Drill Any help and suggestion would be highly appreciated. I am a beginner with Apache Drill so any docs or steps would be great to get started Thanks
[GitHub] drill pull request #1082: DRILL-5741: Automatically manage memory allocation...
Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/1082#discussion_r166158426

--- Diff: distribution/src/resources/drill-config.sh ---
@@ -180,18 +251,61 @@ else
   fi
 fi

-# Default memory settings if none provided by the environment or
+# Execute distrib-setup.sh for any distribution-specific setup (e.g. checks).
+# distrib-setup.sh is optional; it is created by some distribution installers
+# that need additional distribution-specific setup to be done.
+# Because installers will have site-specific steps, the file
+# should be moved into the site directory, if the user employs one.
+
+# Checking if being executed in context of Drillbit and not SQLLine
+if [ "$DRILLBIT_CONTEXT" == "1" ]; then
+  # Check whether to run exclusively distrib-setup.sh OR auto-setup.sh
+  distribSetup="$DRILL_CONF_DIR/distrib-setup.sh" ; # Site-based distrib-setup.sh
+  if [ $(checkExecutableLineCount $distribSetup) -eq 0 ]; then
+    distribSetup="$DRILL_HOME/conf/distrib-setup.sh" ; # Install-based distrib-setup.sh
+    if [ $(checkExecutableLineCount $distribSetup) -eq 0 ]; then
+      # Run Default Auto Setup
+      distribSetup="$DRILL_HOME/bin/auto-setup.sh"
+    fi
+  fi
+  # Check and run additional setup defined by user
+  drillSetup="$DRILL_CONF_DIR/drill-setup.sh" ; # Site-based drill-setup.sh
+  if [ $(checkExecutableLineCount $drillSetup) -eq 0 ]; then
+    drillSetup="$DRILL_HOME/conf/drill-setup.sh" ; # Install-based drill-setup.sh
+    if [ $(checkExecutableLineCount $drillSetup) -eq 0 ]; then drillSetup=""; fi
+  fi
+
+  # Enforcing checks in order (distrib-setup.sh , drill-setup.sh)
+  # (NOTE: A script is executed only if it has relevant executable lines)
+  # Both distribSetup & drillSetup are executed because the user might have introduced additional checks
+  if [ -n "$distribSetup" ]; then
+    . "$distribSetup"
+    if [ $? -gt 0 ]; then fatal_error "Aborting Drill Startup due to failed setup by $distribSetup"; fi

--- End diff --

The auto-configuration scripts do indeed do that. However, I thought it would be good to have a higher-level error message also indicating the source of the failure. This allows us to catch any non-zero exit codes that might be thrown and not handled cleanly. Other sections of `drill-config.sh` follow this principle.

---
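The pattern kkhatua describes (source each setup script, then surface a high-level fatal error on any non-zero exit code) can be sketched as below. `fatal_error` and `run_setup` here are illustrative stand-ins, not the actual drill-config.sh definitions.

```shell
#!/usr/bin/env bash
# Sketch of the wrap-and-abort pattern: source a setup script and turn
# any non-zero exit into a single, clearly attributed startup failure.
# (fatal_error and run_setup are stand-ins for illustration.)
fatal_error() { echo "ERROR: $*" >&2; exit 1; }

run_setup() {
  local setupScript="$1"
  [ -n "$setupScript" ] || return 0   # nothing to run
  . "$setupScript"                    # run checks in the current shell
  local rc=$?                         # $? expands before local executes
  if [ $rc -gt 0 ]; then
    fatal_error "Aborting Drill startup due to failed setup by $setupScript (exit code $rc)"
  fi
}
```

Because the script is sourced rather than executed, a check can simply `return 1` to veto startup, and the wrapper names which script vetoed it.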
[GitHub] drill pull request #1082: DRILL-5741: Automatically manage memory allocation...
Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/1082#discussion_r166157864

--- Diff: distribution/src/resources/drill-config.sh ---
@@ -180,18 +251,61 @@ else
   fi
 fi

-# Default memory settings if none provided by the environment or
+# Execute distrib-setup.sh for any distribution-specific setup (e.g. checks).
+# distrib-setup.sh is optional; it is created by some distribution installers
+# that need additional distribution-specific setup to be done.
+# Because installers will have site-specific steps, the file
+# should be moved into the site directory, if the user employs one.
+
+# Checking if being executed in context of Drillbit and not SQLLine
+if [ "$DRILLBIT_CONTEXT" == "1" ]; then
+  # Check whether to run exclusively distrib-setup.sh OR auto-setup.sh
+  distribSetup="$DRILL_CONF_DIR/distrib-setup.sh" ; # Site-based distrib-setup.sh
+  if [ $(checkExecutableLineCount $distribSetup) -eq 0 ]; then

--- End diff --

I'd have liked the KISS principle, but I thought there was a need for a placeholder `distrib-setup.sh` file. Based on that, I need to figure out whether there is a distribution-specific setup, or whether we should fall back to executing `auto-setup.sh`. Unlike sourcing environment files, where an unset variable can be set, for auto-setup the choice of execution has to be mutually exclusive. This block looks complicated (and verbose with the comments), but it is only identifying *what* setup script needs to execute. Hence, all we do here is an assignment of the variables.

---
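`checkExecutableLineCount` itself is not shown in this hunk. A plausible stand-in (an assumption for illustration, not the real drill-config.sh definition) counts lines that are neither blank nor comments, which is what lets an all-comment placeholder `distrib-setup.sh` fall through to the next candidate and, ultimately, to `auto-setup.sh`:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for checkExecutableLineCount: count lines that
# are neither blank nor comment-only, so a placeholder script counts as 0.
checkExecutableLineCount() {
  if [ -f "$1" ]; then
    grep -cv -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
  else
    echo 0
  fi
}

# The mutually exclusive selection sketched in the diff: prefer the
# site distrib-setup.sh, fall back to the install copy, else the
# bundled auto-setup.sh. (pick_setup is an illustrative name.)
pick_setup() {
  local confDir="$1" homeDir="$2"
  local distribSetup="$confDir/distrib-setup.sh"
  if [ "$(checkExecutableLineCount "$distribSetup")" -eq 0 ]; then
    distribSetup="$homeDir/conf/distrib-setup.sh"
    if [ "$(checkExecutableLineCount "$distribSetup")" -eq 0 ]; then
      distribSetup="$homeDir/bin/auto-setup.sh"
    fi
  fi
  echo "$distribSetup"
}
```

This keeps the selection to pure variable assignment, matching the comment above: exactly one of the three scripts ends up chosen.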
[GitHub] drill pull request #1082: DRILL-5741: Automatically manage memory allocation...
Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/1082#discussion_r166157561

--- Diff: distribution/src/resources/auto-setup.sh ---
@@ -0,0 +1,222 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# This file is invoked by drill-config.sh during a Drillbit startup and provides
+# default checks and autoconfiguration.
+# Distributions should not put anything in this file. Checks can be
+# specified in ${DRILL_HOME}/conf/distrib-setup.sh
+# Users should not put anything in this file. Additional checks can be defined
+# and put in ${DRILL_CONF_DIR}/drill-setup.sh instead.
+# To FAIL any check, return with a non-zero return code
+# e.g.
+# if [ $status == "FAILED" ]; then return 1; fi
+
+###==
+# FEATURES
+# 1. Provides checks and auto-configuration for memory settings
+###==
+
+# Convert Java memory value to MB
+function valueInMB() {
+  if [ -z "$1" ]; then echo ""; return; fi
+  local inputTxt=`echo $1 | tr '[A-Z]' '[a-z]'`
+  local inputValue=`echo ${inputTxt:0:${#inputTxt}-1}`;
+  # Extracting Numeric Value
+  if [[ "$inputTxt" == *g ]]; then
+    let valueInMB=$inputValue*1024
+  elif [[ "$inputTxt" == *k ]]; then
+    let valueInMB=$inputValue/1024
+  elif [[ "$inputTxt" == *m ]]; then
+    let valueInMB=$inputValue
+  elif [[ "$inputTxt" == *% ]]; then
+    # TotalRAM_inMB * percentage [Works on Linux]
+    let valueInMB=$inputValue*$totalRAM_inMB/100;
+  else
+    echo error;
+    return 1;
+  fi
+  echo "$valueInMB"
+  return
+}
+
+# Convert Java memory value to GB
+function valueInGB() {
+  if [ -z "$1" ]; then echo ""; return; fi
+  local inputTxt=`echo $1 | tr '[A-Z]' '[a-z]'`
+  local inputValue=`echo ${inputTxt:0:${#inputTxt}-1}`;
+  # Extracting Numeric Value
+  if [[ "$inputTxt" == *g ]]; then
+    let valueInGB=$inputValue
+  elif [[ "$inputTxt" == *k ]]; then
+    let valueInGB=$inputValue/1024/1024
+  elif [[ "$inputTxt" == *m ]]; then
+    let valueInGB=$inputValue/1024
+  elif [[ "$inputTxt" == *% ]]; then
+    # TotalRAM * percentage [Works on Linux]
+    let valueInGB=$inputValue*`cat /proc/meminfo | grep MemTotal | tr ' ' '\n' | grep '[0-9]'`/1024/1024/100;
+  else
+    echo error;
+    return 1;
+  fi
+  echo "$valueInGB"
+  return
+}
+
+# Estimates code cache based on total heap and direct
+function estCodeCacheInMB() {
+  local totalHeapAndDirect=$1
+  if [ $totalHeapAndDirect -le 4096 ]; then echo 512;
+  elif [ $totalHeapAndDirect -le 10240 ]; then echo 768;
+  else echo 1024;
+  fi
+}
+
+# Print Current Allocation
+function printCurrAllocation()
+{
+  if [ -n "$DRILLBIT_MAX_PROC_MEM" ]; then echo -e "\tDRILLBIT_MAX_PROC_MEM=$DRILLBIT_MAX_PROC_MEM"; fi
+  if [ -n "$DRILL_HEAP" ]; then echo -e "\tDRILL_HEAP=$DRILL_HEAP"; fi
+  if [ -n "$DRILL_MAX_DIRECT_MEMORY" ]; then echo -e "\tDRILL_MAX_DIRECT_MEMORY=$DRILL_MAX_DIRECT_MEMORY"; fi
+  if [ -n "$DRILLBIT_CODE_CACHE_SIZE" ]; then
+    echo -e "\tDRILLBIT_CODE_CACHE_SIZE=$DRILLBIT_CODE_CACHE_SIZE "
+    echo -e "\t*NOTE: It is recommended not to specify DRILLBIT_CODE_CACHE_SIZE, as this will be auto-computed based on the heap size and will not exceed 1GB"
+  fi
+}
+
+#
+# Check and auto-configuration for memory settings
+#
+# Default (Track status of this check: "" => Continue checking ; "PASSED" => no more check required)
+AutoMemConfigStatus=""
+
+# Computing existing system information
+# Tested on Linux (CentOS/RHEL/Ubuntu); Cygwin (Win10Pro-64bit)
+if [[ "$OSTYPE" == *linux* ]] || [[
[GitHub] drill pull request #1082: DRILL-5741: Automatically manage memory allocation...
Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/1082#discussion_r166157369 --- Diff: distribution/src/assemble/bin.xml --- @@ -345,6 +345,21 @@ 0755 conf + + src/resources/auto-setup.sh + 0755 + bin + + + src/resources/drill-setup.sh + 0755 + conf + + + src/resources/distrib-setup.sh --- End diff -- The `distrib-setup.sh` file is empty, but provided the placeholder to indicate where distributions should make the change. This is identical to the intent of having `distrib-env.sh` in the Apache distribution, which is also empty but serves the same purpose. https://github.com/apache/drill/blob/master/distribution/src/resources/distrib-env.sh Just following the same convention. ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user sachouche commented on the issue: https://github.com/apache/drill/pull/1011

+1. I have reviewed the code and overall it looks good. My main feedback is that the implementation doesn't currently support secure clusters (at least I didn't see any logic associated with that). YARN applications have issues staying up for a long time because of ticket renewal limitations. We might want to create another enhancement JIRA to support such use cases.

---
[GitHub] drill issue #1107: DRILL-6123: Limit batch size for Merge Join based on memo...
Github user ppadma commented on the issue: https://github.com/apache/drill/pull/1107 @sachouche @ilooner @paul-rogers Can one of you review this PR for me ? ---
[GitHub] drill pull request #1101: DRILL-6032: Made the batch sizing for HashAgg more...
Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/1101#discussion_r166096630

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
@@ -215,6 +206,7 @@ public BatchHolder() {
         MaterializedField outputField = materializedValueFields[i];
         // Create a type-specific ValueVector for this value
         vector = TypeHelper.getNewVector(outputField, allocator);
+        int columnSize = new RecordBatchSizer.ColumnSize(vector).estSize;

--- End diff --

There is already stdSize, which is kind of doing the same thing. Can we use that instead of knownSize?

---
[GitHub] drill pull request #1101: DRILL-6032: Made the batch sizing for HashAgg more...
Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/1101#discussion_r166142178

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java ---
@@ -65,6 +70,14 @@
     public int stdSize;

+    /**
+     * If we can determine the exact width of a vector's row upfront,
+     * the row width is saved here. If we cannot determine the exact width
+     * (for example for VarChar or Repeated vectors), then this stays -1.
+     */
+    private int knownSize = -1;

--- End diff --

Like I mentioned in the other comment, it seems like we can just use stdSize.

---
[GitHub] drill pull request #1101: DRILL-6032: Made the batch sizing for HashAgg more...
Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/1101#discussion_r166098364

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
@@ -140,6 +131,9 @@
   private OperatorContext oContext;
   private BufferAllocator allocator;

+  private Map<Integer, Integer> keySizes;
+  // The size estimates for varchar value columns. The keys are the indexes of the varchar value columns.
+  private Map<Integer, Integer> varcharValueSizes;

--- End diff --

Don't you need to adjust size estimates for repeated types also?

---
[GitHub] drill pull request #1101: DRILL-6032: Made the batch sizing for HashAgg more...
Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/1101#discussion_r166141274

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
@@ -733,28 +780,32 @@ private void restoreReservedMemory() {
    * @param records
    */
   private void allocateOutgoing(int records) {
-    // Skip the keys and only allocate for outputting the workspace values
-    // (keys will be output through splitAndTransfer)
-    Iterator<VectorWrapper<?>> outgoingIter = outContainer.iterator();
-    for (int i = 0; i < numGroupByOutFields; i++) {
-      outgoingIter.next();
-    }
-
     // try to preempt an OOM by using the reserved memory
     useReservedOutgoingMemory();
     long allocatedBefore = allocator.getAllocatedMemory();
-    while (outgoingIter.hasNext()) {
+    for (int columnIndex = numGroupByOutFields; columnIndex < outContainer.getNumberOfColumns(); columnIndex++) {
+      final VectorWrapper<?> wrapper = outContainer.getValueVector(columnIndex);
       @SuppressWarnings("resource")
-      ValueVector vv = outgoingIter.next().getValueVector();
+      final ValueVector vv = wrapper.getValueVector();

-      AllocationHelper.allocatePrecomputedChildCount(vv, records, maxColumnWidth, 0);
+      final RecordBatchSizer.ColumnSize columnSizer = new RecordBatchSizer.ColumnSize(wrapper.getValueVector());
+      int columnSize;
+
+      if (columnSizer.hasKnownSize()) {
+        // For fixed width vectors we know the size of each record
+        columnSize = columnSizer.getKnownSize();
+      } else {
+        // For var chars we need to use the input estimate
+        columnSize = varcharValueSizes.get(columnIndex);
+      }
+
+      AllocationHelper.allocatePrecomputedChildCount(vv, records, columnSize, 0);

--- End diff --

I think we should also get elementCount from the sizer and use that instead of passing 0.

---
[GitHub] drill pull request #1101: DRILL-6032: Made the batch sizing for HashAgg more...
Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/1101#discussion_r166142507

--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -298,6 +298,11 @@
   public int getPayloadByteCount(int valueCount) {
     return valueCount * ${type.width};
   }

+  @Override
+  public int getValueWidth() {

--- End diff --

If we are using stdSize and getting the value from TypeHelper, we don't need all value vectors to have this new function.

---
[GitHub] drill pull request #1101: DRILL-6032: Made the batch sizing for HashAgg more...
Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/1101#discussion_r166136279

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
@@ -226,7 +221,7 @@ public BatchHolder() {
         ((FixedWidthVector) vector).allocateNew(HashTable.BATCH_SIZE);
       } else if (vector instanceof VariableWidthVector) {
         // This case is never used; a varchar falls under ObjectVector, which is allocated on the heap!
-        ((VariableWidthVector) vector).allocateNew(maxColumnWidth, HashTable.BATCH_SIZE);
+        ((VariableWidthVector) vector).allocateNew(columnSize, HashTable.BATCH_SIZE);

--- End diff --

For a just-allocated vector, estSize will return 0. How can we use that for allocation?

---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user priteshm commented on the issue: https://github.com/apache/drill/pull/1011 @sachouche @vrozov @arina-ielchiieva please review ---
[jira] [Created] (DRILL-6137) Join Failure When Some Json File Partitions Empty
Timothy Farkas created DRILL-6137:
---
Summary: Join Failure When Some Json File Partitions Empty
Key: DRILL-6137
URL: https://issues.apache.org/jira/browse/DRILL-6137
Project: Apache Drill
Issue Type: Bug
Reporter: Timothy Farkas
Assignee: Timothy Farkas

The following exception can occur when this query is executed:

{code}
select t.p_partkey, t1.ps_suppkey from dfs.`join/empty_part/part` as t
RIGHT JOIN dfs.`join/empty_part/partsupp` as t1 ON t.p_partkey = t1.ps_partkey
where t1.ps_partkey > 1
{code}

* part has one nonempty file 0_0_0.json
* partsupp has one nonempty file 0_0_0.json and one empty file 0_0_1.json

{code}
(java.lang.IllegalStateException) next() [on #10, RemovingRecordBatch] called again after it returned NONE. Caller should not have called next() again.
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():220
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.test.generated.HashJoinProbeGen2.executeProbePhase():119
org.apache.drill.exec.test.generated.HashJoinProbeGen2.probeAndProject():227
org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():222
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():228
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():134
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():228
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():134
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():228
org.apache.drill.exec.physical.impl.BaseRootExec.next():105
org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
org.apache.drill.exec.physical.impl.BaseRootExec.next():95
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():233
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():415
org.apache.hadoop.security.UserGroupInformation.doAs():1657
org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1145
java.util.concurrent.ThreadPoolExecutor$Worker.run():615
java.lang.Thread.run():745
{code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
Google Hangouts: Lateral Join High Level Design Presentation
Hi All, Aman and Sorabh will be talking about the high level design of lateral join in the next hangout session tomorrow. Since lateral join is a big topic they'll talk about more of the details of the design after Parth comes back in another hangout session. Thanks, Tim
[GitHub] drill issue #1111: Upgrade drill-hive libraries to 2.1.1 version.
Github user priteshm commented on the issue: https://github.com/apache/drill/pull/ @vrozov can you please review this change? ---
RE: PCAP files with Apache Drill and Sergeant R
I don’t think you can (or even want to) directly access them, assuming that the HTTP link you shared is your intended way of accessing the data. Bringing them into Amazon S3 will make it easier to spin up Drill and access the data, and you could even use the 'tmp' workspace or create temporary tables within a Drill session to work on the data without having to repeatedly pull in the raw data from S3.

-----Original Message-----
From: Houssem Hosni [mailto:houssem.ho...@lip6.fr]
Sent: Monday, February 05, 2018 9:44 AM
To: dev@drill.apache.org
Subject: PCAP files with Apache Drill and Sergeant R

Hi,

I am sending this mail with a hope to get some help from you. I am working on making some analysis and prediction models on large pcap files. Can Apache Drill with the R sergeant library help me in this context? Actually the pcap files are so large (MAWI) and they are available on the web (http://mawi.wide.ad.jp/mawi/samplepoint-F/2018/). I want to access them via Apache Drill and then make some analysis using the sergeant package (R), which works well with Drill. Should I bring those large MAWI pcap files to Amazon S3 and then access them with Drill, or is it possible to access them directly without Amazon storage? What steps should I start with?

Special THANKS in advance for considering my request.

Best regards,
Houssem Hosni
LIP6 - Sorbonne University
houssem.ho...@lip6.fr
Place Jussieu, 75005 Paris. Tel: (+0033)0644087200
[GitHub] drill pull request #1104: DRILL-6118: Handle item star columns during projec...
Github user chunhui-shi commented on a diff in the pull request: https://github.com/apache/drill/pull/1104#discussion_r166066830

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java ---
@@ -596,10 +596,10 @@ private void classifyExpr(final NamedExpression ex, final RecordBatch incoming,
     final NameSegment ref = ex.getRef().getRootSegment();
     final boolean exprHasPrefix = expr.getPath().contains(StarColumnHelper.PREFIX_DELIMITER);
     final boolean refHasPrefix = ref.getPath().contains(StarColumnHelper.PREFIX_DELIMITER);
-    final boolean exprIsStar = expr.getPath().equals(SchemaPath.WILDCARD);
-    final boolean refContainsStar = ref.getPath().contains(SchemaPath.WILDCARD);
-    final boolean exprContainsStar = expr.getPath().contains(SchemaPath.WILDCARD);
-    final boolean refEndsWithStar = ref.getPath().endsWith(SchemaPath.WILDCARD);
+    final boolean exprIsStar = expr.getPath().equals(SchemaPath.DYNAMIC_STAR);

--- End diff --

Why don't we need to handle the WILDCARD case anymore?

---
[GitHub] drill pull request #1104: DRILL-6118: Handle item star columns during projec...
Github user chunhui-shi commented on a diff in the pull request: https://github.com/apache/drill/pull/1104#discussion_r166094020

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterItemStarReWriterRule.java ---
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableSet;
+import org.apache.calcite.adapter.enumerable.EnumerableTableScan;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.plan.RelOptTable;
+import org.apache.calcite.prepare.RelOptTableImpl;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.CorrelationId;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.logical.LogicalFilter;
+import org.apache.calcite.rel.logical.LogicalProject;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeFactory;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexVisitorImpl;
+import org.apache.calcite.schema.Table;
+import org.apache.drill.exec.planner.types.RelDataTypeDrillImpl;
+import org.apache.drill.exec.planner.types.RelDataTypeHolder;
+import org.apache.drill.exec.util.Utilities;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.drill.exec.planner.logical.FieldsReWriterUtil.DesiredField;
+import static org.apache.drill.exec.planner.logical.FieldsReWriterUtil.FieldsReWriter;
+
+/**
+ * Rule will transform filter -> project -> scan call with item star fields in filter
+ * into project -> filter -> project -> scan where item star fields are pushed into scan
+ * and replaced with actual field references.
+ *
+ * This will help partition pruning and push down rules to detect fields that can be pruned or pushed down.
+ * Item star operator appears when sub-select or cte with star are used as source.
+ */
+public class DrillFilterItemStarReWriterRule extends RelOptRule {
+
+  public static final DrillFilterItemStarReWriterRule INSTANCE = new DrillFilterItemStarReWriterRule(
+      RelOptHelper.some(Filter.class, RelOptHelper.some(Project.class, RelOptHelper.any(TableScan.class))),
+      "DrillFilterItemStarReWriterRule");
+
+  private DrillFilterItemStarReWriterRule(RelOptRuleOperand operand, String id) {
+    super(operand, id);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+    Filter filterRel = call.rel(0);
+    Project projectRel = call.rel(1);
+    TableScan scanRel = call.rel(2);
+
+    ItemStarFieldsVisitor itemStarFieldsVisitor = new ItemStarFieldsVisitor(filterRel.getRowType().getFieldNames());

--- End diff --

Other test cases that should be covered: nested field names; references to two different fields under the same parent, e.g. a.b and a.c; and array types referenced in filters and projects.

---
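For illustration, the rewrite described in the javadoc above can be triggered by a query of the following shape, where the star in the sub-select means the outer filter reaches the scan only through an item star call (the workspace, table, and column names here are hypothetical, not taken from the PR's tests):

```sql
-- Hypothetical example: the filter on dir0 goes through the sub-select's
-- star, so it appears in the logical plan as an item star call (roughly
-- ITEM($0, 'dir0')) until the rule rewrites it into a direct field
-- reference that partition pruning can recognize.
SELECT * FROM (SELECT * FROM dfs.tmp.`logs`) WHERE dir0 = '2018'
```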
[GitHub] drill pull request #1111: Upgrade drill-hive libraries to 2.1.1 version.
GitHub user vdiravka opened a pull request: https://github.com/apache/drill/pull/

Upgrade drill-hive libraries to 2.1.1 version.

Updating hive properties for tests and resolving dependency and API conflicts:
* Allow hive-exec to use Hive's own calcite-core and avatica versions. The Calcite version is removed from the root POM Dependency Management.
* Fix for "hive.metastore.schema.verification", MetaException(message: Version information not found in metastore), see https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool. The METASTORE_SCHEMA_VERIFICATION="false" property is added.
* Fix for JSONException class not found (excluded the banned org.json dependency).
* Added the METASTORE_AUTO_CREATE_ALL="true" property to tests, because some additional tables are necessary in the Hive metastore.
* Disabled Calcite CBO (Hive's CalcitePlanner) for tests, because it conflicts with Drill's Calcite version in Drill unit tests: HIVE_CBO_ENABLED="false" property.
* jackson and parquet libraries are relocated in the hive-exec-shade module.
* The Drill version of org.apache.parquet:parquet-column is added to "hive-exec" to allow using a Parquet empty group on the MessageType level (PARQUET-278).
* Removed the commons-codec exclusion from hive core. This dependency is necessary for hive-exec and hive-metastore.
* Set Hive's internal properties for transactional scan (HiveConf.HIVE_TRANSACTIONAL_TABLE_SCAN) and for schema evolution (HiveConf.HIVE_SCHEMA_EVOLUTION, IOConstants.SCHEMA_EVOLUTION_COLUMNS, IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vdiravka/drill DRILL-5978

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #

commit 476c44cce7a38ff818c5e328e3db61773bab18a3
Author: Vitalii Diravka
Date: 2017-11-13T16:04:03Z

    Upgrade drill-hive libraries to 2.1.1 version.

---
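The relocation of the jackson and parquet libraries mentioned above is done with the maven-shade-plugin in the hive-exec-shade module; a minimal sketch of what such a relocation declaration looks like (the pattern and shadedPattern values are illustrative assumptions, not copied from the PR):

```xml
<!-- Sketch of maven-shade-plugin relocations; the shaded prefixes
     shown are illustrative, not the PR's actual values. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>com.fasterxml.jackson</pattern>
        <shadedPattern>hive.com.fasterxml.jackson</shadedPattern>
      </relocation>
      <relocation>
        <pattern>org.apache.parquet</pattern>
        <shadedPattern>hive.org.apache.parquet</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

Relocating the packages lets hive-exec carry its own copies of these libraries without clashing with the versions Drill itself depends on.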
PCAP files with Apache Drill and Sergeant R
Hi,

I am sending this mail in the hope of getting some help from you. I am working on analysis and prediction models over large pcap files. Can Apache Drill with the R Sergeant library help me in this context? The pcap files are very large (MAWI) and they are available on the web (http://mawi.wide.ad.jp/mawi/samplepoint-F/2018/). I want to access them via Apache Drill and then do some analysis using the Sergeant package (R), which works well with Drill. Should I bring those large MAWI pcap files to Amazon S3 and then access them with Drill, or is it possible to access them directly without Amazon storage? What steps should I start with?

Special thanks in advance for considering my request.

Best regards,
Houssem Hosni
LIP6 - Sorbonne University
houssem.ho...@lip6.fr
Place Jussieu, 75005 Paris.
Tel: (+0033)0644087200
[jira] [Created] (DRILL-6136) drill-jdbc-all jar missing dependencies
Craig Foote created DRILL-6136:
--
Summary: drill-jdbc-all jar missing dependencies
Key: DRILL-6136
URL: https://issues.apache.org/jira/browse/DRILL-6136
Project: Apache Drill
Issue Type: Bug
Components: Client - JDBC
Affects Versions: 1.12.0
Reporter: Craig Foote

Using drill-jdbc-all-1.12.0.jar with logstash (elasticsearch ingester) returns NoClassDefFoundError for oadd.org.apache.drill.exec.store.StoragePluginRegistry.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6135) New Feature: SHOW CREATE VIEW command
Hari Sekhon created DRILL-6135:
--
Summary: New Feature: SHOW CREATE VIEW command
Key: DRILL-6135
URL: https://issues.apache.org/jira/browse/DRILL-6135
Project: Apache Drill
Issue Type: New Feature
Components: Metadata, Storage - Information Schema
Affects Versions: 1.10.0
Environment: MapR 5.2 + Kerberos
Reporter: Hari Sekhon

Feature Request to implement
{code:java}
SHOW CREATE VIEW ;{code}
A colleague and I just had to cat the view file, which is unformatted JSON; a large view creation statement is hard to read that way, when it could have been presented formatted in the Drill shell.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
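Until such a command exists, one workaround is to pretty-print the stored view definition, since Drill persists a view as single-line JSON in a <view_name>.view.drill file in the workspace directory. A minimal sketch (the file path and view contents below are illustrative, not taken from a real cluster):

```shell
# Simulate a stored Drill view definition (single-line JSON, as Drill
# writes it into <view_name>.view.drill) and pretty-print it with the
# Python stdlib JSON formatter. Path and contents are illustrative.
printf '{"name":"my_view","sql":"SELECT c1, c2 FROM dfs.tmp.`t`","workspaceSchemaPath":["dfs","tmp"]}' > /tmp/my_view.view.drill
python3 -m json.tool < /tmp/my_view.view.drill
```

The same pipe works against a real view file on the cluster's filesystem, turning the one-line JSON into an indented, readable form.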
[jira] [Created] (DRILL-6134) Many Drill queries fail when using JDBC Driver from Simba
Robert Hou created DRILL-6134:
--
Summary: Many Drill queries fail when using JDBC Driver from Simba
Key: DRILL-6134
URL: https://issues.apache.org/jira/browse/DRILL-6134
Project: Apache Drill
Issue Type: Bug
Reporter: Robert Hou
Assignee: Pritesh Maker

Here is an example. Query: /root/drillAutomation/framework-master/framework/resources/Functional/limit0/union/data/union_51.q
{noformat}
(SELECT c2 FROM `union_01_v` ORDER BY c5 DESC nulls first)
UNION
(SELECT c2 FROM `union_02_v` ORDER BY c5 ASC nulls first)
{noformat}
This is the error:
{noformat}
Exception: java.sql.SQLException: [JDBC Driver]The field c2(BIGINT:OPTIONAL) [$bits$(UINT1:REQUIRED), $values$(BIGINT:OPTIONAL)] doesn't match the provided metadata major_type { minor_type: BIGINT mode: OPTIONAL } name_part { name: "$values$" } value_count: 18 buffer_length: 144 .
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:145)
    at org.apache.drill.exec.vector.BigIntVector.load(BigIntVector.java:287)
    at org.apache.drill.exec.vector.NullableBigIntVector.load(NullableBigIntVector.java:274)
    at org.apache.drill.exec.record.RecordBatchLoader.load(RecordBatchLoader.java:131)
    at com.mapr.drill.drill.dataengine.DRJDBCResultSet.doLoadRecordBatchData(Unknown Source)
    at com.mapr.drill.drill.dataengine.DRJDBCResultSet.hasMoreRows(Unknown Source)
    at com.mapr.drill.drill.dataengine.DRJDBCResultSet.doMoveToNextRow(Unknown Source)
    at com.mapr.drill.jdbc.common.CommonResultSet.moveToNextRow(Unknown Source)
    at com.mapr.drill.jdbc.common.SForwardResultSet.next(Unknown Source)
    at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:255)
    at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: The field c2(BIGINT:OPTIONAL) [$bits$(UINT1:REQUIRED), $values$(BIGINT:OPTIONAL)] doesn't match the provided metadata major_type { minor_type: BIGINT mode: OPTIONAL } name_part { name: "$values$" } value_count: 18 buffer_length: 144 .
    ... 16 more
{noformat}
The commit that causes these errors to occur is:
{noformat}
https://issues.apache.org/jira/browse/DRILL-6049 Rollup of hygiene changes from "batch size" project
commit ID e791ed62b1c91c39676c4adef438c689fd84fd4b
{noformat}
--
This message was sent by Atlassian JIRA (v7.6.3#76005)