[jira] [Created] (DRILL-4832) Metadata Cache Pruning is not taking place in some rare scenarios
Rahul Challapalli created DRILL-4832:

Summary: Metadata Cache Pruning is not taking place in some rare scenarios
Key: DRILL-4832
URL: https://issues.apache.org/jira/browse/DRILL-4832
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Reporter: Rahul Challapalli

git.commit.id.abbrev=f476eb5

The below set of queries is one test case. I usually run 10 instances of the same test case in parallel, and in almost all cases the cache file is read from the appropriate sub-directory. But once out of roughly 25 such runs, I observed that the cache is read from the root level instead of the sub-directory for the "l_3level" table.

{code}
refresh table metadata l_3level;
refresh table metadata c_1level;
refresh table metadata o_2level;

explain plan for
select l.l_orderkey, sum(l.l_extendedprice * (1 - l.l_discount)) as revenue, o.o_orderdate, o.o_shippriority
from c_1level c, o_2level o, l_3level l
where c.c_mktsegment = 'HOUSEHOLD'
  and c.c_custkey = o.o_custkey
  and l.l_orderkey = o.o_orderkey
  and l.dir0 = 1 and l.dir1 = 'three' and l.dir2 = '2015-7-12'
  and o.dir0 = '1991' and o.dir1 = 'feb'
  and o.o_orderdate < date '1995-03-25'
  and l.l_shipdate > date '1995-03-25'
group by l.l_orderkey, o.o_orderdate, o.o_shippriority;

00-00    Screen
00-01      Project(l_orderkey=[$0], o_orderdate=[$1], o_shippriority=[$2])
00-02        Project(l_orderkey=[$8], o_orderdate=[$6], o_shippriority=[$7])
00-03          HashJoin(condition=[=($1, $2)], joinType=[inner])
00-05            SelectionVectorRemover
00-08              Filter(condition=[=($0, 'HOUSEHOLD')])
00-11                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/metadata_caching_pp/c_1level/1991/customer.parquet], ReadEntryWithPath [path=/drill/testdata/metadata_caching_pp/c_1level/1992/customer.parquet]], selectionRoot=/drill/testdata/metadata_caching_pp/c_1level, numFiles=2, usedMetadataFile=true, cacheFileRoot=/drill/testdata/metadata_caching_pp/c_1level, columns=[`c_mktsegment`, `c_custkey`]]])
00-04          HashJoin(condition=[=($6, $1)], joinType=[inner])
00-07            SelectionVectorRemover
00-10              Filter(condition=[AND(=($2, '1991'), =($3, 'feb'), <($4, 1995-03-25))])
00-13                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/metadata_caching_pp/o_2level/1991/feb/orders.parquet]], selectionRoot=/drill/testdata/metadata_caching_pp/o_2level, numFiles=1, usedMetadataFile=true, cacheFileRoot=/drill/testdata/metadata_caching_pp/o_2level/1991/feb, columns=[`o_custkey`, `o_orderkey`, `dir0`, `dir1`, `o_orderdate`, `o_shippriority`]]])
00-06            Project(l_orderkey=[$0], dir00=[$1], dir10=[$2], dir2=[$3], l_shipdate=[$4])
00-09              SelectionVectorRemover
00-12                Filter(condition=[AND(=($1, 1), =($2, 'three'), =($3, '2015-7-12'), >($4, 1995-03-25))])
00-14                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/metadata_caching_pp/l_3level/1/three/2015-7-12/40.parquet]], selectionRoot=/drill/testdata/metadata_caching_pp/l_3level, numFiles=1, usedMetadataFile=true, cacheFileRoot=/drill/testdata/metadata_caching_pp/l_3level, columns=[`l_orderkey`, `dir0`, `dir1`, `dir2`, `l_shipdate`]]])
{code}

Note the l_3level scan: the directory filters prune down to a single file under 1/three/2015-7-12, yet cacheFileRoot still points at the table root, so the root-level cache file is read instead of the sub-directory one. Compare with the o_2level scan, where cacheFileRoot correctly descends to 1991/feb.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
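For illustration, the expected behavior can be sketched with a tiny hypothetical helper (not Drill's actual planning code): when partition filters fully prune the selection down to one sub-directory, cacheFileRoot should descend with them.

```java
// Hypothetical helper (not Drill's implementation) showing what cacheFileRoot
// should resolve to once directory filters dir0=1, dir1='three', dir2='2015-7-12'
// have pruned the selection to a single sub-directory.
public class CacheRootDemo {
    static String cacheFileRoot(String selectionRoot, String... prunedDirs) {
        StringBuilder sb = new StringBuilder(selectionRoot);
        for (String dir : prunedDirs) {
            sb.append('/').append(dir);  // descend one pruned partition level
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String root = "/drill/testdata/metadata_caching_pp/l_3level";
        // Expected: the sub-directory cache file. The bug reported here is that,
        // intermittently, the root-level cache file is used instead.
        System.out.println(cacheFileRoot(root, "1", "three", "2015-7-12"));
        // prints /drill/testdata/metadata_caching_pp/l_3level/1/three/2015-7-12
    }
}
```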
[GitHub] drill issue #527: DRILL-4728: Add support for new metadata fetch APIs
Github user sudheeshkatkam commented on the issue: https://github.com/apache/drill/pull/527 +1 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #527: DRILL-4728: Add support for new metadata fetch APIs
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/527#discussion_r73773119

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/metadata/MetadataProvider.java ---
@@ -0,0 +1,451 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.work.metadata;
+
+import static org.apache.drill.exec.store.ischema.InfoSchemaConstants.CATS_COL_CATALOG_NAME;
+import static org.apache.drill.exec.store.ischema.InfoSchemaConstants.SCHS_COL_SCHEMA_NAME;
+import static org.apache.drill.exec.store.ischema.InfoSchemaConstants.SHRD_COL_TABLE_NAME;
+import static org.apache.drill.exec.store.ischema.InfoSchemaConstants.SHRD_COL_TABLE_SCHEMA;
+import static org.apache.drill.exec.store.ischema.InfoSchemaTableType.CATALOGS;
+import static org.apache.drill.exec.store.ischema.InfoSchemaTableType.COLUMNS;
+import static org.apache.drill.exec.store.ischema.InfoSchemaTableType.SCHEMATA;
+import static org.apache.drill.exec.store.ischema.InfoSchemaTableType.TABLES;
+
+import java.util.UUID;
+
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.drill.common.exceptions.ErrorHelper;
+import org.apache.drill.exec.ops.ViewExpansionContext;
+import org.apache.drill.exec.proto.UserBitShared.DrillPBError;
+import org.apache.drill.exec.proto.UserBitShared.DrillPBError.ErrorType;
+import org.apache.drill.exec.proto.UserProtos.CatalogMetadata;
+import org.apache.drill.exec.proto.UserProtos.ColumnMetadata;
+import org.apache.drill.exec.proto.UserProtos.GetCatalogsResp;
+import org.apache.drill.exec.proto.UserProtos.GetCatalogsReq;
+import org.apache.drill.exec.proto.UserProtos.GetColumnsReq;
+import org.apache.drill.exec.proto.UserProtos.GetColumnsResp;
+import org.apache.drill.exec.proto.UserProtos.GetSchemasReq;
+import org.apache.drill.exec.proto.UserProtos.GetSchemasResp;
+import org.apache.drill.exec.proto.UserProtos.GetTablesReq;
+import org.apache.drill.exec.proto.UserProtos.GetTablesResp;
+import org.apache.drill.exec.proto.UserProtos.LikeFilter;
+import org.apache.drill.exec.proto.UserProtos.RequestStatus;
+import org.apache.drill.exec.proto.UserProtos.RpcType;
+import org.apache.drill.exec.proto.UserProtos.SchemaMetadata;
+import org.apache.drill.exec.proto.UserProtos.TableMetadata;
+import org.apache.drill.exec.rpc.Response;
+import org.apache.drill.exec.rpc.ResponseSender;
+import org.apache.drill.exec.rpc.user.UserServer.UserClientConnection;
+import org.apache.drill.exec.rpc.user.UserSession;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.server.options.OptionValue;
+import org.apache.drill.exec.store.SchemaConfig.SchemaConfigInfoProvider;
+import org.apache.drill.exec.store.SchemaTreeProvider;
+import org.apache.drill.exec.store.ischema.InfoSchemaConstants;
+import org.apache.drill.exec.store.ischema.InfoSchemaFilter;
+import org.apache.drill.exec.store.ischema.InfoSchemaFilter.ConstantExprNode;
+import org.apache.drill.exec.store.ischema.InfoSchemaFilter.ExprNode;
+import org.apache.drill.exec.store.ischema.InfoSchemaFilter.FieldExprNode;
+import org.apache.drill.exec.store.ischema.InfoSchemaFilter.FunctionExprNode;
+import org.apache.drill.exec.store.ischema.InfoSchemaTableType;
+import org.apache.drill.exec.store.ischema.Records.Catalog;
+import org.apache.drill.exec.store.ischema.Records.Column;
+import org.apache.drill.exec.store.ischema.Records.Schema;
+import org.apache.drill.exec.store.ischema.Records.Table;
+import org.apache.drill.exec.store.pojo.PojoRecordReader;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+
+/**
+ * Contains worker {@link Runnable} classes for providing the metadata and related helper methods.
+ */
+public class MetadataProvider {
+  private static final org.slf4j.Logger logger =
[GitHub] drill issue #527: DRILL-4728: Add support for new metadata fetch APIs
Github user vkorukanti commented on the issue: https://github.com/apache/drill/pull/527

Rebased the patch and addressed review comments.
[GitHub] drill pull request #527: DRILL-4728: Add support for new metadata fetch APIs
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/527#discussion_r73770112

--- Diff: exec/jdbc-all/pom.xml ---
@@ -441,7 +441,7 @@
       This is likely due to you adding new dependencies to java-exec and not updating the excludes
       in this module. This is important as it minimizes the size of the dependency of Drill
       application users.
-      2000
+      2100
--- End diff --

I ran into this issue as well, while adding more functionality to the client.
[GitHub] drill pull request #527: DRILL-4728: Add support for new metadata fetch APIs
Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/527#discussion_r73769951

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/metadata/MetadataProvider.java ---
[GitHub] drill pull request #527: DRILL-4728: Add support for new metadata fetch APIs
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/527#discussion_r73769751

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/metadata/MetadataProvider.java ---
[GitHub] drill pull request #527: DRILL-4728: Add support for new metadata fetch APIs
Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/527#discussion_r73769668

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/metadata/MetadataProvider.java ---
[GitHub] drill pull request #527: DRILL-4728: Add support for new metadata fetch APIs
Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/527#discussion_r73769636

--- Diff: exec/jdbc-all/pom.xml ---

The size increase is due to new protobuf files.
[jira] [Created] (DRILL-4831) Running refresh table metadata concurrently randomly fails with JsonParseException
Rahul Challapalli created DRILL-4831:

Summary: Running refresh table metadata concurrently randomly fails with JsonParseException
Key: DRILL-4831
URL: https://issues.apache.org/jira/browse/DRILL-4831
Project: Apache Drill
Issue Type: Bug
Components: Metadata
Affects Versions: 1.8.0
Reporter: Rahul Challapalli

git.commit.id.abbrev=f476eb5

Run the below command concurrently from 10 different JDBC connections; there is a likelihood of hitting the error below.

Extracts from the log:
{code}
Caused By (java.lang.AssertionError) Internal error: Error while applying rule DrillPushProjIntoScan, args [rel#189411:LogicalProject.NONE.ANY([]).[](input=rel#189289:Subset#3.ENUMERABLE.ANY([]).[],l_orderkey=$1,dir0=$2,dir1=$3,dir2=$4,l_shipdate=$5,l_extendedprice=$6,l_discount=$7), rel#189233:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[dfs, metadata_caching_pp, l_3level])]
    org.apache.calcite.util.Util.newInternal():792
    org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():251
    .
    .
    java.lang.Thread.run():745
  Caused By (org.apache.drill.common.exceptions.DrillRuntimeException) com.fasterxml.jackson.core.JsonParseException: Illegal character ((CTRL-CHAR, code 0)): only regular white space (\r, \n, \t) is allowed between tokens
   at [Source: com.mapr.fs.MapRFsDataInputStream@57a574a8; line: 1, column: 2]
    org.apache.drill.exec.planner.logical.DrillPushProjIntoScan.onMatch():95
{code}

Attached the complete log message and the data set.
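A CTRL-CHAR (code 0) at line 1, column 2 suggests a reader observed a partially written or zero-filled cache file while another connection was refreshing it. As an illustrative sketch only (assumed file names, not Drill's actual refresh code), one common way to keep concurrent readers from ever seeing a half-written JSON file is to write to a temporary file and atomically rename it into place:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative sketch (hypothetical names; not Drill's implementation):
// with write-then-atomic-rename, concurrent readers see either the old
// complete cache file or the new complete one, never a partial file.
public class AtomicCacheWrite {
    static void writeCache(Path cacheFile, String json) throws IOException {
        Path tmp = cacheFile.resolveSibling(cacheFile.getFileName() + ".tmp");
        Files.write(tmp, json.getBytes(StandardCharsets.UTF_8));
        // ATOMIC_MOVE makes the replacement all-or-nothing (a POSIX rename);
        // whether this holds on a given DFS depends on that filesystem.
        Files.move(tmp, cacheFile,
            StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("drill-cache-demo");
        Path cache = dir.resolve(".drill.parquet_metadata");
        writeCache(cache, "{\"files\":[]}");
        writeCache(cache, "{\"files\":[\"40.parquet\"]}");  // atomic overwrite
        System.out.println(new String(Files.readAllBytes(cache), StandardCharsets.UTF_8));
    }
}
```

Whether MapR-FS honors atomic rename semantics here is an open question; the attached log only shows the symptom on the read side.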
[GitHub] drill issue #556: DRILL-4819 - update MapR version to 5.2.0
Github user adityakishore commented on the issue: https://github.com/apache/drill/pull/556

LGTM. +1
[GitHub] drill pull request #527: DRILL-4728: Add support for new metadata fetch APIs
Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/527#discussion_r73767987

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -411,6 +420,98 @@ public void runQuery(QueryType type, List<PlanFragment> planFragments, UserResultsListener resultsListener)
   }
 
   /**
+   * Get the list of catalogs in INFORMATION_SCHEMA.CATALOGS table satisfying the given filters.
+   *
+   * @param catalogNameFilter Filter on catalog name. Pass null to apply no filter.
+   * @return
+   */
+  public DrillRpcFuture<GetCatalogsResp> getCatalogs(LikeFilter catalogNameFilter) {
--- End diff --

These APIs are supposed to be fast; I am not sure implementing cancel provides any value.
[GitHub] drill pull request #527: DRILL-4728: Add support for new metadata fetch APIs
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/527#discussion_r73767794

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---

`DrillRpcFuture` allows for cancellation (through `cancel(boolean mayInterruptIfRunning)`). Do we ignore the cancellation?
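As a side note on the cancellation question above: for an RPC that completes almost immediately, a `cancel()` issued after completion is a no-op. This can be sketched with plain `java.util.concurrent` (not Drill's `DrillRpcFuture`, which layers its own RPC semantics on top):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of why cancellation may add little value for fast metadata calls:
// once the future has completed, cancel() returns false and changes nothing.
public class CancelDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> f = pool.submit(() -> "CATALOGS"); // completes almost instantly
        String result = f.get();                          // wait for completion
        boolean cancelled = f.cancel(true);               // too late: already done
        System.out.println(result + " cancelled=" + cancelled);
        // prints CATALOGS cancelled=false
        pool.shutdown();
    }
}
```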
[GitHub] drill pull request #527: DRILL-4728: Add support for new metadata fetch APIs
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/527#discussion_r73767561

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/pojo/PojoRecordReader.java ---
@@ -47,24 +46,30 @@
 import org.apache.drill.exec.vector.AllocationHelper;
 import org.apache.drill.exec.vector.ValueVector;
 
+import com.google.common.collect.ImmutableList;
 import com.google.common.collect.Lists;
 
-public class PojoRecordReader<T> extends AbstractRecordReader {
+public class PojoRecordReader<T> extends AbstractRecordReader implements Iterable<T> {
   private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(PojoRecordReader.class);
   private static final ControlsInjector injector = ControlsInjectorFactory.getInjector(PojoRecordReader.class);
 
-  public final int forJsonIgnore = 1;
-
   private final Class<T> pojoClass;
-  private final Iterator<T> iterator;
+  private final List<T> pojoObjects;
   private PojoWriter[] writers;
   private boolean doCurrent;
   private T currentPojo;
   private OperatorContext operatorContext;
+  private Iterator<T> currentIterator;
+
+  /**
+   * TODO: Cleanup the callers to pass the List of POJO objects directly rather than iterator.
+   * @param pojoClass
+   * @param iterator
+   */
   public PojoRecordReader(Class<T> pojoClass, Iterator<T> iterator) {
     this.pojoClass = pojoClass;
-    this.iterator = iterator;
+    this.pojoObjects = ImmutableList.copyOf(iterator);
--- End diff --

I looked; most of the calls are `list.iterator()`, which defeats the purpose.
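The review point (callers already hold a `List`, hand over `list.iterator()`, and the constructor copies it straight back into a list) can be illustrated with plain JDK collections; Guava's `ImmutableList.copyOf(iterator)` in the patch performs the same materialization:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of the redundant round-trip: List -> Iterator -> List.
// Passing the List directly to the constructor would avoid the extra copy.
public class IteratorCopyDemo {
    static <T> List<T> copyOf(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        it.forEachRemaining(out::add);  // consumes the iterator into a new list
        return out;
    }

    public static void main(String[] args) {
        List<String> original = List.of("CATALOGS", "SCHEMATA", "TABLES", "COLUMNS");
        List<String> copy = copyOf(original.iterator());  // an avoidable copy
        System.out.println(copy.equals(original));  // prints true
    }
}
```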
[GitHub] drill pull request #527: DRILL-4728: Add support for new metadata fetch APIs
Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/527#discussion_r73767470 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java --- @@ -0,0 +1,105 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store; + +import org.apache.calcite.jdbc.SimpleCalciteSchema; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.drill.common.AutoCloseables; +import org.apache.drill.common.exceptions.DrillRuntimeException; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.server.DrillbitContext; +import org.apache.drill.exec.store.SchemaConfig.SchemaConfigInfoProvider; +import org.apache.drill.exec.util.ImpersonationUtil; + +import com.google.common.collect.Lists; + +import java.io.IOException; +import java.util.List; + +/** + * Class which creates new schema trees. It keeps track of newly created schema trees and closes them safely as + * part of {@link #close()}. 
+ */ +public class SchemaTreeProvider implements AutoCloseable { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(SchemaTreeProvider.class); + + private final DrillbitContext dContext; + private final List schemaTreesToClose; + private final boolean isImpersonationEnabled; + + public SchemaTreeProvider(final DrillbitContext dContext) { +this.dContext = dContext; +schemaTreesToClose = Lists.newArrayList(); +isImpersonationEnabled = dContext.getConfig().getBoolean(ExecConstants.IMPERSONATION_ENABLED); + } + + /** + * Return root schema with schema owner as the given user. + * + * @param userName Name of the user who is accessing the storage sources. + * @param provider {@link SchemaConfigInfoProvider} instance + * @return Root of the schema tree. + */ + public SchemaPlus getRootSchema(final String userName, final SchemaConfigInfoProvider provider) { --- End diff -- Changed both methods to use createRootSchema
[GitHub] drill pull request #527: DRILL-4728: Add support for new metadata fetch APIs
Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/527#discussion_r73767387 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java --- @@ -0,0 +1,105 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store; + +import org.apache.calcite.jdbc.SimpleCalciteSchema; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.drill.common.AutoCloseables; +import org.apache.drill.common.exceptions.DrillRuntimeException; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.server.DrillbitContext; +import org.apache.drill.exec.store.SchemaConfig.SchemaConfigInfoProvider; +import org.apache.drill.exec.util.ImpersonationUtil; + +import com.google.common.collect.Lists; + +import java.io.IOException; +import java.util.List; + +/** + * Class which creates new schema trees. It keeps track of newly created schema trees and closes them safely as + * part of {@link #close()}. 
+ */ +public class SchemaTreeProvider implements AutoCloseable { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(SchemaTreeProvider.class); + + private final DrillbitContext dContext; + private final List schemaTreesToClose; + private final boolean isImpersonationEnabled; + + public SchemaTreeProvider(final DrillbitContext dContext) { +this.dContext = dContext; +schemaTreesToClose = Lists.newArrayList(); +isImpersonationEnabled = dContext.getConfig().getBoolean(ExecConstants.IMPERSONATION_ENABLED); + } + + /** + * Return root schema with schema owner as the given user. + * + * @param userName Name of the user who is accessing the storage sources. + * @param provider {@link SchemaConfigInfoProvider} instance + * @return Root of the schema tree. + */ + public SchemaPlus getRootSchema(final String userName, final SchemaConfigInfoProvider provider) { +final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName(); +final SchemaConfig schemaConfig = SchemaConfig.newBuilder(schemaUser, provider).build(); +return getRootSchema(schemaConfig); + } + + /** + * Create and return a SchemaTree with given schemaConfig. + * @param schemaConfig + * @return + */ + public SchemaPlus getRootSchema(SchemaConfig schemaConfig) { +try { + final SchemaPlus rootSchema = SimpleCalciteSchema.createRootSchema(false); + dContext.getSchemaFactory().registerSchemas(schemaConfig, rootSchema); + schemaTreesToClose.add(rootSchema); + return rootSchema; +} catch(IOException e) { + // We can't proceed further without a schema, throw a runtime exception. + final String errMsg = String.format("Failed to create schema tree: %s", e.getMessage()); + logger.error(errMsg, e); + throw new DrillRuntimeException(errMsg, e); --- End diff -- Updated to use UserException
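The resource-tracking pattern discussed in this review — a provider records every tree it hands out and releases them all in close() — can be sketched generically. This is a minimal illustration, not Drill's actual SchemaTreeProvider API; the class and field names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class TrackingProvider implements AutoCloseable {
    // Every resource handed out is remembered so close() can clean up.
    private final List<AutoCloseable> created = new ArrayList<>();
    int closedCount = 0; // visible for the demo only

    AutoCloseable create() {
        AutoCloseable resource = () -> closedCount++; // stand-in for a schema tree
        created.add(resource);
        return resource;
    }

    @Override
    public void close() throws Exception {
        // Close everything, continuing past individual failures so one bad
        // resource cannot leak the rest; rethrow the first failure at the end.
        Exception first = null;
        for (AutoCloseable c : created) {
            try {
                c.close();
            } catch (Exception e) {
                if (first == null) first = e; else first.addSuppressed(e);
            }
        }
        created.clear();
        if (first != null) throw first;
    }

    public static void main(String[] args) throws Exception {
        TrackingProvider p = new TrackingProvider();
        p.create();
        p.create();
        p.close();
        System.out.println(p.closedCount); // 2
    }
}
```

Because the provider itself is AutoCloseable, a caller can also hold it in a try-with-resources block and get the same cleanup guarantee without calling close() explicitly.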
[GitHub] drill pull request #527: DRILL-4728: Add support for new metadata fetch APIs
Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/527#discussion_r73767170 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/pojo/PojoRecordReader.java --- @@ -47,24 +46,30 @@ import org.apache.drill.exec.vector.AllocationHelper; import org.apache.drill.exec.vector.ValueVector; +import com.google.common.collect.ImmutableList; import com.google.common.collect.Lists; -public class PojoRecordReader extends AbstractRecordReader { +public class PojoRecordReader extends AbstractRecordReader implements Iterable { private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(PojoRecordReader.class); private static final ControlsInjector injector = ControlsInjectorFactory.getInjector(PojoRecordReader.class); - public final int forJsonIgnore = 1; - private final Class pojoClass; - private final Iterator iterator; + private final List pojoObjects; private PojoWriter[] writers; private boolean doCurrent; private T currentPojo; private OperatorContext operatorContext; + private Iterator currentIterator; + + /** + * TODO: Cleanup the callers to pass the List of POJO objects directly rather than iterator. + * @param pojoClass + * @param iterator + */ public PojoRecordReader(Class pojoClass, Iterator iterator) { this.pojoClass = pojoClass; -this.iterator = iterator; +this.pojoObjects = ImmutableList.copyOf(iterator); --- End diff -- We already create a list and pass the iterator. So we are not losing any laziness. Are there any places where we populate the list only when the iterator is accessed?
[GitHub] drill pull request #527: DRILL-4728: Add support for new metadata fetch APIs
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/527#discussion_r73766580 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/pojo/PojoRecordReader.java --- @@ -47,24 +46,30 @@ import org.apache.drill.exec.vector.AllocationHelper; import org.apache.drill.exec.vector.ValueVector; +import com.google.common.collect.ImmutableList; import com.google.common.collect.Lists; -public class PojoRecordReader extends AbstractRecordReader { +public class PojoRecordReader extends AbstractRecordReader implements Iterable { private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(PojoRecordReader.class); private static final ControlsInjector injector = ControlsInjectorFactory.getInjector(PojoRecordReader.class); - public final int forJsonIgnore = 1; - private final Class pojoClass; - private final Iterator iterator; + private final List pojoObjects; private PojoWriter[] writers; private boolean doCurrent; private T currentPojo; private OperatorContext operatorContext; + private Iterator currentIterator; + + /** + * TODO: Cleanup the callers to pass the List of POJO objects directly rather than iterator. + * @param pojoClass + * @param iterator + */ public PojoRecordReader(Class pojoClass, Iterator iterator) { this.pojoClass = pojoClass; -this.iterator = iterator; +this.pojoObjects = ImmutableList.copyOf(iterator); --- End diff -- We lose laziness by doing this. I am not sure if queries will be impacted (queries on information schema and sys tables), since this will be done at setup time?
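The laziness concern above can be seen in a small self-contained sketch. Plain JDK collections stand in for Guava's `ImmutableList.copyOf(Iterator)`, which drains the iterator in the same eager way; the `lazySource` helper is hypothetical, standing in for a lazily-populated source such as a sys table:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class EagerCopyDemo {
    // An iterator that records when each element is actually produced.
    static Iterator<Integer> lazySource(List<String> log) {
        return new Iterator<Integer>() {
            int next = 0;
            public boolean hasNext() { return next < 3; }
            public Integer next() { log.add("produced " + next); return next++; }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Iterator<Integer> it = lazySource(log);

        // Nothing has been produced yet: the source is still lazy.
        System.out.println(log.size()); // 0

        // Copying into a list (what ImmutableList.copyOf(iterator) does)
        // drains the iterator immediately, so all production cost is paid
        // up front -- at reader setup time rather than as rows are read.
        List<Integer> copy = new ArrayList<>();
        it.forEachRemaining(copy::add);
        System.out.println(log.size()); // 3
    }
}
```

Whether that matters in practice depends, as the comment says, on whether any caller actually defers work until the iterator is consumed.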
[GitHub] drill issue #517: DRILL-4704 fix
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/517 As a next step, using Dave's solution would better handle truncation: based on the precision of the input value, adjust the scale to truncate low-order digits.
[GitHub] drill issue #517: DRILL-4704 fix
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/517 Based on the above discussion, I played around with the proposed solution. The following takes into consideration the max precision of both the input and output types. It DOES NOT handle truncation properly (but nor do any of the other solutions). Here is the revised code:

if (precision.value <= 0) {
  <#--
  // since input precision is nonpositive, calculate precision for this integer value
  long precisionTmp = in.value == 0 ? 1 : in.value; // precision for value 0 will be 1
  int precisionCounter = 0;
  while (precisionTmp != 0) {
    ++precisionCounter;
    precisionTmp /= 10;
  }
  out.precision = precisionCounter;
  -->
  // Since precision is not provided, select one appropriate for the
  // largest integer value and the target decimal type.
  // Note: if the integer is too large for the target precision, silent
  // truncation of the highest part of the number will occur. This is
  // obviously bad. What should happen is we shift scale and discard
  // the low-order digits.
  // That is, if we convert 123,456,789,012 to Dec9, the result is
  // 456,789,012 which is very obviously very wrong. It should be
  // 123,456,789,000.
  <#-- A correct solution requires a modification of the code commented out above. -->
  <#-- Precision needed by the largest from type value. -->
  <#if type.from.equals( "Int" )>
    <#assign inPrec = 10>
  <#elseif type.from.equals( "BigInt" )>
    <#assign inPrec = 19>
  <#else>
    <#-- Yes, create invalid syntax: that way the compiler will tell us we missed a type. -->
    <#assign inPrec = "Unexpected from type: ${type.from}">
  </#if>
  <#-- Maximum precision allowed by the to type. -->
  <#if type.to.startsWith("Decimal9")>
    <#assign maxPrec = 9>
  <#elseif type.to.startsWith("Decimal18")>
    <#assign maxPrec = 18>
  <#elseif type.to.startsWith("Decimal28")>
    <#assign maxPrec = 28>
  <#elseif type.to.startsWith("Decimal38")>
    <#assign maxPrec = 38>
  <#else>
    <#-- Yes, create invalid syntax: that way the compiler will tell us we missed a type. -->
    <#assign maxPrec = "Unexpected to type: ${type.to}">
  </#if>
  <#-- Note that this calculation is done here, rather than in static variables, because
       static members are not allowed for this function. -->
  <#if inPrec < maxPrec>
  // Maximum precision needed by the largest ${type.from} value.
  out.precision = ${inPrec};
  <#else>
  // Maximum precision allowed by the ${type.to} type.
  out.precision = ${maxPrec};
  </#if>
} else {
  // since input precision is positive, assume it is correct, and use it
  out.precision = (int) precision.value;
  out.scale = (int) scale.value;
}
[GitHub] drill issue #517: DRILL-4704 fix
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/517 Hi Dave, I agree that Drill Decimal support clearly needs attention. I'm experimenting with adjusting the template code to at least generate a meaningful conversion for all int x decimal types when the values fit. Looks like the current behavior is that if we convert an int that is too large for a scale of 0, we won't adjust the scale. That is, if we always leave scale=0, then we'll do the following obviously wrong conversion: big int: 123,456,789,012 Dec 9: 456,789,012 We want a scale of 3, so that the answer is: Dec 9: 123,456,789,000 And, of course, we should round the least significant digit, which it seems we don't do. So, our fix will work only for the simple case, not for the more complex case. We could leverage your fix to properly set the scale as well as the precision.
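The "keep the high-order digits and round" behavior described above can be sketched outside the templates. This is an illustrative helper, not Drill's generated cast code; it handles non-negative values only, and the caller would still need to shift the decimal scale by the number of dropped digits:

```java
public class DecimalFitSketch {
    // Count decimal digits of a long; value 0 is treated as one digit,
    // matching the precision calculation in the template code above.
    static int digitCount(long value) {
        long abs = Math.abs(value);
        int digits = 1;
        while (abs >= 10) { abs /= 10; digits++; }
        return digits;
    }

    // Keep the HIGH-order digits when the value needs more digits than the
    // target precision allows, rounding the least significant kept digit
    // instead of silently dropping the top of the number.
    static long roundToPrecision(long value, int maxPrecision) {
        int drop = digitCount(value) - maxPrecision;
        if (drop <= 0) return value; // already fits
        long divisor = 1;
        for (int i = 0; i < drop; i++) divisor *= 10;
        return (value + divisor / 2) / divisor;
    }

    public static void main(String[] args) {
        // 123,456,789,012 into a 9-digit decimal keeps 123,456,789
        // (with the scale shifted by 3), not the wrong 456,789,012.
        System.out.println(roundToPrecision(123456789012L, 9)); // 123456789
        System.out.println(digitCount(0)); // 1
    }
}
```

Multiplying the result back by 10^drop recovers the intended 123,456,789,000 from the example in the email.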
[GitHub] drill pull request #527: DRILL-4728: Add support for new metadata fetch APIs
Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/527#discussion_r73764733 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java --- @@ -97,7 +94,7 @@ public QueryContext(final UserSession session, final DrillbitContext drillbitCon plannerSettings.getPlanningMemoryLimit()); bufferManager = new BufferManagerImpl(this.allocator); viewExpansionContext = new ViewExpansionContext(this); -schemaTreesToClose = Lists.newArrayList(); +schemaTreeProvider = new SchemaTreeProvider(drillbitContext); --- End diff -- Yes. It contains the schema trees that are created within the query. It is going to differ for every query depending upon what schemas are added. Also it is a very lightweight object.
[GitHub] drill pull request #527: DRILL-4728: Add support for new metadata fetch APIs
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/527#discussion_r73764478 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java --- @@ -97,7 +94,7 @@ public QueryContext(final UserSession session, final DrillbitContext drillbitCon plannerSettings.getPlanningMemoryLimit()); bufferManager = new BufferManagerImpl(this.allocator); viewExpansionContext = new ViewExpansionContext(this); -schemaTreesToClose = Lists.newArrayList(); +schemaTreeProvider = new SchemaTreeProvider(drillbitContext); --- End diff -- Does this need to be created once per query? Can one instance live in Drillbit or UserSession?
[jira] [Created] (DRILL-4830) We are reading from sub-directory cache when we have a view
Rahul Challapalli created DRILL-4830: Summary: We are reading from sub-directory cache when we have a view Key: DRILL-4830 URL: https://issues.apache.org/jira/browse/DRILL-4830 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 1.8.0 Reporter: Rahul Challapalli git.commit.id.abbrev=f476eb5 The below plan suggests we are not reading from the sub-directory cache file {code} create or replace view l1 as select dir0 num, substr(dir1, 1, 2) let, extract(day from dir2) `day`, l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedorice, l_discount, l_tax from l_3level; +---+--+ | ok | summary| +---+--+ | true | View 'l1' replaced successfully in 'dfs.metadata_caching_pp' schema | +---+--+ 1 row selected (0.355 seconds) explain plan for select num, let, `day`, l_orderkey from l2 where num=2 and let='tw' and `day` = 12; +--+--+ | text | json | +--+--+ | 00-00Screen 00-01 Project(num=[$0], let=[$1], day=[$2], l_orderkey=[$3]) 00-02Project(num=[$0], let=[SUBSTR($1, 1, 2)], day=[EXTRACT(FLAG(DAY), $2)], l_orderkey=[$3]) 00-03 SelectionVectorRemover 00-04Filter(condition=[AND(=($0, 2), =(SUBSTR($1, 1, 2), 'tw'), =(EXTRACT(FLAG(DAY), $2), 12))]) 00-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/metadata_caching_pp/l_3level/2/two/2015-8-12/40.parquet], ReadEntryWithPath [path=/drill/testdata/metadata_caching_pp/l_3level/2/two/2015-9-12/50.parquet], ReadEntryWithPath [path=/drill/testdata/metadata_caching_pp/l_3level/2/two/2015-7-12/30.parquet]], selectionRoot=/drill/testdata/metadata_caching_pp/l_3level, numFiles=3, usedMetadataFile=true, cacheFileRoot=/drill/testdata/metadata_caching_pp/l_3level/2/two, columns=[`dir0`, `dir1`, `dir2`, `l_orderkey`]]]) {code} I attached the data set required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] drill pull request #520: DRILL-3510: Add ANSI_QUOTES option so that Drill's ...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/520#discussion_r73758968 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillDatabaseMetaDataImpl.java --- @@ -219,11 +226,26 @@ public boolean storesMixedCaseQuotedIdentifiers() throws SQLException { return super.storesMixedCaseQuotedIdentifiers(); } - // TODO(DRILL-3510): Update when Drill accepts standard SQL's double quote. @Override public String getIdentifierQuoteString() throws SQLException { throwIfClosed(); -return "`"; +boolean systemOption = false; +boolean sessionOption = false; +String sql = "select type, bool_val from sys.options where name = 'parser.ansi_quotes'"; +ResultSet rs = executeSql(sql); +while (rs.next()) { + if (rs.getString(1).equals("SYSTEM")) { +systemOption = rs.getBoolean(2); + } + if (rs.getString(1).equals("SESSION")) { +sessionOption = rs.getBoolean(2); + } +} +if (systemOption || sessionOption) { --- End diff -- Missed it. It is corrected in a new commit. Thanks.
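The flagged `systemOption || sessionOption` condition OR's the two option levels together. One plausible version of the correction (the actual fix is in the follow-up commit, not shown here) is to let an explicitly set SESSION value override the SYSTEM value, falling back to the system default only when no session value exists:

```java
public class OptionPrecedence {
    // Resolve a boolean option: a session-level setting, when present,
    // wins over the system-level one. Boxed Booleans model "not set" as null.
    static boolean resolve(Boolean sessionOption, Boolean systemOption) {
        if (sessionOption != null) return sessionOption;
        return systemOption != null && systemOption;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Boolean.FALSE, Boolean.TRUE)); // false: session overrides
        System.out.println(resolve(null, Boolean.TRUE));          // true: system default applies
    }
}
```

With the plain booleans of the original snippet, the session row would simply have to be checked after (and in preference to) the system row rather than OR'ed with it.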
[GitHub] drill pull request #520: DRILL-3510: Add ANSI_QUOTES option so that Drill's ...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/520#discussion_r73758663 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillDatabaseMetaDataImpl.java --- @@ -219,11 +226,26 @@ public boolean storesMixedCaseQuotedIdentifiers() throws SQLException { return super.storesMixedCaseQuotedIdentifiers(); } - // TODO(DRILL-3510): Update when Drill accepts standard SQL's double quote. @Override public String getIdentifierQuoteString() throws SQLException { throwIfClosed(); -return "`"; +boolean systemOption = false; +boolean sessionOption = false; +String sql = "select type, bool_val from sys.options where name = 'parser.ansi_quotes'"; --- End diff -- Done.
[GitHub] drill pull request #520: DRILL-3510: Add ANSI_QUOTES option so that Drill's ...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/520#discussion_r73758593 --- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillDatabaseMetaDataImpl.java --- @@ -36,6 +42,7 @@ */ class DrillDatabaseMetaDataImpl extends AvaticaDatabaseMetaData --- End diff -- A feature to set the session option ANSI_QUOTES via the jdbc connection string is added in a new commit. For example: `"jdbc:drill:zk=local;ansi_quotes=true"`
[GitHub] drill pull request #520: DRILL-3510: Add ANSI_QUOTES option so that Drill's ...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/520#discussion_r73757533 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java --- @@ -139,6 +151,24 @@ public SqlNode parse(String sql) { SqlParser parser = SqlParser.create(sql, parserConfig); return parser.parseStmt(); } catch (SqlParseException e) { + + // Attempt to use default back_tick quote character for identifiers when --- End diff -- One of the requirements was that identifiers with backticks keep working even while ANSI_QUOTES mode is on. The `sql.contains("`")` condition isn't mandatory here. It was removed in a new commit.
[GitHub] drill pull request #520: DRILL-3510: Add ANSI_QUOTES option so that Drill's ...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/520#discussion_r73756251 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java --- @@ -61,7 +61,8 @@ public static PhysicalPlan getPlan(QueryContext context, String sql, Pointer
[GitHub] drill pull request #520: DRILL-3510: Add ANSI_QUOTES option so that Drill's ...
Github user vdiravka commented on a diff in the pull request: https://github.com/apache/drill/pull/520#discussion_r73755627 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java --- @@ -274,6 +274,9 @@ String ENABLE_BULK_LOAD_TABLE_LIST_KEY = "exec.enable_bulk_load_table_list"; BooleanValidator ENABLE_BULK_LOAD_TABLE_LIST = new BooleanValidator(ENABLE_BULK_LOAD_TABLE_LIST_KEY, false); + String ANSI_QUOTES_KEY = "parser.ansi_quotes"; --- End diff -- I took that name from the old patch on this jira ticket. Completely agree with a new name. Done.
Re: Time for a 1.8 release
I will try to get these sometime today or eod tomorrow. 1) DRILL-4728: Add support for new metadata fetch APIs, https://github.com/apache/drill/pull/527 (Ready to commit after few minor changes) 2) DRILL-4729: Add support for prepared statement API on server side, https://github.com/apache/drill/pull/530 (need to address a review comment) 3) DRILL-4732: Update JDBC driver to use new prepared statement API in DrillClient, https://github.com/apache/drill/pull/532 (need to address a comment). Thanks Venki On Fri, Aug 5, 2016 at 1:31 PM, Sudheesh Katkamwrote: > Yes, that’s a typo. DRILL-3623 was for LIMIT 0 changes. That number will > forever remain in my subconscious ;) > > Thank you, > Sudheesh > > > On Aug 5, 2016, at 1:27 PM, Zelaine Fong wrote: > > > > I assume #3 has a typo. It should be DRILL-4623. > > > > -- Zelaine > > > > On Fri, Aug 5, 2016 at 1:15 PM, Sudheesh Katkam > wrote: > > > >> Here’s my list that I will try to get in before Monday. > >> > >> 1) DRILL-4822: Pending commit. > >> PR: https://github.com/apache/drill/pull/558 > >> > >> 2) DRILL-4819: Pending review from Aditya. > >> PR: https://github.com/apache/drill/pull/556 > >> > >> 3) DRILL-3623: Pending tests (or test results). > >> PR: https://github.com/apache/drill/pull/486 > >> > >> 4) DRILL-4792: Pending changes from Arina. > >> PR: https://github.com/apache/drill/pull/551 > >> > >> The first three will make it; the last one may not. > >> > >> Thank you, > >> Sudheesh > >> > >> > >> On Wed, Aug 3, 2016 at 11:49 AM, Jinfeng Ni > wrote: > >> > >>> I would like to propose we set Monday 8/8 as the tentative cut-off > >>> date for 1.8 release. > >>> > >>> If people have any working-in-progress patches, and would like to > >>> include in 1.8 release, please submit PR, or ping people to review > >>> your PR if there has been a PR under review. 
> >>> > >>> Thanks, > >>> > >>> > >>> > >>> > >>> On Tue, Aug 2, 2016 at 3:00 PM, Jinfeng Ni > >> wrote: > Hello everyone, > > I would like to start the train for 1.8 release. > > If you have been working on some open issues, and want to include > them in 1.8, please rely this email and let us know the JIRAs number > and corresponding pull requests. > > Based on the response, we may go through the list of opens JIRAs for > 1.8, and come up with a tentatively cut-off date for 1.8 release. > > Thanks, > > > > On Tue, Aug 2, 2016 at 2:14 PM, Aditya wrote: > > +1 for Jinfeng as the RM for 1.8 release. > > > > On Tue, Aug 2, 2016 at 11:59 AM, Jinfeng Ni > >>> wrote: > > > >> I'll volunteer to be the release manager for 1.8, if no one else > >> would > >> like to do so. > >> > >> > >> > >> > >> On Mon, Aug 1, 2016 at 9:40 PM, Parth Chandra > >>> wrote: > >>> Hi Everyone, > >>> > >>> I think its time to roll out 1.8. Would any PMC > >> member/committer > >> like > >>> to volunteer to be the release manager? > >>> > >>> Parth > >> > >>> > >> > >
Re: Time for a 1.8 release
Yes, that’s a typo. DRILL-3623 was for LIMIT 0 changes. That number will forever remain in my subconscious ;) Thank you, Sudheesh > On Aug 5, 2016, at 1:27 PM, Zelaine Fongwrote: > > I assume #3 has a typo. It should be DRILL-4623. > > -- Zelaine > > On Fri, Aug 5, 2016 at 1:15 PM, Sudheesh Katkam wrote: > >> Here’s my list that I will try to get in before Monday. >> >> 1) DRILL-4822: Pending commit. >> PR: https://github.com/apache/drill/pull/558 >> >> 2) DRILL-4819: Pending review from Aditya. >> PR: https://github.com/apache/drill/pull/556 >> >> 3) DRILL-3623: Pending tests (or test results). >> PR: https://github.com/apache/drill/pull/486 >> >> 4) DRILL-4792: Pending changes from Arina. >> PR: https://github.com/apache/drill/pull/551 >> >> The first three will make it; the last one may not. >> >> Thank you, >> Sudheesh >> >> >> On Wed, Aug 3, 2016 at 11:49 AM, Jinfeng Ni wrote: >> >>> I would like to propose we set Monday 8/8 as the tentative cut-off >>> date for 1.8 release. >>> >>> If people have any working-in-progress patches, and would like to >>> include in 1.8 release, please submit PR, or ping people to review >>> your PR if there has been a PR under review. >>> >>> Thanks, >>> >>> >>> >>> >>> On Tue, Aug 2, 2016 at 3:00 PM, Jinfeng Ni >> wrote: Hello everyone, I would like to start the train for 1.8 release. If you have been working on some open issues, and want to include them in 1.8, please rely this email and let us know the JIRAs number and corresponding pull requests. Based on the response, we may go through the list of opens JIRAs for 1.8, and come up with a tentatively cut-off date for 1.8 release. Thanks, On Tue, Aug 2, 2016 at 2:14 PM, Aditya wrote: > +1 for Jinfeng as the RM for 1.8 release. > > On Tue, Aug 2, 2016 at 11:59 AM, Jinfeng Ni >>> wrote: > >> I'll volunteer to be the release manager for 1.8, if no one else >> would >> like to do so. 
>> >> >> >> >> On Mon, Aug 1, 2016 at 9:40 PM, Parth Chandra >>> wrote: >>> Hi Everyone, >>> >>> I think its time to roll out 1.8. Would any PMC >> member/committer >> like >>> to volunteer to be the release manager? >>> >>> Parth >> >>> >>
[jira] [Resolved] (DRILL-4825) Wrong data with UNION ALL when querying different sub-directories under the same table
[ https://issues.apache.org/jira/browse/DRILL-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni resolved DRILL-4825. --- Resolution: Fixed > Wrong data with UNION ALL when querying different sub-directories under the > same table > -- > > Key: DRILL-4825 > URL: https://issues.apache.org/jira/browse/DRILL-4825 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.6.0, 1.7.0, 1.8.0 >Reporter: Rahul Challapalli >Assignee: Jinfeng Ni >Priority: Critical > Fix For: 1.8.0 > > Attachments: l_3level.tgz > > > git.commit.id.abbrev=0700c6b > The below query returns wrongs results > {code} > select count (*) from ( > select l_orderkey, dir0 from l_3level t1 where t1.dir0 = 1 and > t1.dir1='one' and t1.dir2 = '2015-7-12' > union all > select l_orderkey, dir0 from l_3level t2 where t2.dir0 = 1 and > t2.dir1='two' and t2.dir2 = '2015-8-12') data; > +-+ > | EXPR$0 | > +-+ > | 20 | > +-+ > {code} > The wrong result is evident from the output of the below queries > {code} > 0: jdbc:drill:zk=10.10.100.190:5181> select count (*) from (select > l_orderkey, dir0 from l_3level t2 where t2.dir0 = 1 and t2.dir1='two' and > t2.dir2 = '2015-8-12'); > +-+ > | EXPR$0 | > +-+ > | 30 | > +-+ > 1 row selected (0.258 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> select count (*) from (select > l_orderkey, dir0 from l_3level t2 where t2.dir0 = 1 and t2.dir1='one' and > t2.dir2 = '2015-7-12'); > +-+ > | EXPR$0 | > +-+ > | 10 | > +-+ > {code} > I attached the data set. Let me know if you need anything more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/559
Re: Time for a 1.8 release
I assume #3 has a typo. It should be DRILL-4623. -- Zelaine

On Fri, Aug 5, 2016 at 1:15 PM, Sudheesh Katkam wrote:
> Here’s my list that I will try to get in before Monday.
>
> 1) DRILL-4822: Pending commit. PR: https://github.com/apache/drill/pull/558
> 2) DRILL-4819: Pending review from Aditya. PR: https://github.com/apache/drill/pull/556
> 3) DRILL-3623: Pending tests (or test results). PR: https://github.com/apache/drill/pull/486
> 4) DRILL-4792: Pending changes from Arina. PR: https://github.com/apache/drill/pull/551
>
> The first three will make it; the last one may not.
>
> Thank you,
> Sudheesh
Re: Time for a 1.8 release
Here’s my list that I will try to get in before Monday.

1) DRILL-4822: Pending commit. PR: https://github.com/apache/drill/pull/558
2) DRILL-4819: Pending review from Aditya. PR: https://github.com/apache/drill/pull/556
3) DRILL-3623: Pending tests (or test results). PR: https://github.com/apache/drill/pull/486
4) DRILL-4792: Pending changes from Arina. PR: https://github.com/apache/drill/pull/551

The first three will make it; the last one may not.

Thank you,
Sudheesh

On Wed, Aug 3, 2016 at 11:49 AM, Jinfeng Ni wrote:
> I would like to propose we set Monday 8/8 as the tentative cut-off date for the 1.8 release.
> If people have any work-in-progress patches and would like to include them in the 1.8 release, please submit a PR, or ping people to review your PR if one is already under review.
> Thanks,
Re: Time for a 1.8 release
Just a reminder: if you want to include a work-in-progress or under-review patch in the 1.8 release, please reply to this email and get it reviewed asap. The tentative cut-off day is Monday.

On Wed, Aug 3, 2016 at 11:49 AM, Jinfeng Ni wrote:
> I would like to propose we set Monday 8/8 as the tentative cut-off date for the 1.8 release.
> If people have any work-in-progress patches and would like to include them in the 1.8 release, please submit a PR, or ping people to review your PR if one is already under review.
> Thanks,
[GitHub] drill issue #559: DRILL-4825: Fix incorrect result issue caused by partition...
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/559 LGTM. +1 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #559: DRILL-4825: Fix incorrect result issue caused by partition...
Github user jinfengni commented on the issue: https://github.com/apache/drill/pull/559 @amansinha100 , thanks for your comments. Could you please take another look?
[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/559#discussion_r73725525 --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestPartitionFilter.java --- @@ -386,4 +386,49 @@ public void testPartitionFilterWithInSubquery() throws Exception { test("alter session set `planner.in_subquery_threshold` = 10"); testExcludeFilter(query, 4, "Filter", 40); } + + + @Test // DRILL-4825: querying same table with different filter in UNION ALL. + public void testPruneSameTableInUnionAll() throws Exception { +final String query = String.format("select count(*) as cnt from " ++ "( select dir0 from dfs_test.`%s/multilevel/parquet` where dir0 in ('1994') union all " ++ " select dir0 from dfs_test.`%s/multilevel/parquet` where dir0 in ('1995', '1996') )", +TEST_RES_PATH, TEST_RES_PATH); + +String [] exclued = {"Filter"}; --- End diff -- modified.
[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/559#discussion_r73725537 --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestPartitionFilter.java --- @@ -386,4 +386,49 @@ public void testPartitionFilterWithInSubquery() throws Exception { test("alter session set `planner.in_subquery_threshold` = 10"); testExcludeFilter(query, 4, "Filter", 40); } + + + @Test // DRILL-4825: querying same table with different filter in UNION ALL. + public void testPruneSameTableInUnionAll() throws Exception { +final String query = String.format("select count(*) as cnt from " ++ "( select dir0 from dfs_test.`%s/multilevel/parquet` where dir0 in ('1994') union all " ++ " select dir0 from dfs_test.`%s/multilevel/parquet` where dir0 in ('1995', '1996') )", +TEST_RES_PATH, TEST_RES_PATH); + +String [] exclued = {"Filter"}; + +// verify plan that filter is applied in partition pruning. +testPlanMatchingPatterns(query, null, exclued); + +// verify we get correct count(*). +testBuilder() +.sqlQuery(query) +.unOrdered() +.baselineColumns("cnt") +.baselineValues((long)120) +.build() +.run(); + } + + @Test // DRILL-4825: querying same table with different filter in Join. + public void testPruneSameTableInJoin() throws Exception { +final String query = String.format("select * from " ++ "( select sum(o_custkey) as x from dfs_test.`%s/multilevel/parquet` where dir0 in ('1994') ) join " ++ " ( select sum(o_custkey) as y from dfs_test.`%s/multilevel/parquet` where dir0 in ('1995', '1996')) " ++ " on x = y ", +TEST_RES_PATH, TEST_RES_PATH); + +String [] exclued = {"Filter"}; --- End diff -- modified.
[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/559#discussion_r73725484 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java --- @@ -402,4 +402,18 @@ public String getCacheFileRoot() { return cacheFileRoot; } + @Override + public String toString() { +final StringBuilder sb = new StringBuilder(); +sb.append("root=" + this.selectionRoot); + +sb.append("files=["); +for (final String file : this.files) { + sb.append(file); --- End diff -- It's used only internally for now, since Drill's Explain output only generates the physical plan. Your suggestion makes sense. I added a comma after each file name.
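The comma fix under discussion amounts to joining the file names with a separator instead of appending them back to back. A minimal standalone sketch (hypothetical class and method names, not Drill's actual FileSelection) using String.join:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: joining file names with ", " avoids the
// run-together output that a plain sb.append(file) loop produces.
public class SelectionToString {
    public static String describe(String root, List<String> files) {
        return "root=" + root + ", files=[" + String.join(", ", files) + "]";
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("/data/1994/a.parquet", "/data/1995/b.parquet");
        // prints: root=/data, files=[/data/1994/a.parquet, /data/1995/b.parquet]
        System.out.println(describe("/data", files));
    }
}
```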
[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/559#discussion_r73725279 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DirPrunedEnumerableTableScan.java --- @@ -0,0 +1,73 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.drill.exec.planner.logical; + +import com.google.common.base.Supplier; +import com.google.common.collect.ImmutableList; +import org.apache.calcite.adapter.enumerable.EnumerableConvention; +import org.apache.calcite.adapter.enumerable.EnumerableTableScan; +import org.apache.calcite.plan.RelOptCluster; +import org.apache.calcite.plan.RelOptTable; +import org.apache.calcite.plan.RelTraitSet; +import org.apache.calcite.rel.RelCollation; +import org.apache.calcite.rel.RelCollationTraitDef; +import org.apache.calcite.rel.RelWriter; +import org.apache.calcite.schema.Table; + +import java.util.List; + +/** + * This class extends EnumerableTableScan. It puts the file selection string into its digest. + * When directory-based partition pruning is applied, the file selection could be different for the same + * table. + */ +public class DirPrunedEnumerableTableScan extends EnumerableTableScan { + private final String digestFromSelection; + + public DirPrunedEnumerableTableScan(RelOptCluster cluster, RelTraitSet traitSet, --- End diff -- Good suggestion. I made the change.
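The idea behind putting the file-selection string into the scan's digest can be shown with a toy example (hypothetical classes, not Calcite or Drill code): a planner structure keyed on digests will conflate two scans of the same table unless the pruned selection is part of the digest.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration: a digest-keyed map merges two scans of the same table
// with different pruned file selections unless the selection is in the digest.
public class DigestDemo {
    static String digest(String table, String selection, boolean includeSelection) {
        return includeSelection ? table + ":" + selection : table;
    }

    public static void main(String[] args) {
        Map<String, String> planCache = new HashMap<>();
        // Two scans of the same table, pruned to different directories.
        planCache.put(digest("orders", "files=[1994]", false), "scan-1994");
        planCache.put(digest("orders", "files=[1995,1996]", false), "scan-1995-96");
        System.out.println(planCache.size()); // 1: identical digests, one scan was lost

        planCache.clear();
        planCache.put(digest("orders", "files=[1994]", true), "scan-1994");
        planCache.put(digest("orders", "files=[1995,1996]", true), "scan-1995-96");
        System.out.println(planCache.size()); // 2: distinct digests keep both scans
    }
}
```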
Re: [Drill-Questions] Speed difference between GZ and BZ2
Yes, I went through the benchmarks and started testing this one. I have tested it using Hadoop Map-Reduce, and BZ2 worked faster than GZ. As I understand it, GZ is non-splittable and BZ2 is splittable. Hadoop MR takes advantage of this splittable property and launches multiple mappers and reducers (multiple CPUs), whereas in the case of GZ only a single mapper runs (a single CPU). Can't Drill use this splittable property?

On Fri, Aug 5, 2016 at 8:50 PM, Khurram Faraaz wrote:
> Shankar,
> This is expected behavior; bzip2 decompression is four to twelve times slower than decompressing gzip compressed files. You can look at the comparison benchmark here for numbers - http://tukaani.org/lzma/benchmarks.html
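The non-splittable property described above can be demonstrated with the JDK alone (an illustrative sketch, not Drill or Hadoop code): a gzip stream can only be decoded starting from its first byte, so a reader assigned a split in the middle of the file has nothing valid to decode. bzip2, by contrast, contains block markers that give readers mid-file entry points, which is what lets Hadoop assign one split per block.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSplitDemo {
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("some log lines...".getBytes(StandardCharsets.UTF_8));

        // Decoding from offset 0 works: the stream begins with a gzip header.
        new GZIPInputStream(new ByteArrayInputStream(compressed)).read();

        // Decoding from the middle fails: there is no gzip header there,
        // which is why a mid-file split cannot be processed independently.
        byte[] tail = Arrays.copyOfRange(compressed, compressed.length / 2, compressed.length);
        try {
            new GZIPInputStream(new ByteArrayInputStream(tail)).read();
            System.out.println("unexpectedly decoded mid-stream");
        } catch (IOException e) {
            System.out.println("cannot start mid-stream: " + e.getMessage());
        }
    }
}
```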
Re: [Drill-Questions] Speed difference between GZ and BZ2
Shankar, This is expected behavior, bzip2 decompression is four to twelve times slower than decompressing gzip compressed files. You can look at the comparison benchmark here for numbers - http://tukaani.org/lzma/benchmarks.html On Thu, Aug 4, 2016 at 5:13 PM, Shankar Manewrote: > Please find the query plan for both queries. FYI: I am not seeing > any planning difference between these 2 queries except Cost. > > > / Query on GZ > / > > 0: jdbc:drill:> explain plan for select channelid, count(serverTime) from > dfs.`/tmp/stest-gz/kafka_3_25-Jul-2016-12a.json.gz` group by channelid ; > +--+--+ > | text | json | > +--+--+ > | 00-00Screen > 00-01 Project(channelid=[$0], EXPR$1=[$1]) > 00-02UnionExchange > 01-01 HashAgg(group=[{0}], EXPR$1=[$SUM0($1)]) > 01-02Project(channelid=[$0], EXPR$1=[$1]) > 01-03 HashToRandomExchange(dist0=[[$0]]) > 02-01UnorderedMuxExchange > 03-01 Project(channelid=[$0], EXPR$1=[$1], > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) > 03-02HashAgg(group=[{0}], EXPR$1=[COUNT($1)]) > 03-03 Scan(groupscan=[EasyGroupScan > [selectionRoot=hdfs://namenode:9000/tmp/stest-gz/ > kafka_3_25-Jul-2016-12a.json.gz, > numFiles=1, columns=[`channelid`, `serverTime`], > files=[hdfs://namenode:9000/tmp/stest-gz/kafka_3_25-Jul- > 2016-12a.json.gz]]]) > | { > "head" : { > "version" : 1, > "generator" : { > "type" : "ExplainHandler", > "info" : "" > }, > "type" : "APACHE_DRILL_PHYSICAL", > "options" : [ ], > "queue" : 0, > "resultMode" : "EXEC" > }, > "graph" : [ { > "pop" : "fs-scan", > "@id" : 196611, > "userName" : "hadoop", > "files" : [ > "hdfs://namenode:9000/tmp/stest-gz/kafka_3_25-Jul-2016-12a.json.gz" ], > "storage" : { > "type" : "file", > "enabled" : true, > "connection" : "hdfs://namenode:9000", > "config" : null, > "workspaces" : { > "root" : { > "location" : "/tmp/", > "writable" : true, > "defaultInputFormat" : null > }, > "tmp" : { > "location" : "/tmp", > "writable" : true, > "defaultInputFormat" : null > } > }, > "formats" : { > "psv" : { > "type" 
: "text", > "extensions" : [ "tbl" ], > "delimiter" : "|" > }, > "csv" : { > "type" : "text", > "extensions" : [ "csv" ], > "delimiter" : "," > }, > "tsv" : { > "type" : "text", > "extensions" : [ "tsv" ], > "delimiter" : "\t" > }, > "parquet" : { > "type" : "parquet" > }, > "json" : { > "type" : "json", > "extensions" : [ "json" ] > }, > "avro" : { > "type" : "avro" > } > } > }, > "format" : { > "type" : "json", > "extensions" : [ "json" ] > }, > "columns" : [ "`channelid`", "`serverTime`" ], > "selectionRoot" : > "hdfs://namenode:9000/tmp/stest-gz/kafka_3_25-Jul-2016-12a.json.gz", > "cost" : 1800981.0 > }, { > "pop" : "hash-aggregate", > "@id" : 196610, > "child" : 196611, > "cardinality" : 1.0, > "initialAllocation" : 100, > "maxAllocation" : 100, > "groupByExprs" : [ { > "ref" : "`channelid`", > "expr" : "`channelid`" > } ], > "aggrExprs" : [ { > "ref" : "`EXPR$1`", > "expr" : "count(`serverTime`) " > } ], > "cost" : 900490.5 > }, { > "pop" : "project", > "@id" : 196609, > "exprs" : [ { > "ref" : "`channelid`", > "expr" : "`channelid`" > }, { > "ref" : "`EXPR$1`", > "expr" : "`EXPR$1`" > }, { > "ref" : "`E_X_P_R_H_A_S_H_F_I_E_L_D`", > "expr" : "hash32asdouble(`channelid`) " > } ], > "child" : 196610, > "initialAllocation" : 100, > "maxAllocation" : 100, > "cost" : 180098.1 > }, { > "pop" : "unordered-mux-exchange", > "@id" : 131073, > "child" : 196609, > "initialAllocation" : 100, > "maxAllocation" : 100, > "cost" : 180098.1 > }, { > "pop" : "hash-to-random-exchange", > "@id" : 65539, > "child" : 131073, > "expr" : "`E_X_P_R_H_A_S_H_F_I_E_L_D`", > "initialAllocation" : 100, > "maxAllocation" : 100, > "cost" : 180098.1 > }, { > "pop" : "project", > "@id" : 65538, > "exprs" : [ { > "ref" : "`channelid`", > "expr" : "`channelid`" > }, { > "ref" : "`EXPR$1`", > "expr" : "`EXPR$1`" > } ], > "child" : 65539, > "initialAllocation" : 100, > "maxAllocation" : 100, > "cost" : 180098.1 > }, { > "pop" : "hash-aggregate", >
[jira] [Created] (DRILL-4829) Configure the address to bind to
Daniel Stockton created DRILL-4829: -- Summary: Configure the address to bind to Key: DRILL-4829 URL: https://issues.apache.org/jira/browse/DRILL-4829 Project: Apache Drill Issue Type: Improvement Reporter: Daniel Stockton Priority: Minor 1.7 included the following patch to prevent Drillbits binding to the loopback address: https://issues.apache.org/jira/browse/DRILL-4523 "Drillbit is disallowed to bind to loopback address in distributed mode." It would be better if this were configurable rather than relying on /etc/hosts, since it's common for the hostname to resolve to loopback. Would you accept a patch that adds this option to drill-override.conf? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
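A configurable bind address could sit alongside the existing keys in drill-override.conf. The sketch below uses the real drill.exec.cluster-id and drill.exec.zk.connect keys for context; the bind-address key itself is hypothetical, it is exactly the option this issue proposes and does not exist in Drill today:

```hocon
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "zk1:2181,zk2:2181,zk3:2181",
  # Hypothetical option (proposed by DRILL-4829): the interface address the
  # Drillbit binds to, instead of deriving it from the hostname lookup.
  bind-address: "10.0.0.12"
}
```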
[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/559#discussion_r73649603 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java --- @@ -402,4 +402,18 @@ public String getCacheFileRoot() { return cacheFileRoot; } + @Override + public String toString() { +final StringBuilder sb = new StringBuilder(); +sb.append("root=" + this.selectionRoot); + +sb.append("files=["); +for (final String file : this.files) { + sb.append(file); --- End diff -- Is this method only used internally or would it show up in the Explain output ? If latter, you may add a comma or space after each file name.
[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/559#discussion_r73649304 --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestPartitionFilter.java --- @@ -386,4 +386,49 @@ public void testPartitionFilterWithInSubquery() throws Exception { test("alter session set `planner.in_subquery_threshold` = 10"); testExcludeFilter(query, 4, "Filter", 40); } + + + @Test // DRILL-4825: querying same table with different filter in UNION ALL. + public void testPruneSameTableInUnionAll() throws Exception { +final String query = String.format("select count(*) as cnt from " ++ "( select dir0 from dfs_test.`%s/multilevel/parquet` where dir0 in ('1994') union all " ++ " select dir0 from dfs_test.`%s/multilevel/parquet` where dir0 in ('1995', '1996') )", +TEST_RES_PATH, TEST_RES_PATH); + +String [] exclued = {"Filter"}; + +// verify plan that filter is applied in partition pruning. +testPlanMatchingPatterns(query, null, exclued); + +// verify we get correct count(*). +testBuilder() +.sqlQuery(query) +.unOrdered() +.baselineColumns("cnt") +.baselineValues((long)120) +.build() +.run(); + } + + @Test // DRILL-4825: querying same table with different filter in Join. + public void testPruneSameTableInJoin() throws Exception { +final String query = String.format("select * from " ++ "( select sum(o_custkey) as x from dfs_test.`%s/multilevel/parquet` where dir0 in ('1994') ) join " ++ " ( select sum(o_custkey) as y from dfs_test.`%s/multilevel/parquet` where dir0 in ('1995', '1996')) " ++ " on x = y ", +TEST_RES_PATH, TEST_RES_PATH); + +String [] exclued = {"Filter"}; --- End diff -- same as above
[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/559#discussion_r73649293 --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestPartitionFilter.java --- @@ -386,4 +386,49 @@ public void testPartitionFilterWithInSubquery() throws Exception { test("alter session set `planner.in_subquery_threshold` = 10"); testExcludeFilter(query, 4, "Filter", 40); } + + + @Test // DRILL-4825: querying same table with different filter in UNION ALL. + public void testPruneSameTableInUnionAll() throws Exception { +final String query = String.format("select count(*) as cnt from " ++ "( select dir0 from dfs_test.`%s/multilevel/parquet` where dir0 in ('1994') union all " ++ " select dir0 from dfs_test.`%s/multilevel/parquet` where dir0 in ('1995', '1996') )", +TEST_RES_PATH, TEST_RES_PATH); + +String [] exclued = {"Filter"}; --- End diff -- 'excluded' (missing 'd')
[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/559#discussion_r73649124 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DirPrunedEnumerableTableScan.java --- @@ -0,0 +1,73 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.drill.exec.planner.logical; + +import com.google.common.base.Supplier; +import com.google.common.collect.ImmutableList; +import org.apache.calcite.adapter.enumerable.EnumerableConvention; +import org.apache.calcite.adapter.enumerable.EnumerableTableScan; +import org.apache.calcite.plan.RelOptCluster; +import org.apache.calcite.plan.RelOptTable; +import org.apache.calcite.plan.RelTraitSet; +import org.apache.calcite.rel.RelCollation; +import org.apache.calcite.rel.RelCollationTraitDef; +import org.apache.calcite.rel.RelWriter; +import org.apache.calcite.schema.Table; + +import java.util.List; + +/** + * This class extends from EnumerableTableScan. It puts the file selection string into it's digest. + * When directory-based partition pruning applied, file selection could be different for the same + * table. 
+ */ +public class DirPrunedEnumerableTableScan extends EnumerableTableScan { + private final String digestFromSelection; + + public DirPrunedEnumerableTableScan(RelOptCluster cluster, RelTraitSet traitSet, --- End diff -- Besides the constructor and the create() methods, there's also the copy() method that I think should be overridden. Several parts of the code make a copy of EnumerableTableScan and they will not see the digestFromSelection.
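The review point about copy() can be shown with a toy sketch (hypothetical classes, not Calcite code): when a subclass adds a field but does not override copy(), copies made through the base type silently drop the field, and with it the extra digest information.

```java
// Toy illustration of why a subclass that adds state must override copy().
public class CopyDemo {
    static class Scan {
        Scan copy() { return new Scan(); }
        String digest() { return "scan"; }
    }

    // No copy() override: copies fall back to Scan.copy() and lose the selection.
    static class BadPrunedScan extends Scan {
        final String selection;
        BadPrunedScan(String selection) { this.selection = selection; }
        @Override
        String digest() { return "scan:" + selection; }
    }

    // copy() override preserves the selection, so the digest survives copying.
    static class PrunedScan extends Scan {
        final String selection;
        PrunedScan(String selection) { this.selection = selection; }
        @Override
        Scan copy() { return new PrunedScan(selection); }
        @Override
        String digest() { return "scan:" + selection; }
    }

    public static void main(String[] args) {
        System.out.println(new BadPrunedScan("files=[1994]").copy().digest()); // scan  (selection lost)
        System.out.println(new PrunedScan("files=[1994]").copy().digest());    // scan:files=[1994]
    }
}
```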