[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872157#comment-16872157
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
> to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
> getNonInterestingColumnsMetadata
> 13. Introduce segment-level metadata class:
> {noformat}
> class SegmentMetadata {
>   TableInfo tableInfo;
>   MetadataInfo metadataInfo;
>   SchemaPath column;
>   TupleMetadata schema;
>   String location;
>   Map columnsStatistics;
>   Map statistics;
>   List partitionValues;
>   List locations;
>   long lastModifiedTime;
> }
> {noformat}
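
Item 1 in the list above (merging metadataStatistics and statisticsKinds into one holder map) could look roughly like the following. This is only a sketch under assumptions: `HolderSketch`, `MergeHoldersSketch`, the key and value types of the two input maps, and the "missing kind means estimated" default are hypothetical and are not taken from the pull request.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MergeHoldersSketch {

  // Joins two parallel maps keyed by statistic name into a single map of holders.
  // A name with no entry in exactness is treated as estimated (an assumption of this sketch).
  static Map<String, HolderSketch> merge(Map<String, Object> values, Map<String, Boolean> exactness) {
    Map<String, HolderSketch> merged = new LinkedHashMap<>();
    values.forEach((name, value) ->
        merged.put(name, new HolderSketch(name, value, exactness.getOrDefault(name, false))));
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Object> values = new LinkedHashMap<>();
    values.put("rowCount", 100L);
    values.put("nullsCount", 3L);
    Map<String, Boolean> exactness = new LinkedHashMap<>();
    exactness.put("rowCount", true);

    Map<String, HolderSketch> merged = merge(values, exactness);
    System.out.println(merged.get("rowCount").exact);   // prints "true"
    System.out.println(merged.get("nullsCount").exact); // prints "false"
  }
}

// Hypothetical stand-in for a holder that keeps a statistic's value next to its kind.
final class HolderSketch {
  final String kindName; // statistic name, for example "rowCount"
  final Object value;    // the statistic value
  final boolean exact;   // whether the value is exact or estimated

  HolderSketch(String kindName, Object value, boolean exact) {
    this.kindName = kindName;
    this.value = value;
    this.exact = exact;
  }
}
```

With one holder per statistic, a caller no longer needs to look up the value and its kind in two separate maps.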
> h1. Segment metadata
> In the fix for this Jira, one of the changes is introducing segment level 
> metadata.
> For now, metadata hierarchy is the following:
> - Table
> - Segment

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871478#comment-16871478
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on issue #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#issuecomment-505076823
 
 
   @vvysotskyi thanks for making the changes. +1
 




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871468#comment-16871468
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296794477
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataType.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Enum with possible types of metadata.
+ */
+public enum MetadataType {
+
+  ALL,
 
 Review comment:
   Thanks, done.
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871469#comment-16871469
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296776704
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java
 ##
 @@ -0,0 +1,119 @@
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.hadoop.fs.Path;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * Represents a metadata for the table part, which corresponds to the specific partition key.
+ */
+public class PartitionMetadata extends BaseMetadata {
+  private final SchemaPath column;
+  private final List partitionValues;
+  private final Set locations;
+  private final long lastModifiedTime;
+
+  private PartitionMetadata(PartitionMetadataBuilder builder) {
+    super(builder);
+    this.column = builder.column;
+    this.partitionValues = builder.partitionValues;
+    this.locations = builder.locations;
+    this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  /**
+   * It allows to obtain the column path for this partition
+   *
+   * @return column path
+   */
+  public SchemaPath getColumn() {
+    return column;
+  }
+
+  /**
+   * File locations for this partition
+   *
+   * @return file locations
+   */
+  public Set getLocations() {
+    return locations;
+  }
+
+  /**
+   * It allows to check the time, when any files were modified. It is in Unix Timestamp
+   *
+   * @return last modified time of files
+   */
+  public long getLastModifiedTime() {
+    return lastModifiedTime;
+  }
+
+  public List getPartitionValues() {
+    return partitionValues;
+  }
+
+  public static PartitionMetadataBuilder builder() {
+    return new PartitionMetadataBuilder();
+  }
+
+  public static class PartitionMetadataBuilder extends BaseMetadataBuilder {
+    private SchemaPath column;
+    private List partitionValues;
+    private Set locations;
+    private long lastModifiedTime = BaseTableMetadata.NON_DEFINED_LAST_MODIFIED_TIME;
+
+    public PartitionMetadataBuilder withLocations(Set locations) {
 
 Review comment:
   Agree, renamed.
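
The builder in the hunk above extends a shared base builder. One common way to let inherited setters keep returning the concrete builder type is a self-typed base class; the sketch below illustrates that idiom only. All names in it (`BaseBuilder`, `PartitionSketch`, `PartitionBuilder`, `withName`) are hypothetical, and the type parameters are an assumption because any generic arguments are not visible in this archive.

```java
final class BuilderSketch {
  public static void main(String[] args) {
    PartitionSketch partition = PartitionSketch.builder()
        .withName("dir0")          // inherited setter, still returns PartitionBuilder
        .withLastModifiedTime(42L) // setter declared on the concrete builder
        .build();
    System.out.println(partition.name + " " + partition.lastModifiedTime); // prints "dir0 42"
  }
}

// Base builder parameterized by its own subclass type.
abstract class BaseBuilder<T extends BaseBuilder<T>> {
  String name;

  T withName(String name) {
    this.name = name;
    return self();
  }

  // Each concrete builder returns itself so that chained calls keep the subclass type.
  protected abstract T self();
}

final class PartitionSketch {
  final String name;
  final long lastModifiedTime;

  private PartitionSketch(PartitionBuilder builder) {
    this.name = builder.name;
    this.lastModifiedTime = builder.lastModifiedTime;
  }

  static PartitionBuilder builder() {
    return new PartitionBuilder();
  }

  static final class PartitionBuilder extends BaseBuilder<PartitionBuilder> {
    private long lastModifiedTime = -1; // sentinel for "not defined", mirroring the diff

    PartitionBuilder withLastModifiedTime(long lastModifiedTime) {
      this.lastModifiedTime = lastModifiedTime;
      return self();
    }

    @Override
    protected PartitionBuilder self() {
      return this;
    }

    PartitionSketch build() {
      return new PartitionSketch(this);
    }
  }
}
```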
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871455#comment-16871455
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296753640
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java
 ##
 @@ -0,0 +1,143 @@
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+    super(builder);
+    this.location = builder.location;
+    this.partitionKeys = builder.partitionKeys;
+    this.interestingColumns = builder.interestingColumns;
+    this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+    return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+    return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+    return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+    return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+    return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map columnStatistics, List tableStatistics) {
+    Map mergedTableStatistics = new HashMap<>(this.statistics);
+
+    // overrides statistics value for the case when new statistics is exact or existing one was estimated
+    tableStatistics.stream()
+        .filter(statisticsHolder -> statisticsHolder.getStatisticsKind().isExact()
+          || !this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+        .forEach(statisticsHolder -> mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), statisticsHolder));
+
+    Map newColumnsStatistics = new HashMap<>(this.columnsStatistics);
+    this.columnsStatistics.forEach(
+        (columnName, value) -> newColumnsStatistics.put(columnName, value.cloneWith(columnStatistics.get(columnName))));
+
+    return BaseTableMetadata.builder()
+        .withTableInfo(tableInfo)
+        .withMetadataInfo(metadataInfo)
+        .withLocation(location)
+        .withSchema(schema)
+        .withColumnsStatistics(newColumnsStatistics)
+        .withStatistics(mergedTableStatistics.values())
+        .withLastModifiedTime(lastModifiedTime)
+        .withPartitionKeys(partitionKeys)
+        .withInterestingColumns(interestingColumns)
+        .build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+    return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends BaseMetadataBuilder {
+    private Path location;
+    private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+    private Map partitionKeys;
+    private List interestingColumns;
+
+    public BaseTableMetadataBuilder withLocation(Path location) {
 
 Review comment:
   Done, thanks.
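
The override rule described by the comment inside `cloneWithStats` in the hunk above (a new statistic wins when it is exact, or when the stored one was only estimated) can be sketched in isolation as follows. `StatSketch` and `OverrideRuleSketch` are hypothetical stand-ins, not Drill classes. Unlike the quoted code, this sketch also accepts a statistic that has no stored counterpart; that null check is an assumption added here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OverrideRuleSketch {

  // Returns a copy of existing in which an incoming statistic replaces the stored one
  // when the incoming value is exact or the stored value is not exact.
  static Map<String, StatSketch> merge(Map<String, StatSketch> existing, List<StatSketch> incoming) {
    Map<String, StatSketch> merged = new HashMap<>(existing);
    for (StatSketch stat : incoming) {
      StatSketch current = existing.get(stat.name);
      // A statistic with no stored counterpart is simply added (assumption of this sketch).
      if (stat.exact || current == null || !current.exact) {
        merged.put(stat.name, stat);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, StatSketch> existing = new HashMap<>();
    existing.put("rowCount", new StatSketch("rowCount", 100, true));
    existing.put("nullsCount", new StatSketch("nullsCount", 5, false));

    List<StatSketch> incoming = Arrays.asList(
        new StatSketch("rowCount", 90, false),
        new StatSketch("nullsCount", 7, false));

    Map<String, StatSketch> merged = merge(existing, incoming);
    System.out.println(merged.get("rowCount").value);   // prints "100": an estimate does not replace an exact value
    System.out.println(merged.get("nullsCount").value); // prints "7": an estimate replaces an older estimate
  }
}

// Hypothetical stand-in for a named statistic with an exactness flag.
final class StatSketch {
  final String name;
  final long value;
  final boolean exact;

  StatSketch(String name, long value, boolean exact) {
    this.name = name;
    this.value = value;
    this.exact = exact;
  }
}
```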
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871456#comment-16871456
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296757799
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java
 ##
 @@ -0,0 +1,143 @@
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+    super(builder);
+    this.location = builder.location;
+    this.partitionKeys = builder.partitionKeys;
+    this.interestingColumns = builder.interestingColumns;
+    this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+    return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+    return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+    return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+    return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+    return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map columnStatistics, List tableStatistics) {
+    Map mergedTableStatistics = new HashMap<>(this.statistics);
+
+    // overrides statistics value for the case when new statistics is exact or existing one was estimated
+    tableStatistics.stream()
+        .filter(statisticsHolder -> statisticsHolder.getStatisticsKind().isExact()
+          || !this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+        .forEach(statisticsHolder -> mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), statisticsHolder));
+
+    Map newColumnsStatistics = new HashMap<>(this.columnsStatistics);
+    this.columnsStatistics.forEach(
+        (columnName, value) -> newColumnsStatistics.put(columnName, value.cloneWith(columnStatistics.get(columnName))));
+
+    return BaseTableMetadata.builder()
+        .withTableInfo(tableInfo)
+        .withMetadataInfo(metadataInfo)
+        .withLocation(location)
+        .withSchema(schema)
+        .withColumnsStatistics(newColumnsStatistics)
+        .withStatistics(mergedTableStatistics.values())
+        .withLastModifiedTime(lastModifiedTime)
+        .withPartitionKeys(partitionKeys)
+        .withInterestingColumns(interestingColumns)
+        .build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+    return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends BaseMetadataBuilder {
+    private Path location;
+    private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+    private Map partitionKeys;
+    private List interestingColumns;
+
+    public BaseTableMetadataBuilder withLocation(Path location) {
+      this.location = location;
+      return self();
+    }
+
+    public BaseTableMetadataBuilder withLastModifiedTime(long lastModifiedTime) {
+      this.lastModifiedTime = lastModifiedTime;
+      return self();
+    }
+
+    public BaseTableMetadataBuilder 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871459#comment-16871459
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296772407
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataInfo.java
 ##
 @@ -0,0 +1,50 @@
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Class which identifies specific metadata.
+ */
+public class MetadataInfo {
+
+  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
+  public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
+  public static final String DEFAULT_COLUMN_PREFIX = "_$SEGMENT_";
 
 Review comment:
   This constant will be used for creating a segment column name to avoid 
depending on the values of session options for partition column names.
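
As a rough illustration of the comment above, a fixed prefix gives segment columns names that do not change with session settings. Only the constant's value comes from the quoted hunk; the `segmentColumnName` helper and the idea of appending a nesting level are hypothetical and shown purely as a sketch.

```java
final class SegmentColumnSketch {

  // Value copied from the quoted MetadataInfo hunk.
  static final String DEFAULT_COLUMN_PREFIX = "_$SEGMENT_";

  // Hypothetical helper: builds a segment column name for a directory nesting level.
  static String segmentColumnName(int level) {
    if (level < 0) {
      throw new IllegalArgumentException("level must be non-negative: " + level);
    }
    return DEFAULT_COLUMN_PREFIX + level;
  }

  public static void main(String[] args) {
    System.out.println(segmentColumnName(0)); // prints "_$SEGMENT_0"
    System.out.println(segmentColumnName(2)); // prints "_$SEGMENT_2"
  }
}
```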
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871454#comment-16871454
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296752797
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map<SchemaPath, ColumnStatistics> columnsStatistics;
+  protected final Map<String, StatisticsHolder> statistics;
+
+  protected <T extends BaseMetadataBuilder<T>> BaseMetadata(BaseMetadataBuilder<T> builder) {
+    this.tableInfo = builder.tableInfo;
+    this.metadataInfo = builder.metadataInfo;
+    this.schema = builder.schema;
+    this.columnsStatistics = builder.columnsStatistics;
+    this.statistics = builder.statistics.stream()
+        .collect(Collectors.toMap(
+            statistic -> statistic.getStatisticsKind().getName(),
+            Function.identity(),
+            (a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map<SchemaPath, ColumnStatistics> getColumnsStatistics() {
+    return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+    return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+    return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public <V> V getStatistic(StatisticsKind statisticsKind) {
+    StatisticsHolder<V> statisticsHolder = statistics.get(statisticsKind.getName());
+    return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+    StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+    return statisticsHolder != null && statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public <V> V getStatisticsForColumn(SchemaPath columnName, StatisticsKind statisticsKind) {
+    return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+    return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+    return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+    return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder<T extends BaseMetadataBuilder<T>> {
+    protected TableInfo tableInfo;
+    protected MetadataInfo metadataInfo;
+    protected TupleMetadata schema;
+    protected Map<SchemaPath, ColumnStatistics> columnsStatistics;
+    protected Collection<StatisticsHolder> statistics;
+
+    public T withTableInfo(TableInfo tableInfo) {
 
 Review comment:
   Thanks, removed.
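
For context, the merge function in the `BaseMetadata` constructor quoted above resolves two statistics with the same kind name by keeping the first if it is exact and the second otherwise. A minimal, self-contained sketch of that rule follows; `Stat` is a hypothetical stand-in for `StatisticsHolder` (kind name, exactness flag, value) and is not the Drill class.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ExactStatisticsMerge {

  /** Hypothetical stand-in for StatisticsHolder: kind name, exactness flag, value. */
  static final class Stat {
    final String kindName;
    final boolean exact;
    final Object value;

    Stat(String kindName, boolean exact, Object value) {
      this.kindName = kindName;
      this.exact = exact;
      this.value = value;
    }
  }

  private ExactStatisticsMerge() {
  }

  /** Indexes statistics by kind name; on a name clash keeps the first if exact, else the second. */
  static Map<String, Stat> byKindName(Collection<Stat> stats) {
    return stats.stream()
        .collect(Collectors.toMap(
            stat -> stat.kindName,
            Function.identity(),
            (a, b) -> a.exact ? a : b));
  }

  public static void main(String[] args) {
    Map<String, Stat> merged = byKindName(Arrays.asList(
        new Stat("rowCount", false, 100L),  // estimated value
        new Stat("rowCount", true, 98L)));  // exact value with the same kind name
    System.out.println(merged.get("rowCount").value);
  }
}
```

Note that when neither candidate is exact, this rule keeps the later one, and without a merge function `Collectors.toMap` would throw `IllegalStateException` on the duplicate key.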
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871446#comment-16871446
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296737969
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/TableMetadataUtils.java
 ##
 @@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.metastore.metadata.BaseMetadata;
+import org.apache.drill.metastore.metadata.TableMetadata;
+import org.apache.drill.metastore.statistics.CollectableColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.ColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.TableStatisticsKind;
+import org.apache.drill.shaded.guava.com.google.common.primitives.UnsignedBytes;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class TableMetadataUtils {
+
+  private TableMetadataUtils() {
 
 Review comment:
   Agreed, removed it.
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871447#comment-16871447
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296746956
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param <T> Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder<T> {
+
+  public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
+  private static final ObjectReader OBJECT_READER = new ObjectMapper().readerFor(StatisticsHolder.class);
+
+  private final T statisticsValue;
+  private final BaseStatisticsKind statisticsKind;
+
+  @JsonCreator
+  public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue,
+                          @JsonProperty("statisticsKind") BaseStatisticsKind statisticsKind) {
+    this.statisticsValue = statisticsValue;
+    this.statisticsKind = statisticsKind;
+  }
+
+  public StatisticsHolder(T statisticsValue,
+                          StatisticsKind statisticsKind) {
+    this.statisticsValue = statisticsValue;
+    this.statisticsKind = (BaseStatisticsKind) statisticsKind;
+  }
+
+  @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS,
+                include = JsonTypeInfo.As.WRAPPER_OBJECT)
+  public T getStatisticsValue() {
+    return statisticsValue;
+  }
+
+  public StatisticsKind getStatisticsKind() {
+    return statisticsKind;
+  }
+
+  public static StatisticsHolder deserialize(String serialized) throws IOException {
+    return OBJECT_READER.readValue(serialized);
+  }
+
+  public static String serialize(StatisticsHolder statisticsHolder) throws JsonProcessingException {
 
 Review comment:
   Thanks, done for this class and for `ColumnStatistics`
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871465#comment-16871465
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296786531
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -210,6 +213,17 @@ public int getMaxParallelizationWidth() {
 return readEntries;
   }
 
+  /**
+   * {@inheritDoc}
+   * 
+   * - if file metadata was pruned, prunes underlying metadata
 
 Review comment:
   Yes, it can. Fixed.
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
> to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
> getNonInterestingColumnsMetadata
> 13. Introduce segment-level metadata class:
> {noformat}
> class SegmentMetadata {
>   TableInfo tableInfo;
>   MetadataInfo metadataInfo;
>   SchemaPath column;
>   TupleMetadata 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871457#comment-16871457
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296761257
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -221,6 +229,31 @@ public void setFilterForRuntime(LogicalExpression filterExpr, OptimizerRulesCont
 if ( ! skipRuntimePruning ) { setFilter(filterExpr); }
   }
 
+  /**
+   * Applies specified filter {@code filterExpr} to current group scan and produces filtering at:
+   * 
+   * table level:
+   * - if filter matches all the data or prunes all the data, sets corresponding value to
 
 Review comment:
   Agreed, thanks for pointing this out. Replaced it with nested lists.
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871448#comment-16871448
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296746701
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param <T> Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder<T> {
+
+  public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
+  private static final ObjectReader OBJECT_READER = new ObjectMapper().readerFor(StatisticsHolder.class);
+
+  private final T statisticsValue;
+  private final BaseStatisticsKind statisticsKind;
+
+  @JsonCreator
+  public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue,
+                          @JsonProperty("statisticsKind") BaseStatisticsKind statisticsKind) {
+    this.statisticsValue = statisticsValue;
+    this.statisticsKind = statisticsKind;
+  }
+
+  public StatisticsHolder(T statisticsValue,
+                          StatisticsKind statisticsKind) {
+    this.statisticsValue = statisticsValue;
+    this.statisticsKind = (BaseStatisticsKind) statisticsKind;
+  }
+
+  @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS,
+                include = JsonTypeInfo.As.WRAPPER_OBJECT)
+  public T getStatisticsValue() {
+    return statisticsValue;
+  }
+
+  public StatisticsKind getStatisticsKind() {
+    return statisticsKind;
+  }
+
+  public static StatisticsHolder deserialize(String serialized) throws IOException {
 
 Review comment:
   Thanks, done.
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871444#comment-16871444
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296737602
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillStatsTable.java
 ##
 @@ -452,53 +456,54 @@ public static ObjectMapper getMapper() {
 .addDeserializer(TypeProtos.MajorType.class, new MajorTypeSerDe.De())
 .addDeserializer(SchemaPath.class, new SchemaPath.De());
 mapper.registerModule(deModule);
+mapper.registerSubtypes(new NamedType(NumericEquiDepthHistogram.class, "numeric-equi-depth"));
 
 Review comment:
   It would be nice, but I think it would break backward compatibility, since 
the name was defined earlier here: 
https://github.com/apache/drill/blob/05a1a3a888a7408bde683acc36f406fbd2459254/exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/Histogram.java#L31
 
   All previously created stats files would then fail to deserialize correctly.
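
The compatibility concern can be illustrated with a toy model of name-based subtype lookup. This sketch does not use Jackson: `TypeNameRegistry`, the empty `NumericEquiDepthHistogram` stand-in, and the id `"renamed-histogram"` are all hypothetical. Only the id `"numeric-equi-depth"` comes from the diff.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of a name-to-class subtype registry; hypothetical, not Jackson's implementation. */
final class TypeNameRegistry {

  /** Empty stand-in for Drill's histogram class; only its identity matters here. */
  static final class NumericEquiDepthHistogram {
  }

  private final Map<String, Class<?>> typesByName = new HashMap<>();

  /** Registers a class under the type id that serialized files will carry. */
  TypeNameRegistry register(Class<?> type, String name) {
    typesByName.put(name, type);
    return this;
  }

  /** Looks up the class for a type id read from a file; empty if the id is unknown. */
  Optional<Class<?>> resolve(String name) {
    return Optional.ofNullable(typesByName.get(name));
  }

  public static void main(String[] args) {
    String idInOldFiles = "numeric-equi-depth";

    TypeNameRegistry sameName = new TypeNameRegistry()
        .register(NumericEquiDepthHistogram.class, "numeric-equi-depth");
    TypeNameRegistry renamed = new TypeNameRegistry()
        .register(NumericEquiDepthHistogram.class, "renamed-histogram");

    // Old files still resolve when the id is unchanged, but not after a rename.
    System.out.println(sameName.resolve(idInOldFiles).isPresent());
    System.out.println(renamed.resolve(idInOldFiles).isPresent());
  }
}
```

In this model the id stored in existing files acts as a lookup key, so changing the registered name leaves those files pointing at a key that no longer exists.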
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
> to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871463#comment-16871463
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296765872
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -666,27 +733,66 @@ protected void filterTableMetadata(FilterPredicate 
filterPredicate, Set

> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
> getNonInterestingColumnsMetadata
> 13. Introduce segment-level metadata class:
> {noformat}
> class SegmentMetadata {
>   TableInfo tableInfo;
>   MetadataInfo metadataInfo;
>   SchemaPath column;
>   TupleMetadata schema;
>   String location;
>   Map columnsStatistics;
>   Map statistics;
>   List partitionValues;
>   List locations;
>   long lastModifiedTime;
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871460#comment-16871460
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296765020
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -572,34 +626,39 @@ public 
GroupScanWithMetadataFilterer(AbstractGroupScanWithMetadata source) {
  */
 public abstract AbstractGroupScanWithMetadata build();
 
-public GroupScanWithMetadataFilterer withTable(TableMetadata 
tableMetadata) {
+public B withTable(TableMetadata tableMetadata) {
   this.tableMetadata = tableMetadata;
-  return this;
+  return self();
 
 Review comment:
   The `self()` method was introduced so that these methods return the 
specific implementation type instead of the base type. This way we don't need 
to add casts in the cases where the `this` instance should be returned.
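
   A minimal sketch of this self-typed builder idea, assuming hypothetical class names (`BaseFilterer`, `ParquetFilterer`) and simplified fields; it is not the actual Drill code, only an illustration of why returning `self()` removes the need for casts:

```java
// Hypothetical base builder, parameterized by its own concrete subtype B.
abstract class BaseFilterer<B extends BaseFilterer<B>> {
  protected String table;

  // Each concrete subclass returns itself, typed as B.
  protected abstract B self();

  public B withTable(String table) {
    this.table = table;
    return self(); // typed as the subclass, so callers need no cast
  }
}

// Hypothetical concrete builder with one extra, subclass-only method.
final class ParquetFilterer extends BaseFilterer<ParquetFilterer> {
  private int rowGroups;

  @Override
  protected ParquetFilterer self() {
    return this;
  }

  public ParquetFilterer withRowGroups(int rowGroups) {
    this.rowGroups = rowGroups;
    return this;
  }

  public String build() {
    return table + ":" + rowGroups;
  }
}

final class SelfTypeDemo {
  static String run() {
    // withTable() is declared in the base class, yet the chain continues with
    // a subclass-only method because withTable() returns ParquetFilterer.
    return new ParquetFilterer().withTable("orders").withRowGroups(3).build();
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

   If `withTable()` returned the base type instead, the call to `withRowGroups()` in `run()` would not compile without a cast.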
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871464#comment-16871464
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296776542
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java
 ##
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.hadoop.fs.Path;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * Represents a metadata for the table part, which corresponds to the specific 
partition key.
+ */
+public class PartitionMetadata extends BaseMetadata {
+  private final SchemaPath column;
+  private final List partitionValues;
+  private final Set locations;
+  private final long lastModifiedTime;
+
+  private PartitionMetadata(PartitionMetadataBuilder builder) {
+super(builder);
+this.column = builder.column;
+this.partitionValues = builder.partitionValues;
+this.locations = builder.locations;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  /**
+   * It allows to obtain the column path for this partition
+   *
+   * @return column path
+   */
+  public SchemaPath getColumn() {
+return column;
+  }
+
+  /**
+   * File locations for this partition
+   *
+   * @return file locations
+   */
+  public Set getLocations() {
+return locations;
+  }
+
+  /**
+   * It allows to check the time, when any files were modified. It is in Unix 
Timestamp
 
 Review comment:
   Thanks, done.
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871442#comment-16871442
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296731422
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/StatisticsProvider.java
 ##
 @@ -218,89 +201,85 @@ public ColumnStatistics 
visitFunctionHolderExpression(FunctionHolderExpression h
   ValueHolder minFuncHolder = 
InterpreterEvaluator.evaluateFunction(interpreter, args1, holderExpr.getName());
   ValueHolder maxFuncHolder = 
InterpreterEvaluator.evaluateFunction(interpreter, args2, holderExpr.getName());
 
-  MinMaxStatistics statistics;
   switch (destType) {
 case INT:
-  statistics = new MinMaxStatistics<>(((IntHolder) 
minFuncHolder).value, ((IntHolder) maxFuncHolder).value, Integer::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((IntHolder) minFuncHolder).value,
+  ((IntHolder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case BIGINT:
-  statistics = new MinMaxStatistics<>(((BigIntHolder) 
minFuncHolder).value, ((BigIntHolder) maxFuncHolder).value, Long::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((BigIntHolder) minFuncHolder).value,
+  ((BigIntHolder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case FLOAT4:
-  statistics = new MinMaxStatistics<>(((Float4Holder) 
minFuncHolder).value, ((Float4Holder) maxFuncHolder).value, Float::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((Float4Holder) minFuncHolder).value,
+  ((Float4Holder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case FLOAT8:
-  statistics = new MinMaxStatistics<>(((Float8Holder) 
minFuncHolder).value, ((Float8Holder) maxFuncHolder).value, Double::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((Float8Holder) minFuncHolder).value,
+  ((Float8Holder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case TIMESTAMP:
-  statistics = new MinMaxStatistics<>(((TimeStampHolder) 
minFuncHolder).value, ((TimeStampHolder) maxFuncHolder).value, Long::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((TimeStampHolder) minFuncHolder).value,
+  ((TimeStampHolder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 default:
   return null;
   }
-  statistics.setNullsCount((long) 
input.getStatistic(ColumnStatisticsKind.NULLS_COUNT));
-  return statistics;
 } catch (Exception e) {
-  throw new DrillRuntimeException("Error in evaluating function of " + 
holderExpr.getName() );
+  throw new DrillRuntimeException("Error in evaluating function of " + 
holderExpr.getName());
 }
   }
 
-  public static class MinMaxStatistics implements ColumnStatistics {
-private final V minVal;
-private final V maxVal;
-private final Comparator valueComparator;
-private long nullsCount;
-
-public MinMaxStatistics(V minVal, V maxVal, Comparator valueComparator) 
{
-  this.minVal = minVal;
-  this.maxVal = maxVal;
-  this.valueComparator = valueComparator;
-}
-
-@Override
-public Object getStatistic(StatisticsKind statisticsKind) {
-  switch (statisticsKind.getName()) {
-case ExactStatisticsConstants.MIN_VALUE:
-  return minVal;
-case ExactStatisticsConstants.MAX_VALUE:
-  return maxVal;
-case ExactStatisticsConstants.NULLS_COUNT:
-  return nullsCount;
-default:
-  return null;
-  }
-}
-
-@Override
-public boolean containsStatistic(StatisticsKind statisticsKind) {
-  switch (statisticsKind.getName()) {
-case ExactStatisticsConstants.MIN_VALUE:
-case ExactStatisticsConstants.MAX_VALUE:
-case ExactStatisticsConstants.NULLS_COUNT:
-  return true;
-default:
-  return false;
-  }
-}
-
-@Override
-public boolean containsExactStatistics(StatisticsKind statisticsKind) {
-  return true;
-}
-
-@Override
-public Comparator getValueComparator() {
-  return valueComparator;
-  

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871467#comment-16871467
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296787766
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -281,31 +265,54 @@ public AbstractGroupScanWithMetadata 
applyFilter(LogicalExpression filterExpr, U
 
   logger.debug("All row groups have been filtered out. Add back one to get 
schema from scanner");
 
+  Map segmentsMap = 
getNextOrEmpty(getSegmentsMetadata().values()).stream()
+  .collect(Collectors.toMap(SegmentMetadata::getPath, 
Function.identity()));
+
   Map filesMap = 
getNextOrEmpty(getFilesMetadata().values()).stream()
-  .collect(Collectors.toMap(FileMetadata::getLocation, 
Function.identity()));
+  .collect(Collectors.toMap(FileMetadata::getPath, 
Function.identity()));
 
   Multimap rowGroupsMap = 
LinkedListMultimap.create();
-  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> 
rowGroupsMap.put(entry.getLocation(), entry));
+  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> 
rowGroupsMap.put(entry.getPath(), entry));
 
-  builder.withRowGroups(rowGroupsMap)
+  filteredMetadata.withRowGroups(rowGroupsMap)
   .withTable(getTableMetadata())
+  .withSegments(segmentsMap)
   .withPartitions(getNextOrEmpty(getPartitionsMetadata()))
   .withNonInterestingColumns(getNonInterestingColumnsMetadata())
   .withFiles(filesMap)
   .withMatching(false);
 }
 
-if (builder.getOverflowLevel() != MetadataLevel.NONE) {
-  logger.warn("applyFilter {} wasn't able to do pruning for  all metadata 
levels filter condition, since metadata count for " +
-"{} level exceeds 
`planner.store.parquet.rowgroup.filter.pushdown.threshold` value.\n" +
-"But underlying metadata was pruned without filter expression 
according to the metadata with above level.",
-  ExpressionStringBuilder.toString(filterExpr), 
builder.getOverflowLevel());
+if (filteredMetadata.getOverflowLevel() != MetadataType.NONE) {
+  if (logger.isWarnEnabled()) {
 
 Review comment:
   Agreed, this is very unlikely.
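
   For context, the level check in the diff above keeps the warning's arguments from being computed when the level is disabled. Below is a rough sketch of that guard, assuming `java.util.logging` in place of the logger used in Drill (so `isLoggable(Level.WARNING)` stands in for `isWarnEnabled()`); `render` is a hypothetical stand-in for an expensive call such as converting the filter expression to a string.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class GuardedLogDemo {
  private static int renderCalls;

  // Hypothetical stand-in for an expensive call, e.g. rendering a filter expression.
  static String render(String expr) {
    renderCalls++;
    return "[" + expr + "]";
  }

  static void warnAboutOverflow(Logger logger, String expr) {
    // The message is only assembled when WARNING is actually enabled.
    if (logger.isLoggable(Level.WARNING)) {
      logger.warning("Could not prune every metadata level for filter " + render(expr));
    }
  }

  // Returns how many times render() ran for a logger set to the given level.
  static int renderCallsAt(Level level) {
    Logger logger = Logger.getLogger("guarded.log.demo");
    logger.setUseParentHandlers(false); // keep the demo quiet on the console
    logger.setLevel(level);
    renderCalls = 0;
    warnAboutOverflow(logger, "a > 10");
    return renderCalls;
  }

  public static void main(String[] args) {
    System.out.println(renderCallsAt(Level.OFF));
    System.out.println(renderCallsAt(Level.WARNING));
  }
}
```

   With the level set to `OFF` the expensive call is skipped entirely; with `WARNING` enabled it runs once.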
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871470#comment-16871470
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296779015
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/TableInfo.java
 ##
 @@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * General table information.
+ */
+public class TableInfo {
+  public static final String UNKNOWN = "UNKNOWN";
+  public static final TableInfo UNKNOWN_TABLE_INFO = new TableInfo(UNKNOWN, 
UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN);
+
+  private final String storagePlugin;
+  private final String workspace;
+  private final String name;
+  private final String type;
+  private final String owner;
+
+  public TableInfo(String storagePlugin, String workspace, String name, String 
type, String owner) {
 
 Review comment:
   Thanks, done
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871451#comment-16871451
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296753052
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected > BaseMetadata(BaseMetadataBuilder builder) {
+this.tableInfo = builder.tableInfo;
+this.metadataInfo = builder.metadataInfo;
+this.schema = builder.schema;
+this.columnsStatistics = builder.columnsStatistics;
+this.statistics = builder.statistics.stream()
+.collect(Collectors.toMap(
+statistic -> statistic.getStatisticsKind().getName(),
+Function.identity(),
+(a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatistic(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+return statisticsHolder != null && statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatisticsForColumn(SchemaPath columnName, StatisticsKind statisticsKind) {
+return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder> {
+protected TableInfo tableInfo;
+protected MetadataInfo metadataInfo;
+protected TupleMetadata schema;
+protected Map columnsStatistics;
+protected Collection statistics;
+
+public T withTableInfo(TableInfo tableInfo) {
+  this.tableInfo = tableInfo;
+  return self();
+}
+
+public T withMetadataInfo(MetadataInfo metadataInfo) {
+  this.metadataInfo = metadataInfo;
+  return self();
+}
+
+public T withSchema(TupleMetadata schema) {
+  this.schema = schema;
+  return self();
+}
+
+public T withColumnsStatistics(Map 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871445#comment-16871445
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296750708
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
 
 Review comment:
   Agree, `metadataStatistics` fits better, renamed it.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871462#comment-16871462
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296762428
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -535,31 +581,39 @@ public TableMetadata getTableMetadata() {
 return partitions;
   }
 
+  protected Map getSegmentsMetadata() {
+if (segments == null) {
+  segments = metadataProvider.getSegmentsMetadataMap();
+}
+return segments;
+  }
+
   @JsonIgnore
   public NonInterestingColumnsMetadata getNonInterestingColumnsMetadata() {
 if (nonInterestingColumnsMetadata == null) {
-  nonInterestingColumnsMetadata = metadataProvider.getNonInterestingColumnsMeta();
+  nonInterestingColumnsMetadata = metadataProvider.getNonInterestingColumnsMetadata();
 }
 return nonInterestingColumnsMetadata;
   }
 
   /**
* This class is responsible for filtering different metadata levels.
*/
-  protected abstract static class GroupScanWithMetadataFilterer {
+  protected abstract static class GroupScanWithMetadataFilterer> {
 protected final AbstractGroupScanWithMetadata source;
 
 protected boolean matchAllMetadata = false;
 
 protected TableMetadata tableMetadata;
 protected List partitions = Collections.emptyList();
+protected Map segments = Collections.emptyMap();
 
 Review comment:
   Yes, this is expected. Later it may be replaced with a regular list; if
filtering does not happen, no new object is allocated.
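
A minimal sketch of the default discussed above, assuming a hypothetical `Filterer` class and string-valued maps that are not part of the PR. The field starts as `Collections.emptyMap()`, which is immutable and need not create a separate object per call, and it is swapped for a regular mutable map only when filtering actually runs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class EmptyDefaultDemo {

  // Hypothetical stand-in for a filterer; not the Drill class.
  static final class Filterer {
    // Immutable empty default: no per-instance map is created up front.
    private Map<String, String> segments = Collections.emptyMap();

    // Replaces the default with a regular mutable map holding the matches.
    void filter(Map<String, String> source, String prefix) {
      Map<String, String> matched = new HashMap<>();
      for (Map.Entry<String, String> entry : source.entrySet()) {
        if (entry.getKey().startsWith(prefix)) {
          matched.put(entry.getKey(), entry.getValue());
        }
      }
      segments = matched;
    }

    Map<String, String> getSegments() {
      return segments;
    }
  }

  public static void main(String[] args) {
    Filterer filterer = new Filterer();
    System.out.println(filterer.getSegments().isEmpty());

    Map<String, String> source = new HashMap<>();
    source.put("/data/2019", "a");
    source.put("/tmp/x", "b");
    filterer.filter(source, "/data");
    System.out.println(filterer.getSegments().keySet());
  }
}
```

The trade-off in this sketch is that callers must not mutate the default value, since writes to the map returned by `Collections.emptyMap()` throw `UnsupportedOperationException`.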
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871461#comment-16871461
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296770734
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataInfo.java
 ##
 @@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Class which identifies specific metadata.
 
 Review comment:
   Thanks, done.
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871452#comment-16871452
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296767257
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -666,27 +733,66 @@ protected void filterTableMetadata(FilterPredicate filterPredicate, Set schemaPathsInExpr) {
+protected void filterSegmentMetadata(OptionManager optionManager,
+ FilterPredicate filterPredicate,
+ Set schemaPathsInExpr) {
   if (!matchAllMetadata) {
-if (!source.getPartitionsMetadata().isEmpty()) {
-  if (source.getPartitionsMetadata().size() <= optionManager.getOption(
-PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD)) {
+if (!source.getSegmentsMetadata().isEmpty()) {
+  if (source.getSegmentsMetadata().size() <= optionManager.getOption(
+PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD)) {
 matchAllMetadata = true;
-partitions = filterAndGetMetadata(schemaPathsInExpr, source.getPartitionsMetadata(), filterPredicate, optionManager);
+segments = filterAndGetMetadata(schemaPathsInExpr,
+source.getSegmentsMetadata().values(),
+filterPredicate,
+optionManager).stream()
+.collect(Collectors.toMap(SegmentMetadata::getPath, Function.identity()));
 
 Review comment:
   Thanks, formatted the code and added `BinaryOperator`.
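
For context, the two-argument `Collectors.toMap` throws `IllegalStateException` when two elements map to the same key, while the three-argument overload takes a `BinaryOperator` that resolves the collision. The sketch below illustrates that overload only; the `Segment` class and the keep-first policy are assumptions for illustration, since the comment does not say which merge function was added in the PR.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ToMapMergeDemo {

  // Hypothetical stand-in for a metadata object keyed by its path.
  static final class Segment {
    final String path;
    final long rowCount;

    Segment(String path, long rowCount) {
      this.path = path;
      this.rowCount = rowCount;
    }

    String getPath() {
      return path;
    }
  }

  static Map<String, Segment> byPath(List<Segment> segments) {
    // Assumed policy for this sketch: keep the first value seen for a key.
    BinaryOperator<Segment> keepFirst = (first, second) -> first;
    return segments.stream()
        .collect(Collectors.toMap(Segment::getPath, Function.identity(), keepFirst));
  }

  public static void main(String[] args) {
    List<Segment> segments = Arrays.asList(
        new Segment("/a", 1), new Segment("/a", 2), new Segment("/b", 3));
    Map<String, Segment> result = byPath(segments);
    // The duplicate "/a" key is merged instead of raising an exception.
    System.out.println(result.size() + " " + result.get("/a").rowCount);
  }
}
```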
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871472#comment-16871472
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296781802
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/ColumnStatistics.java
 ##
 @@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonAutoDetect;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonPropertyOrder;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.metastore.TableMetadataUtils;
+
+import java.io.IOException;
+import java.util.Collection;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import static org.apache.drill.metastore.statistics.StatisticsHolder.OBJECT_WRITER;
+
+/**
+ * Represents collection of statistics values for specific column.
 
 Review comment:
   Thanks, added.
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871443#comment-16871443
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296728354
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java
 ##
 @@ -228,5 +228,10 @@ public HiveDrillNativeParquetScanFilterer(HiveDrillNativeParquetScan source) {
 protected AbstractParquetGroupScan getNewScan() {
   return new HiveDrillNativeParquetScan((HiveDrillNativeParquetScan) source);
 }
+
+@Override
+protected HiveDrillNativeParquetScanFilterer self() {
 
 Review comment:
   This method came from `GroupScanWithMetadataFilterer` and is used to return 
the correct type of `this` instance to avoid casts in parent classes.
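
The idiom described here can be sketched as follows. All class and method names except `self()` are hypothetical and chosen for illustration; this is not the Drill code, only a small instance of the self-typed pattern in which a parent class is parameterized by its own subclass.

```java
public class SelfTypeDemo {

  // Parent parameterized by the concrete subclass type B.
  abstract static class BaseFilterer<B extends BaseFilterer<B>> {
    protected boolean matchAllMetadata;

    // Each concrete subclass returns itself, typed as itself.
    protected abstract B self();

    public B withMatching(boolean matchAllMetadata) {
      this.matchAllMetadata = matchAllMetadata;
      return self(); // no unchecked cast of `this` to B is needed here
    }
  }

  static final class ParquetFilterer extends BaseFilterer<ParquetFilterer> {
    private String tag = "";

    @Override
    protected ParquetFilterer self() {
      return this;
    }

    public ParquetFilterer withTag(String tag) {
      this.tag = tag;
      return this;
    }

    public String describe() {
      return tag + ":" + matchAllMetadata;
    }
  }

  public static void main(String[] args) {
    // withMatching() is declared in the parent yet returns ParquetFilterer,
    // so the subclass-only withTag() can be chained after it.
    String result = new ParquetFilterer().withMatching(true).withTag("hive").describe();
    System.out.println(result);
  }
}
```

Without `self()`, the parent method would have to return `BaseFilterer` (breaking the chain) or cast `this` to `B` with an unchecked warning.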
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871471#comment-16871471
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296794257
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/BaseParquetMetadataProvider.java
 ##
 @@ -313,18 +328,30 @@ public TableMetadata getTableMetadata() {
   partitionsForValue.asMap().forEach((partitionKey, value) -> {
 Map columnsStatistics = new 
HashMap<>();
 
-Map statistics = new HashMap<>();
+List statistics = new ArrayList<>();
 partitionKey = partitionKey == NULL_VALUE ? null : partitionKey;
-statistics.put(ColumnStatisticsKind.MIN_VALUE, partitionKey);
-statistics.put(ColumnStatisticsKind.MAX_VALUE, partitionKey);
+statistics.add(new StatisticsHolder<>(partitionKey, 
ColumnStatisticsKind.MIN_VALUE));
+statistics.add(new StatisticsHolder<>(partitionKey, 
ColumnStatisticsKind.MAX_VALUE));
 
-statistics.put(ColumnStatisticsKind.NULLS_COUNT, 
Statistic.NO_COLUMN_STATS);
-statistics.put(TableStatisticsKind.ROW_COUNT, 
Statistic.NO_COLUMN_STATS);
+statistics.add(new StatisticsHolder<>(Statistic.NO_COLUMN_STATS, 
ColumnStatisticsKind.NULLS_COUNT));
+statistics.add(new StatisticsHolder<>(Statistic.NO_COLUMN_STATS, 
TableStatisticsKind.ROW_COUNT));
 columnsStatistics.put(partitionColumn,
-new ColumnStatisticsImpl<>(statistics,
-
ParquetTableMetadataUtils.getComparator(getParquetGroupScanStatistics().getTypeForColumn(partitionColumn).getMinorType())));
-partitions.add(new PartitionMetadata(partitionColumn, 
getTableMetadata().getSchema(),
-columnsStatistics, statistics, (Set) value, tableName, 
-1));
+new ColumnStatistics<>(statistics,
+
getParquetGroupScanStatistics().getTypeForColumn(partitionColumn).getMinorType()));
+MetadataInfo metadataInfo = new 
MetadataInfo(MetadataType.PARTITION, MetadataInfo.GENERAL_INFO_KEY, null);
+TableMetadata tableMetadata = getTableMetadata();
+PartitionMetadata partitionMetadata = PartitionMetadata.builder()
+.withTableInfo(tableMetadata.getTableInfo())
+.withMetadataInfo(metadataInfo)
+.withColumn(partitionColumn)
+.withSchema(tableMetadata.getSchema())
+.withColumnsStatistics(columnsStatistics)
+.withStatistics(statistics)
+.withPartitionValues(Collections.emptyList())
+.withLocations((Set) value)
 
 Review comment:
   The cast is required because `HashMultimap.asMap()` returns a map whose 
values are typed as `Collection`, although `HashMultimap` backs them with sets. 
To avoid problems if the `HashMultimap` implementation changes, I have replaced 
the cast with `new HashSet<>(value)`.
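
   A minimal sketch of the difference between casting and copying, using only 
the JDK. The map below is a stand-in for the `Map` view returned by a multimap; 
the class name, method name and file paths are hypothetical and are not part of 
the pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class LocationsCopyExample {

  // Copies the values instead of casting, so the result is a Set
  // regardless of the concrete Collection type behind the map view.
  static <T> Set<T> toSet(Collection<T> values) {
    return new HashSet<>(values);
  }

  public static void main(String[] args) {
    // Stand-in for a Map<K, Collection<V>> view; here the values are Lists.
    Map<String, Collection<String>> byPartition = new HashMap<>();
    byPartition.put("dir0=2019",
        new ArrayList<>(Arrays.asList("/t/2019/a.parquet", "/t/2019/b.parquet")));

    Collection<String> value = byPartition.get("dir0=2019");

    // Safe: works for any Collection implementation.
    Set<String> locations = toSet(value);
    System.out.println(locations.size());

    // Unsafe: compiles, but fails at runtime when the backing
    // collection is not actually a Set.
    try {
      Set<String> cast = (Set<String>) value;
      System.out.println(cast.size());
    } catch (ClassCastException e) {
      System.out.println("cast failed");
    }
  }
}
```

   The copy costs one extra allocation per partition value, but it removes the 
dependency on the runtime type of the collection.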
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871458#comment-16871458
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296756147
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+super(builder);
+this.location = builder.location;
+this.partitionKeys = builder.partitionKeys;
+this.interestingColumns = builder.interestingColumns;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map 
columnStatistics, List tableStatistics) {
+Map mergedTableStatistics = new 
HashMap<>(this.statistics);
+
+// overrides statistics value for the case when new statistics is exact or 
existing one was estimated
+tableStatistics.stream()
+.filter(statisticsHolder -> 
statisticsHolder.getStatisticsKind().isExact()
+  || 
!this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+.forEach(statisticsHolder -> 
mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), 
statisticsHolder));
+
+Map newColumnsStatistics = new 
HashMap<>(this.columnsStatistics);
+this.columnsStatistics.forEach(
+(columnName, value) -> newColumnsStatistics.put(columnName, 
value.cloneWith(columnStatistics.get(columnName))));
+
+return BaseTableMetadata.builder()
+.withTableInfo(tableInfo)
+.withMetadataInfo(metadataInfo)
+.withLocation(location)
+.withSchema(schema)
+.withColumnsStatistics(newColumnsStatistics)
+.withStatistics(mergedTableStatistics.values())
+.withLastModifiedTime(lastModifiedTime)
+.withPartitionKeys(partitionKeys)
+.withInterestingColumns(interestingColumns)
+.build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends 
BaseMetadataBuilder {
+private Path location;
+private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+private Map partitionKeys;
+private List interestingColumns;
+
+public BaseTableMetadataBuilder withLocation(Path location) {
+  this.location = location;
+  return self();
+}
+
+public BaseTableMetadataBuilder withLastModifiedTime(long 
lastModifiedTime) {
+  this.lastModifiedTime = lastModifiedTime;
+  return self();
+}
+
+public BaseTableMetadataBuilder 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871466#comment-16871466
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296775175
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataType.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Enum with possible types of metadata.
+ */
+public enum MetadataType {
+
+  ALL,
+
+  /**
+   * Table level metadata type.
+   */
+  TABLE,
+
+  /**
+   * Segment level metadata type. It corresponds to the metadata
+   * within specific directory for FS tables, or may correspond to partition 
for hive tables.
+   */
+  SEGMENT,
+
+  /**
+   * Drill partition level metadata type. It corresponds to parts of table 
data which has the same
+   * values within specific column, i.e. partitions discovered by Drill.
+   */
+  PARTITION,
+
+  /**
+   * File level metadata type.
+   */
+  FILE,
+
+  /**
+   * Row group level metadata type. Used for parquet tables.
+   */
+  ROW_GROUP,
+
+  NONE
 
 Review comment:
   1. Thanks, added.
   2. `NONE` is used during filtering to indicate that filtering finished and 
there was no metadata whose size exceeded 
`PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD`.
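
   A rough sketch of how such a sentinel value can be used. This is an 
illustration under assumptions, not the code from the pull request: the 
`Level` enum, the `overflowLevel` method and the counts are hypothetical, and 
the threshold is passed in as a plain integer.

```java
import java.util.EnumMap;
import java.util.Map;

final class OverflowLevelSketch {

  // Hypothetical stand-in for the metadata levels; NONE means "nothing overflowed".
  enum Level { TABLE, SEGMENT, PARTITION, FILE, ROW_GROUP, NONE }

  // Returns the first level (in declaration order) whose entry count exceeds
  // the threshold, or NONE when every level fits under it.
  static Level overflowLevel(EnumMap<Level, Integer> counts, int threshold) {
    for (Map.Entry<Level, Integer> entry : counts.entrySet()) {
      if (entry.getValue() > threshold) {
        return entry.getKey();
      }
    }
    return Level.NONE;
  }

  public static void main(String[] args) {
    EnumMap<Level, Integer> counts = new EnumMap<>(Level.class);
    counts.put(Level.TABLE, 1);
    counts.put(Level.FILE, 50);
    counts.put(Level.ROW_GROUP, 20000);

    System.out.println(overflowLevel(counts, 10000));   // prints ROW_GROUP
    System.out.println(overflowLevel(counts, 100000));  // prints NONE
  }
}
```

   A caller can then compare the result with `NONE` to decide whether a 
warning about skipped pruning is needed.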
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871449#comment-16871449
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296743285
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param  Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder {
+
+  public static final ObjectWriter OBJECT_WRITER = new 
ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
 
 Review comment:
   It is also used in `ColumnStatistics`, so I have set package-private 
(default) visibility.
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871453#comment-16871453
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296753431
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, 
partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected > 
BaseMetadata(BaseMetadataBuilder builder) {
+this.tableInfo = builder.tableInfo;
+this.metadataInfo = builder.metadataInfo;
+this.schema = builder.schema;
+this.columnsStatistics = builder.columnsStatistics;
+this.statistics = builder.statistics.stream()
+.collect(Collectors.toMap(
+statistic -> statistic.getStatisticsKind().getName(),
+Function.identity(),
+(a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatistic(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : 
null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null && 
statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatisticsForColumn(SchemaPath columnName, StatisticsKind 
statisticsKind) {
+return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder> {
+protected TableInfo tableInfo;
+protected MetadataInfo metadataInfo;
+protected TupleMetadata schema;
+protected Map columnsStatistics;
+protected Collection statistics;
+
+public T withTableInfo(TableInfo tableInfo) {
+  this.tableInfo = tableInfo;
+  return self();
+}
+
+public T withMetadataInfo(MetadataInfo metadataInfo) {
+  this.metadataInfo = metadataInfo;
+  return self();
+}
+
+public T withSchema(TupleMetadata schema) {
+  this.schema = schema;
+  return self();
+}
+
+public T withColumnsStatistics(Map 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871450#comment-16871450
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296753490
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, 
partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected > 
BaseMetadata(BaseMetadataBuilder builder) {
+this.tableInfo = builder.tableInfo;
+this.metadataInfo = builder.metadataInfo;
+this.schema = builder.schema;
+this.columnsStatistics = builder.columnsStatistics;
+this.statistics = builder.statistics.stream()
+.collect(Collectors.toMap(
+statistic -> statistic.getStatisticsKind().getName(),
+Function.identity(),
+(a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatistic(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : 
null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null && 
statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatisticsForColumn(SchemaPath columnName, StatisticsKind 
statisticsKind) {
+return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder> {
+protected TableInfo tableInfo;
+protected MetadataInfo metadataInfo;
+protected TupleMetadata schema;
+protected Map columnsStatistics;
+protected Collection statistics;
+
+public T withTableInfo(TableInfo tableInfo) {
+  this.tableInfo = tableInfo;
+  return self();
+}
+
+public T withMetadataInfo(MetadataInfo metadataInfo) {
+  this.metadataInfo = metadataInfo;
+  return self();
+}
+
+public T withSchema(TupleMetadata schema) {
+  this.schema = schema;
+  return self();
+}
+
+public T withColumnsStatistics(Map 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871133#comment-16871133
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296687506
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -281,31 +265,54 @@ public AbstractGroupScanWithMetadata 
applyFilter(LogicalExpression filterExpr, U
 
   logger.debug("All row groups have been filtered out. Add back one to get 
schema from scanner");
 
+  Map segmentsMap = 
getNextOrEmpty(getSegmentsMetadata().values()).stream()
+  .collect(Collectors.toMap(SegmentMetadata::getPath, 
Function.identity()));
+
   Map filesMap = 
getNextOrEmpty(getFilesMetadata().values()).stream()
-  .collect(Collectors.toMap(FileMetadata::getLocation, 
Function.identity()));
+  .collect(Collectors.toMap(FileMetadata::getPath, 
Function.identity()));
 
   Multimap rowGroupsMap = 
LinkedListMultimap.create();
-  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> 
rowGroupsMap.put(entry.getLocation(), entry));
+  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> 
rowGroupsMap.put(entry.getPath(), entry));
 
-  builder.withRowGroups(rowGroupsMap)
+  filteredMetadata.withRowGroups(rowGroupsMap)
   .withTable(getTableMetadata())
+  .withSegments(segmentsMap)
   .withPartitions(getNextOrEmpty(getPartitionsMetadata()))
   .withNonInterestingColumns(getNonInterestingColumnsMetadata())
   .withFiles(filesMap)
   .withMatching(false);
 }
 
-if (builder.getOverflowLevel() != MetadataLevel.NONE) {
-  logger.warn("applyFilter {} wasn't able to do pruning for  all metadata 
levels filter condition, since metadata count for " +
-"{} level exceeds 
`planner.store.parquet.rowgroup.filter.pushdown.threshold` value.\n" +
-"But underlying metadata was pruned without filter expression 
according to the metadata with above level.",
-  ExpressionStringBuilder.toString(filterExpr), 
builder.getOverflowLevel());
+if (filteredMetadata.getOverflowLevel() != MetadataType.NONE) {
+  if (logger.isWarnEnabled()) {
 
 Review comment:
   No objections to this change, but what are the odds of the warn level 
being disabled? :)
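
   For context, a minimal sketch of what the guard buys when the message 
arguments are expensive to build. It uses `java.util.logging` rather than the 
logger from the diff, so `isLoggable(Level.WARNING)` stands in for 
`isWarnEnabled()`; the class, the `render` helper and the message text are 
hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

final class GuardedWarnSketch {

  static final Logger LOGGER = Logger.getLogger("GuardedWarnSketch");

  // Counts how often the expensive rendering step actually runs.
  static final AtomicInteger RENDER_CALLS = new AtomicInteger();

  // Stand-in for an expensive step such as stringifying a filter expression.
  static String render(String expr) {
    RENDER_CALLS.incrementAndGet();
    return "[" + expr + "]";
  }

  // Renders the expression only when the warning will actually be emitted.
  static void warnOverflow(String expr, String level) {
    if (LOGGER.isLoggable(Level.WARNING)) {
      LOGGER.log(Level.WARNING,
          "applyFilter {0} could not prune at level {1}",
          new Object[] {render(expr), level});
    }
  }

  public static void main(String[] args) {
    LOGGER.setLevel(Level.SEVERE);      // warnings disabled
    warnOverflow("a > 1", "ROW_GROUP");
    System.out.println(RENDER_CALLS.get());

    LOGGER.setLevel(Level.WARNING);     // warnings enabled
    warnOverflow("a > 1", "ROW_GROUP");
    System.out.println(RENDER_CALLS.get());
  }
}
```

   With the warn level enabled, which is the usual case, the guard only adds 
one cheap check.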
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871147#comment-16871147
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296693580
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+super(builder);
+this.location = builder.location;
+this.partitionKeys = builder.partitionKeys;
+this.interestingColumns = builder.interestingColumns;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map columnStatistics, List tableStatistics) {
+Map mergedTableStatistics = new HashMap<>(this.statistics);
+
+// overrides statistics value for the case when new statistics is exact or existing one was estimated
+tableStatistics.stream()
+.filter(statisticsHolder -> statisticsHolder.getStatisticsKind().isExact()
+  || !this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+.forEach(statisticsHolder -> mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), statisticsHolder));
+
+Map newColumnsStatistics = new HashMap<>(this.columnsStatistics);
+this.columnsStatistics.forEach(
+(columnName, value) -> newColumnsStatistics.put(columnName, value.cloneWith(columnStatistics.get(columnName))));
+
+return BaseTableMetadata.builder()
+.withTableInfo(tableInfo)
+.withMetadataInfo(metadataInfo)
+.withLocation(location)
+.withSchema(schema)
+.withColumnsStatistics(newColumnsStatistics)
+.withStatistics(mergedTableStatistics.values())
+.withLastModifiedTime(lastModifiedTime)
+.withPartitionKeys(partitionKeys)
+.withInterestingColumns(interestingColumns)
+.build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends BaseMetadataBuilder {
+private Path location;
+private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+private Map partitionKeys;
+private List interestingColumns;
+
+public BaseTableMetadataBuilder withLocation(Path location) {
+  this.location = location;
+  return self();
+}
+
+public BaseTableMetadataBuilder withLastModifiedTime(long lastModifiedTime) {
+  this.lastModifiedTime = lastModifiedTime;
+  return self();
+}
+
+public BaseTableMetadataBuilder 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871134#comment-16871134
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296686176
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -210,6 +213,17 @@ public int getMaxParallelizationWidth() {
 return readEntries;
   }
 
+  /**
+   * {@inheritDoc}
+   * 
+   * - if file metadata was pruned, prunes underlying metadata
 
 Review comment:
   Not sure we need a dash here. Can this be covered with a nested list?
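
For reference, a nested list in Javadoc is written with HTML `<ul>`/`<li>` elements. The sketch below shows the shape on a hypothetical method; only the item "if file metadata was pruned, prunes underlying metadata" comes from the diff, while the parent item, the summary sentence and the method itself are assumptions for illustration.

```java
class NestedListJavadocSketch {
  /**
   * Applies the filter to the metadata of this scan.
   * <ul>
   *   <li>file level:
   *     <ul>
   *       <li>if file metadata was pruned, prunes underlying metadata</li>
   *     </ul>
   *   </li>
   * </ul>
   *
   * @param filesPruned whether file metadata was pruned (hypothetical parameter)
   * @return true if underlying metadata should be pruned as well
   */
  static boolean pruneUnderlying(boolean filesPruned) {
    return filesPruned;
  }
}
```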
 


> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
> getNonInterestingColumnsMetadata
> 13. Introduce segment-level metadata class:
> {noformat}
> class SegmentMetadata {
>   TableInfo tableInfo;
>   MetadataInfo 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871137#comment-16871137
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296684303
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -666,27 +733,66 @@ protected void filterTableMetadata(FilterPredicate filterPredicate, Set schemaPathsInExpr) {
+protected void filterSegmentMetadata(OptionManager optionManager,
+ FilterPredicate filterPredicate,
+ Set schemaPathsInExpr) {
   if (!matchAllMetadata) {
-if (!source.getPartitionsMetadata().isEmpty()) {
-  if (source.getPartitionsMetadata().size() <= optionManager.getOption(
-PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD)) {
+if (!source.getSegmentsMetadata().isEmpty()) {
+  if (source.getSegmentsMetadata().size() <= optionManager.getOption(
+  PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD)) {
 matchAllMetadata = true;
-partitions = filterAndGetMetadata(schemaPathsInExpr, source.getPartitionsMetadata(), filterPredicate, optionManager);
+segments = filterAndGetMetadata(schemaPathsInExpr,
+source.getSegmentsMetadata().values(),
+filterPredicate,
+optionManager).stream()
+.collect(Collectors.toMap(SegmentMetadata::getPath, Function.identity()));
 
 Review comment:
   ```suggestion
   .collect(Collectors.toMap(
   SegmentMetadata::getPath,
   Function.identity()));
   ```
   Plus, what about duplicate handling? It would be safer to add `(o, n) -> n`, unless of course you intend to fail on duplicates.
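
To make the difference concrete, here is a minimal sketch of both behaviours of `Collectors.toMap`. The `Segment` class is a hypothetical stand-in for `SegmentMetadata` (only `getPath()` mirrors the diff; `version` is added purely to tell duplicates apart).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapDuplicatesSketch {
  // Hypothetical stand-in for SegmentMetadata.
  static final class Segment {
    private final String path;
    private final int version;

    Segment(String path, int version) {
      this.path = path;
      this.version = version;
    }

    String getPath() { return path; }
    int getVersion() { return version; }
  }

  // Two-argument toMap: throws IllegalStateException on a duplicate key.
  static Map<String, Segment> failOnDuplicate(List<Segment> segments) {
    return segments.stream()
        .collect(Collectors.toMap(Segment::getPath, Function.identity()));
  }

  // Three-argument toMap with (o, n) -> n: the later element replaces the earlier one.
  static Map<String, Segment> lastWins(List<Segment> segments) {
    return segments.stream()
        .collect(Collectors.toMap(Segment::getPath, Function.identity(), (o, n) -> n));
  }

  // Sample input with two segments sharing the same path.
  static List<Segment> sample() {
    return Arrays.asList(new Segment("/t/d1", 1), new Segment("/t/d1", 2), new Segment("/t/d2", 1));
  }

  // Reports whether the two-argument variant rejects the sample input.
  static boolean failsOnDuplicate() {
    try {
      failOnDuplicate(sample());
      return false;
    } catch (IllegalStateException e) {
      return true;
    }
  }
}
```

Which variant is right depends on whether two segments with the same path indicate a bug (fail fast) or an expected overlap (keep one).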
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871142#comment-16871142
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296690973
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataInfo.java
 ##
 @@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Class which identifies specific metadata.
 
 Review comment:
   Please write a better Javadoc, e.g. "Class that specifies metadata type ...", and provide an example.
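
One possible shape for such a Javadoc, sketched under assumptions: the three fields and the two constants come from the issue description, while the wording, the example values, the constructor, `toString` and the reduced `MetadataType` enum (only `NONE` and `SEGMENT` are confirmed in this thread; `TABLE` is assumed) are illustrative only.

```java
class MetadataInfoJavadocSketch {
  // Reduced, hypothetical enum: NONE and SEGMENT appear in the thread, TABLE is assumed.
  enum MetadataType { NONE, TABLE, SEGMENT }

  /**
   * Class that specifies metadata type and the key and identifier
   * used to look up one specific piece of metadata.
   *
   * <p>Example (values assumed for illustration): table-level metadata could be
   * described as
   * {@code new MetadataInfo(MetadataType.TABLE, MetadataInfo.GENERAL_INFO_KEY, null)}.
   */
  static final class MetadataInfo {
    public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
    public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";

    private final MetadataType type;
    private final String key;
    private final String identifier;

    MetadataInfo(MetadataType type, String key, String identifier) {
      this.type = type;
      this.key = key;
      this.identifier = identifier;
    }

    @Override
    public String toString() {
      return "MetadataInfo[" + type + ", " + key + ", " + identifier + "]";
    }
  }
}
```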
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871152#comment-16871152
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296691929
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java
 ##
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.hadoop.fs.Path;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * Represents a metadata for the table part, which corresponds to the specific partition key.
+ */
+public class PartitionMetadata extends BaseMetadata {
+  private final SchemaPath column;
+  private final List partitionValues;
+  private final Set locations;
+  private final long lastModifiedTime;
+
+  private PartitionMetadata(PartitionMetadataBuilder builder) {
+super(builder);
+this.column = builder.column;
+this.partitionValues = builder.partitionValues;
+this.locations = builder.locations;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  /**
+   * It allows to obtain the column path for this partition
+   *
+   * @return column path
+   */
+  public SchemaPath getColumn() {
+return column;
+  }
+
+  /**
+   * File locations for this partition
+   *
+   * @return file locations
+   */
+  public Set getLocations() {
+return locations;
+  }
+
+  /**
+   * It allows to check the time, when any files were modified. It is in Unix Timestamp
 
 Review comment:
   Add timestamp unit of measurement.
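
A sketch of how the Javadoc could name the unit. It assumes the value is in milliseconds since the Unix epoch, which is what Hadoop's `FileStatus.getModificationTime()` returns; the diff does not show how the field is populated, so the unit is an assumption to confirm. The holder class and `lastModifiedInstant()` are hypothetical.

```java
import java.time.Instant;

class LastModifiedTimeSketch {
  private final long lastModifiedTime;

  LastModifiedTimeSketch(long lastModifiedTime) {
    this.lastModifiedTime = lastModifiedTime;
  }

  /**
   * Returns the time when any of the partition files was last modified.
   *
   * @return last modification time, in milliseconds since the Unix epoch
   *         (1970-01-01T00:00:00Z); the unit is assumed in this sketch
   */
  long getLastModifiedTime() {
    return lastModifiedTime;
  }

  // Converts the raw value to an Instant, which makes the unit explicit to callers.
  Instant lastModifiedInstant() {
    return Instant.ofEpochMilli(lastModifiedTime);
  }
}
```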
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871150#comment-16871150
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296694014
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected <T extends BaseMetadataBuilder<T>> BaseMetadata(BaseMetadataBuilder<T> builder) {
+this.tableInfo = builder.tableInfo;
+this.metadataInfo = builder.metadataInfo;
+this.schema = builder.schema;
+this.columnsStatistics = builder.columnsStatistics;
+this.statistics = builder.statistics.stream()
+.collect(Collectors.toMap(
+statistic -> statistic.getStatisticsKind().getName(),
+Function.identity(),
+(a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public <V> V getStatistic(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+return statisticsHolder != null && statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public <V> V getStatisticsForColumn(SchemaPath columnName, StatisticsKind statisticsKind) {
+return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder<T extends BaseMetadataBuilder<T>> {
+protected TableInfo tableInfo;
+protected MetadataInfo metadataInfo;
+protected TupleMetadata schema;
+protected Map columnsStatistics;
+protected Collection statistics;
+
+public T withTableInfo(TableInfo tableInfo) {
+  this.tableInfo = tableInfo;
+  return self();
+}
+
+public T withMetadataInfo(MetadataInfo metadataInfo) {
+  this.metadataInfo = metadataInfo;
+  return self();
+}
+
+public T withSchema(TupleMetadata schema) {
+  this.schema = schema;
+  return self();
+}
+
+public T withColumnsStatistics(Map 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871156#comment-16871156
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296696051
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/ColumnStatistics.java
 ##
 @@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonAutoDetect;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonPropertyOrder;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.metastore.TableMetadataUtils;
+
+import java.io.IOException;
+import java.util.Collection;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import static org.apache.drill.metastore.statistics.StatisticsHolder.OBJECT_WRITER;
+
+/**
+ * Represents collection of statistics values for specific column.
 
 Review comment:
   Can you please add an example?
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871132#comment-16871132
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296685467
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillStatsTable.java
 ##
 @@ -452,53 +456,54 @@ public static ObjectMapper getMapper() {
 .addDeserializer(TypeProtos.MajorType.class, new MajorTypeSerDe.De())
 .addDeserializer(SchemaPath.class, new SchemaPath.De());
 mapper.registerModule(deModule);
+mapper.registerSubtypes(new NamedType(NumericEquiDepthHistogram.class, "numeric-equi-depth"));
 
 Review comment:
   Do you think it makes sense to add the word `histogram` as well: `numeric-equi-depth-histogram`?
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871131#comment-16871131
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296682838
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -666,27 +733,66 @@ protected void filterTableMetadata(FilterPredicate filterPredicate, Set

> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
> getNonInterestingColumnsMetadata
> 13. Introduce segment-level metadata class:
> {noformat}
> class SegmentMetadata {
>   TableInfo tableInfo;
>   MetadataInfo metadataInfo;
>   SchemaPath column;
>   TupleMetadata schema;
>   String location;
>   Map columnsStatistics;
>   Map statistics;
>   List partitionValues;
>   List locations;
>   long lastModifiedTime;
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871141#comment-16871141
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296691350
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataType.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Enum with possible types of metadata.
+ */
+public enum MetadataType {
+
+  ALL,
+
+  /**
+   * Table level metadata type.
+   */
+  TABLE,
+
+  /**
+   * Segment level metadata type. It corresponds to the metadata
+   * within specific directory for FS tables, or may correspond to partition for hive tables.
+   */
+  SEGMENT,
+
+  /**
+   * Drill partition level metadata type. It corresponds to parts of table data which has the same
+   * values within specific column, i.e. partitions discovered by Drill.
+   */
+  PARTITION,
+
+  /**
+   * File level metadata type.
+   */
+  FILE,
+
+  /**
+   * Row group level metadata type. Used for parquet tables.
+   */
+  ROW_GROUP,
+
+  NONE
 
 Review comment:
   1. Add Javadoc.
   2. Where can `NONE` be used?
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871136#comment-16871136
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296682028
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -535,31 +581,39 @@ public TableMetadata getTableMetadata() {
 return partitions;
   }
 
+  protected Map getSegmentsMetadata() {
+if (segments == null) {
+  segments = metadataProvider.getSegmentsMetadataMap();
+}
+return segments;
+  }
+
   @JsonIgnore
   public NonInterestingColumnsMetadata getNonInterestingColumnsMetadata() {
 if (nonInterestingColumnsMetadata == null) {
-  nonInterestingColumnsMetadata = metadataProvider.getNonInterestingColumnsMeta();
+  nonInterestingColumnsMetadata = metadataProvider.getNonInterestingColumnsMetadata();
 }
 return nonInterestingColumnsMetadata;
   }
 
   /**
* This class is responsible for filtering different metadata levels.
*/
-  protected abstract static class GroupScanWithMetadataFilterer {
+  protected abstract static class GroupScanWithMetadataFilterer> {
 protected final AbstractGroupScanWithMetadata source;
 
 protected boolean matchAllMetadata = false;
 
 protected TableMetadata tableMetadata;
 protected List partitions = Collections.emptyList();
+protected Map segments = Collections.emptyMap();
 
 Review comment:
   `Collections.emptyMap()` and `Collections.emptyList()` return unmodifiable objects. Is 
this expected?
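
For context, a minimal sketch of the behaviour this comment refers to, using only `java.util`. The class name `EmptyCollectionsDemo`, the helper `rejectsWrites` and the `String` type parameters are illustrative assumptions and are not taken from the PR. It shows that the map returned by `Collections.emptyMap()` rejects writes, so code that later mutates such a field has to swap in a mutable map first.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class EmptyCollectionsDemo {

  /** Returns true if put() on the given map is rejected as unsupported. */
  static boolean rejectsWrites(Map<String, String> map) {
    try {
      map.put("key", "value");
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // Default value in the style of the diff: a shared immutable empty map.
    Map<String, String> segments = Collections.emptyMap();
    System.out.println("emptyMap rejects writes: " + rejectsWrites(segments));

    // Copying into a HashMap yields a map that accepts writes.
    segments = new HashMap<>(segments);
    System.out.println("HashMap rejects writes: " + rejectsWrites(segments));
  }
}
```

If the field is only ever reassigned and never mutated in place, the immutable default is harmless; the sketch only matters for code paths that call `put` or `add` on it.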
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871125#comment-16871125
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296615995
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param <T> Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder<T> {
+
+  public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
+  private static final ObjectReader OBJECT_READER = new ObjectMapper().readerFor(StatisticsHolder.class);
+
+  private final T statisticsValue;
+  private final BaseStatisticsKind statisticsKind;
+
+  @JsonCreator
+  public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue,
+  @JsonProperty("statisticsKind") BaseStatisticsKind statisticsKind) {
+this.statisticsValue = statisticsValue;
+this.statisticsKind = statisticsKind;
+  }
+
+  public StatisticsHolder(T statisticsValue,
+  StatisticsKind statisticsKind) {
+this.statisticsValue = statisticsValue;
+this.statisticsKind = (BaseStatisticsKind) statisticsKind;
+  }
+
+  @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS,
+include = JsonTypeInfo.As.WRAPPER_OBJECT)
+  public T getStatisticsValue() {
+return statisticsValue;
+  }
+
+  public StatisticsKind getStatisticsKind() {
+return statisticsKind;
+  }
+
+  public static StatisticsHolder deserialize(String serialized) throws IOException {
+return OBJECT_READER.readValue(serialized);
+  }
+
+  public static String serialize(StatisticsHolder statisticsHolder) throws JsonProcessingException {
 
 Review comment:
   Should be an instance method without parameters: `public String 
toJsonString()`
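
A minimal sketch of the two method shapes being compared, assuming a much simplified holder. `HolderSketch`, its two fields, the kind name `rowCount` and the hand-built JSON string are hypothetical stand-ins. The class under review serializes through Jackson's `ObjectWriter`, which is left out here so the example stays self-contained.

```java
class HolderSketch {
  private final String statisticsKind;
  private final long statisticsValue;

  HolderSketch(String statisticsKind, long statisticsValue) {
    this.statisticsKind = statisticsKind;
    this.statisticsValue = statisticsValue;
  }

  // Shape used in the diff: a static helper that takes the instance as a parameter.
  static String serialize(HolderSketch holder) {
    return holder.toJsonString();
  }

  // Shape suggested in the review: an instance method without parameters.
  // Assumption: the kind name needs no JSON escaping in this toy version.
  String toJsonString() {
    return "{\"statisticsKind\":\"" + statisticsKind
        + "\",\"statisticsValue\":" + statisticsValue + "}";
  }

  public static void main(String[] args) {
    HolderSketch holder = new HolderSketch("rowCount", 42L);
    System.out.println(holder.toJsonString());
    System.out.println(HolderSketch.serialize(holder));
  }
}
```

With the instance method, call sites read `holder.toJsonString()` instead of `StatisticsHolder.serialize(holder)`, which is presumably the motivation behind the suggestion.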
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871140#comment-16871140
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296692394
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java
 ##
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.hadoop.fs.Path;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * Represents a metadata for the table part, which corresponds to the specific partition key.
+ */
+public class PartitionMetadata extends BaseMetadata {
+  private final SchemaPath column;
+  private final List partitionValues;
+  private final Set locations;
+  private final long lastModifiedTime;
+
+  private PartitionMetadata(PartitionMetadataBuilder builder) {
+super(builder);
+this.column = builder.column;
+this.partitionValues = builder.partitionValues;
+this.locations = builder.locations;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  /**
+   * It allows to obtain the column path for this partition
+   *
+   * @return column path
+   */
+  public SchemaPath getColumn() {
+return column;
+  }
+
+  /**
+   * File locations for this partition
+   *
+   * @return file locations
+   */
+  public Set getLocations() {
+return locations;
+  }
+
+  /**
+   * It allows to check the time, when any files were modified. It is in Unix Timestamp
+   *
+   * @return last modified time of files
+   */
+  public long getLastModifiedTime() {
+return lastModifiedTime;
+  }
+
+  public List getPartitionValues() {
+return partitionValues;
+  }
+
+  public static PartitionMetadataBuilder builder() {
+return new PartitionMetadataBuilder();
+  }
+
+  public static class PartitionMetadataBuilder extends BaseMetadataBuilder {
+private SchemaPath column;
+private List partitionValues;
+private Set locations;
+private long lastModifiedTime = BaseTableMetadata.NON_DEFINED_LAST_MODIFIED_TIME;
+
+public PartitionMetadataBuilder withLocations(Set locations) {
 
 Review comment:
   I think you can omit the `with` prefix, for example: `locations`.
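
A minimal sketch of the naming style suggested here, assuming a cut-down builder. `PartitionSketch`, its nested `Builder`, the `String` element type and the `-1L` default are hypothetical; the PR's builder extends `BaseMetadataBuilder` and uses `BaseTableMetadata.NON_DEFINED_LAST_MODIFIED_TIME`, neither of which is reproduced.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

class PartitionSketch {
  private final Set<String> locations;
  private final long lastModifiedTime;

  private PartitionSketch(Builder builder) {
    // Defensive copy so later changes to the caller's set do not leak in.
    this.locations = Collections.unmodifiableSet(new LinkedHashSet<>(builder.locations));
    this.lastModifiedTime = builder.lastModifiedTime;
  }

  Set<String> getLocations() {
    return locations;
  }

  long getLastModifiedTime() {
    return lastModifiedTime;
  }

  static Builder builder() {
    return new Builder();
  }

  static class Builder {
    private Set<String> locations = Collections.emptySet();
    private long lastModifiedTime = -1L; // placeholder default, not the PR's constant

    // Setter named after the field, without the "with" prefix.
    Builder locations(Set<String> locations) {
      this.locations = locations;
      return this;
    }

    Builder lastModifiedTime(long lastModifiedTime) {
      this.lastModifiedTime = lastModifiedTime;
      return this;
    }

    PartitionSketch build() {
      return new PartitionSketch(this);
    }
  }

  public static void main(String[] args) {
    PartitionSketch partition = PartitionSketch.builder()
        .locations(Collections.singleton("/tmp/part=1"))
        .lastModifiedTime(5L)
        .build();
    System.out.println(partition.getLocations() + " " + partition.getLastModifiedTime());
  }
}
```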
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871135#comment-16871135
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296691160
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataType.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Enum with possible types of metadata.
+ */
+public enum MetadataType {
+
+  ALL,
 
 Review comment:
   Add Javadoc: "Metadata that can be applicable to any type."
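
A minimal sketch of how the requested Javadoc could sit on the enum constant. Only `ALL` (from the diff) and `SEGMENT` (from the issue description) come from the source; the wording on `SEGMENT` and the demo class are assumptions for illustration, and the real enum may contain further constants.

```java
/**
 * Enum with possible types of metadata.
 */
enum MetadataType {

  /** Metadata that can be applicable to any type. */
  ALL,

  /** Hypothetical wording: the source only states that SEGMENT is a new value. */
  SEGMENT
}

// Hypothetical demo class, not part of the pull request.
class MetadataTypeDemo {
  public static void main(String[] args) {
    // Print every declared constant in declaration order.
    for (MetadataType type : MetadataType.values()) {
      System.out.println(type);
    }
  }
}
```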
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871126#comment-16871126
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296615585
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param <T> Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder<T> {
+
+  public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
+  private static final ObjectReader OBJECT_READER = new ObjectMapper().readerFor(StatisticsHolder.class);
+
+  private final T statisticsValue;
+  private final BaseStatisticsKind statisticsKind;
+
+  @JsonCreator
+  public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue,
+                          @JsonProperty("statisticsKind") BaseStatisticsKind statisticsKind) {
+    this.statisticsValue = statisticsValue;
+    this.statisticsKind = statisticsKind;
+  }
+
+  public StatisticsHolder(T statisticsValue,
+                          StatisticsKind statisticsKind) {
+    this.statisticsValue = statisticsValue;
+    this.statisticsKind = (BaseStatisticsKind) statisticsKind;
+  }
+
+  @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS,
+                include = JsonTypeInfo.As.WRAPPER_OBJECT)
+  public T getStatisticsValue() {
+    return statisticsValue;
+  }
+
+  public StatisticsKind getStatisticsKind() {
+    return statisticsKind;
+  }
+
+  public static StatisticsHolder deserialize(String serialized) throws IOException {
 
 Review comment:
   Rename: `deserialize` -> `of`, `serialized` -> `jsonString`
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871130#comment-16871130
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296682165
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -572,34 +626,39 @@ public GroupScanWithMetadataFilterer(AbstractGroupScanWithMetadata source) {
      */
     public abstract AbstractGroupScanWithMetadata build();
 
-    public GroupScanWithMetadataFilterer withTable(TableMetadata tableMetadata) {
+    public B withTable(TableMetadata tableMetadata) {
       this.tableMetadata = tableMetadata;
-      return this;
+      return self();
 
 Review comment:
   Why is the `self()` method better than returning `this`?
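
For context on this question, a minimal sketch of the self-typed builder idiom that the change appears to use. Once the return type is a generic parameter `B`, a plain `return this;` no longer compiles without an unchecked cast, because `this` has the base type; an abstract `self()` lets each subclass return its own type, so subclass-only methods stay chainable. All class and method names below are hypothetical and are not taken from the pull request.

```java
// Hypothetical base builder, parameterized by its concrete subtype B.
abstract class BaseFilterer<B extends BaseFilterer<B>> {
  protected String table;

  // Each concrete subclass returns itself, typed as B.
  protected abstract B self();

  public B withTable(String table) {
    this.table = table;
    return self(); // `return this;` would need an unchecked cast to B
  }
}

// Hypothetical concrete builder with one extra, subclass-only option.
final class ParquetFilterer extends BaseFilterer<ParquetFilterer> {
  private int rowGroups;

  @Override
  protected ParquetFilterer self() {
    return this;
  }

  public ParquetFilterer withRowGroups(int rowGroups) {
    this.rowGroups = rowGroups;
    return this;
  }

  public String describe() {
    return table + ":" + rowGroups;
  }
}

class SelfTypeDemo {
  public static void main(String[] args) {
    // withTable() returns ParquetFilterer, so withRowGroups() still chains.
    String result = new ParquetFilterer().withTable("orders").withRowGroups(3).describe();
    System.out.println(result);
  }
}
```

If `withTable` returned the base type instead, the call to `withRowGroups` in the chain above would not compile without a cast at the call site.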
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871153#comment-16871153
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296694427
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
 
 Review comment:
   What is the difference between statistics and column statistics? Maybe 
statistics should be given a clearer name, for example generalStatistics or 
metadataStatistics?
   I think we used the `metadataStatistics` name for the Metastore ...
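
To make the naming question concrete, here is a rough sketch of the distinction, assuming one map holds statistics for individual columns and the other holds statistics for the metadata unit as a whole. The generic parameters are not visible in the quoted diff, so every type, key, and statistic name below is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder pairing a statistics kind name with its value.
final class KindAndValue {
  final String kind;
  final Object value;

  KindAndValue(String kind, Object value) {
    this.kind = kind;
    this.value = value;
  }
}

// Hypothetical metadata unit with the two maps named as the comment proposes.
class MetadataUnit {
  // Statistics for individual columns: column name -> (kind name -> holder).
  final Map<String, Map<String, KindAndValue>> columnsStatistics = new HashMap<>();
  // Statistics for the unit as a whole: kind name -> holder.
  final Map<String, KindAndValue> metadataStatistics = new HashMap<>();

  void putColumnStatistic(String column, String kind, Object value) {
    columnsStatistics.computeIfAbsent(column, c -> new HashMap<>())
        .put(kind, new KindAndValue(kind, value));
  }

  void putMetadataStatistic(String kind, Object value) {
    metadataStatistics.put(kind, new KindAndValue(kind, value));
  }
}

class StatisticsNamingDemo {
  public static void main(String[] args) {
    MetadataUnit unit = new MetadataUnit();
    unit.putMetadataStatistic("rowCount", 100L);     // applies to the whole unit
    unit.putColumnStatistic("id", "nullsCount", 0L); // applies to one column
    System.out.println(unit.metadataStatistics.get("rowCount").value);
    System.out.println(unit.columnsStatistics.get("id").get("nullsCount").value);
  }
}
```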
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871145#comment-16871145
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296696592
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param <T> Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder<T> {
+
+  public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
+  private static final ObjectReader OBJECT_READER = new ObjectMapper().readerFor(StatisticsHolder.class);
+
+  private final T statisticsValue;
+  private final BaseStatisticsKind statisticsKind;
+
+  @JsonCreator
+  public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue,
+                          @JsonProperty("statisticsKind") BaseStatisticsKind statisticsKind) {
+    this.statisticsValue = statisticsValue;
+    this.statisticsKind = statisticsKind;
+  }
+
+  public StatisticsHolder(T statisticsValue,
+                          StatisticsKind statisticsKind) {
+    this.statisticsValue = statisticsValue;
+    this.statisticsKind = (BaseStatisticsKind) statisticsKind;
+  }
+
+  @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS,
+                include = JsonTypeInfo.As.WRAPPER_OBJECT)
+  public T getStatisticsValue() {
+    return statisticsValue;
+  }
+
+  public StatisticsKind getStatisticsKind() {
+    return statisticsKind;
+  }
+
+  public static StatisticsHolder deserialize(String serialized) throws IOException {
+    return OBJECT_READER.readValue(serialized);
+  }
+
+  public static String serialize(StatisticsHolder statisticsHolder) throws JsonProcessingException {
 
 Review comment:
   Please apply the same to the other classes.
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871148#comment-16871148
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296691639
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java
 ##
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.hadoop.fs.Path;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * Represents a metadata for the table part, which corresponds to the specific partition key.
+ */
+public class PartitionMetadata extends BaseMetadata {
+  private final SchemaPath column;
+  private final List partitionValues;
+  private final Set locations;
+  private final long lastModifiedTime;
+
+  private PartitionMetadata(PartitionMetadataBuilder builder) {
+    super(builder);
+    this.column = builder.column;
+    this.partitionValues = builder.partitionValues;
+    this.locations = builder.locations;
+    this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  /**
+   * It allows to obtain the column path for this partition
+   *
+   * @return column path
+   */
+  public SchemaPath getColumn() {
+    return column;
+  }
+
+  /**
+   * File locations for this partition
+   *
+   * @return file locations
+   */
+  public Set getLocations() {
+    return locations;
+  }
+
+  /**
+   * It allows to check the time, when any files were modified. It is in Unix Timestamp
 
 Review comment:
   ```suggestion
  * Allows to check the time, when any files were modified. It is in Unix Timestamp
   ```
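
As a small illustration of the "Unix Timestamp" wording, the sketch below turns such a `long` into an `Instant`. It assumes the value is in milliseconds since the epoch, which the Javadoc does not state; the class name, helper method, and sample value are placeholders rather than code from the pull request.

```java
import java.time.Instant;

// Hypothetical helper, not part of the pull request.
class LastModifiedDemo {

  // Assumption: the timestamp is in milliseconds since 1970-01-01T00:00:00Z.
  static Instant toInstant(long lastModifiedTime) {
    return Instant.ofEpochMilli(lastModifiedTime);
  }

  public static void main(String[] args) {
    long lastModifiedTime = 1561334400000L; // placeholder value
    System.out.println(toInstant(lastModifiedTime)); // prints 2019-06-24T00:00:00Z
  }
}
```

If the stored value were in seconds instead, `Instant.ofEpochSecond` would be the matching call, which is one reason to spell the unit out in the Javadoc.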
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871144#comment-16871144
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296689298
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/TableMetadataUtils.java
 ##
 @@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.metastore.metadata.BaseMetadata;
+import org.apache.drill.metastore.metadata.TableMetadata;
+import org.apache.drill.metastore.statistics.CollectableColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.ColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.TableStatisticsKind;
+import org.apache.drill.shaded.guava.com.google.common.primitives.UnsignedBytes;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class TableMetadataUtils {
+
+  private TableMetadataUtils() {
 
 Review comment:
   Again, no objection, but in my opinion this is unnecessary overhead.
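
For reference, the idiom under discussion: a class that holds only static helpers declares a private constructor so that it cannot be instantiated from outside. The class and method below are hypothetical and only illustrate the pattern.

```java
// Hypothetical utility class holding only static helpers.
final class PathUtils {

  // Private constructor prevents `new PathUtils()` outside this class.
  private PathUtils() {
  }

  // Joins two path parts with exactly one separator when parent ends with at most one "/".
  static String join(String parent, String child) {
    return parent.endsWith("/") ? parent + child : parent + "/" + child;
  }
}

class UtilityClassDemo {
  public static void main(String[] args) {
    System.out.println(PathUtils.join("/tmp", "data"));
  }
}
```

The cost is three extra lines per utility class; the benefit is that accidental instantiation fails at compile time.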
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871143#comment-16871143
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296693933
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, 
partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected <T extends BaseMetadataBuilder<T>> BaseMetadata(BaseMetadataBuilder<T> builder) {
+this.tableInfo = builder.tableInfo;
+this.metadataInfo = builder.metadataInfo;
+this.schema = builder.schema;
+this.columnsStatistics = builder.columnsStatistics;
+this.statistics = builder.statistics.stream()
+.collect(Collectors.toMap(
+statistic -> statistic.getStatisticsKind().getName(),
+Function.identity(),
+(a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public <V> V getStatistic(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : 
null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null && 
statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public <V> V getStatisticsForColumn(SchemaPath columnName, StatisticsKind statisticsKind) {
+return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder<T extends BaseMetadataBuilder<T>> {
+protected TableInfo tableInfo;
+protected MetadataInfo metadataInfo;
+protected TupleMetadata schema;
+protected Map columnsStatistics;
+protected Collection statistics;
+
+public T withTableInfo(TableInfo tableInfo) {
+  this.tableInfo = tableInfo;
+  return self();
+}
+
+public T withMetadataInfo(MetadataInfo metadataInfo) {
+  this.metadataInfo = metadataInfo;
+  return self();
+}
+
+public T withSchema(TupleMetadata schema) {
+  this.schema = schema;
+  return self();
+}
+
+public T withColumnsStatistics(Map 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871128#comment-16871128
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296680787
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -221,6 +229,31 @@ public void setFilterForRuntime(LogicalExpression 
filterExpr, OptimizerRulesCont
 if ( ! skipRuntimePruning ) { setFilter(filterExpr); }
   }
 
+  /**
+   * Applies specified filter {@code filterExpr} to current group scan and 
produces filtering at:
+   * 
+   * table level:
+   * - if filter matches all the data or prunes all the data, sets 
corresponding value to
 
 Review comment:
   I believe HTML has a notion of nested lists, which would be better than a 
custom paragraph with dashes.
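
For illustration, nested lists in a Javadoc comment might look like the sketch below. The class name `FilterLevelsDoc`, the stub method and the list item texts are placeholders and are not taken from the pull request; only the nesting of the `<ul>` and `<li>` elements is the point.

```java
/** Hypothetical holder class, used only to carry the Javadoc below. */
final class FilterLevelsDoc {

  /**
   * Applies the specified filter {@code filterExpr} to the current group scan
   * and produces filtering at:
   * <ul>
   *   <li>table level:
   *     <ul>
   *       <li>first sub-item for the table level (placeholder)</li>
   *       <li>second sub-item for the table level (placeholder)</li>
   *     </ul>
   *   </li>
   *   <li>next level (placeholder)</li>
   * </ul>
   *
   * @param filterExpr textual stand-in for the real filter expression
   * @return the same text, unchanged; this stub exists only to host the comment
   */
  static String applyFilter(String filterExpr) {
    return filterExpr;
  }
}
```

The inner `<ul>` sits inside the outer `<li>`, so the generated HTML renders the sub-items as an indented list without any hand-written dashes.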
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871146#comment-16871146
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296694074
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, 
partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected <T extends BaseMetadataBuilder<T>> BaseMetadata(BaseMetadataBuilder<T> builder) {
+this.tableInfo = builder.tableInfo;
+this.metadataInfo = builder.metadataInfo;
+this.schema = builder.schema;
+this.columnsStatistics = builder.columnsStatistics;
+this.statistics = builder.statistics.stream()
+.collect(Collectors.toMap(
+statistic -> statistic.getStatisticsKind().getName(),
+Function.identity(),
+(a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public <V> V getStatistic(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : 
null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null && 
statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public <V> V getStatisticsForColumn(SchemaPath columnName, StatisticsKind statisticsKind) {
+return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder<T extends BaseMetadataBuilder<T>> {
+protected TableInfo tableInfo;
+protected MetadataInfo metadataInfo;
+protected TupleMetadata schema;
+protected Map columnsStatistics;
+protected Collection statistics;
+
+public T withTableInfo(TableInfo tableInfo) {
+  this.tableInfo = tableInfo;
+  return self();
+}
+
+public T withMetadataInfo(MetadataInfo metadataInfo) {
+  this.metadataInfo = metadataInfo;
+  return self();
+}
+
+public T withSchema(TupleMetadata schema) {
+  this.schema = schema;
+  return self();
+}
+
+public T withColumnsStatistics(Map 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871127#comment-16871127
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296621362
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java
 ##
 @@ -228,5 +228,10 @@ public 
HiveDrillNativeParquetScanFilterer(HiveDrillNativeParquetScan source) {
 protected AbstractParquetGroupScan getNewScan() {
   return new HiveDrillNativeParquetScan((HiveDrillNativeParquetScan) 
source);
 }
+
+@Override
+protected HiveDrillNativeParquetScanFilterer self() {
 
 Review comment:
   Can you please explain where this method came from?
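
The quoted diff does not show where `self()` is declared. One common source of such a method is a self-typed base class: the parent declares an abstract `self()` so that its fluent methods can return the concrete subtype, and the `return self();` calls in BaseMetadataBuilder elsewhere in this thread follow that shape. A minimal sketch of the pattern, with hypothetical class names that are not Drill's:

```java
/** Hypothetical base builder; T is the concrete builder type. */
abstract class BaseBuilderSketch<T extends BaseBuilderSketch<T>> {
  protected String name;

  /** Returns {@code this}, typed as the concrete builder. */
  protected abstract T self();

  public T name(String name) {
    this.name = name;
    // Returning self() instead of this keeps the subtype in the call chain.
    return self();
  }
}

/** Hypothetical concrete builder that adds one field of its own. */
final class TableBuilderSketch extends BaseBuilderSketch<TableBuilderSketch> {
  private String location;

  public TableBuilderSketch location(String location) {
    this.location = location;
    return self();
  }

  @Override
  protected TableBuilderSketch self() {
    return this;
  }

  /** Joins the collected values so the result is easy to inspect. */
  public String build() {
    return name + "@" + location;
  }
}
```

With this shape, `new TableBuilderSketch().name("t").location("/tmp/t")` compiles, because `name` returns `TableBuilderSketch` rather than the base type.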
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
> to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
> getNonInterestingColumnsMetadata
> 
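
Item 1 above, read together with the BaseMetadata constructor quoted in this thread, suggests a single map keyed by statistics kind name, where a duplicate key is resolved in favour of the first holder if it is exact and the second otherwise. A minimal sketch under that reading; `KindSketch`, `HolderSketch` and `StatisticsMergeSketch` are hypothetical stand-ins, not Drill classes:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical statistics kind: a name plus an exactness flag. */
final class KindSketch {
  final String name;
  final boolean exact;

  KindSketch(String name, boolean exact) {
    this.name = name;
    this.exact = exact;
  }
}

/** Hypothetical holder that pairs a statistics value with its kind. */
final class HolderSketch {
  final Object value;
  final KindSketch kind;

  HolderSketch(Object value, KindSketch kind) {
    this.value = value;
    this.kind = kind;
  }
}

final class StatisticsMergeSketch {
  /** Collects holders into one map keyed by kind name. */
  static Map<String, HolderSketch> toMap(List<HolderSketch> holders) {
    return holders.stream()
        .collect(Collectors.toMap(
            holder -> holder.kind.name,
            Function.identity(),
            // On a duplicate key, keep the first holder if it is exact, else the second.
            (a, b) -> a.kind.exact ? a : b));
  }
}
```

Without the third argument, `Collectors.toMap` throws `IllegalStateException` on a duplicate key, so the merge function is what allows an estimated and an exact value of the same kind to coexist in the input.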

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871149#comment-16871149
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296695307
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+super(builder);
+this.location = builder.location;
+this.partitionKeys = builder.partitionKeys;
+this.interestingColumns = builder.interestingColumns;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map 
columnStatistics, List tableStatistics) {
+Map mergedTableStatistics = new 
HashMap<>(this.statistics);
+
+// overrides statistics value for the case when new statistics is exact or 
existing one was estimated
+tableStatistics.stream()
+.filter(statisticsHolder -> 
statisticsHolder.getStatisticsKind().isExact()
+  || 
!this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+.forEach(statisticsHolder -> 
mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), 
statisticsHolder));
+
+Map newColumnsStatistics = new 
HashMap<>(this.columnsStatistics);
+this.columnsStatistics.forEach(
+(columnName, value) -> newColumnsStatistics.put(columnName, 
value.cloneWith(columnStatistics.get(columnName))));
+
+return BaseTableMetadata.builder()
+.withTableInfo(tableInfo)
+.withMetadataInfo(metadataInfo)
+.withLocation(location)
+.withSchema(schema)
+.withColumnsStatistics(newColumnsStatistics)
+.withStatistics(mergedTableStatistics.values())
+.withLastModifiedTime(lastModifiedTime)
+.withPartitionKeys(partitionKeys)
+.withInterestingColumns(interestingColumns)
+.build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends BaseMetadataBuilder<BaseTableMetadataBuilder> {
+private Path location;
+private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+private Map partitionKeys;
+private List interestingColumns;
+
+public BaseTableMetadataBuilder withLocation(Path location) {
+  this.location = location;
+  return self();
+}
+
+public BaseTableMetadataBuilder withLastModifiedTime(long 
lastModifiedTime) {
+  this.lastModifiedTime = lastModifiedTime;
+  return self();
+}
+
+public BaseTableMetadataBuilder 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871155#comment-16871155
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296695012
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, 
partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected <T extends BaseMetadataBuilder<T>> BaseMetadata(BaseMetadataBuilder<T> builder) {
+this.tableInfo = builder.tableInfo;
+this.metadataInfo = builder.metadataInfo;
+this.schema = builder.schema;
+this.columnsStatistics = builder.columnsStatistics;
+this.statistics = builder.statistics.stream()
+.collect(Collectors.toMap(
+statistic -> statistic.getStatisticsKind().getName(),
+Function.identity(),
+(a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public <V> V getStatistic(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : 
null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = 
statistics.get(statisticsKind.getName());
+return statisticsHolder != null && 
statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public <V> V getStatisticsForColumn(SchemaPath columnName, StatisticsKind statisticsKind) {
+return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder<T extends BaseMetadataBuilder<T>> {
+protected TableInfo tableInfo;
+protected MetadataInfo metadataInfo;
+protected TupleMetadata schema;
+protected Map columnsStatistics;
+protected Collection statistics;
+
+public T withTableInfo(TableInfo tableInfo) {
 
 Review comment:
   Do you think the `with` prefix can be removed?
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871154#comment-16871154
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296695918
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/TableInfo.java
 ##
 @@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * General table information.
+ */
+public class TableInfo {
+  public static final String UNKNOWN = "UNKNOWN";
+  public static final TableInfo UNKNOWN_TABLE_INFO = new TableInfo(UNKNOWN, 
UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN);
+
+  private final String storagePlugin;
+  private final String workspace;
+  private final String name;
+  private final String type;
+  private final String owner;
+
+  public TableInfo(String storagePlugin, String workspace, String name, String 
type, String owner) {
 
 Review comment:
   Make the constructor private and add a builder.
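
A minimal sketch of what this could look like, assuming the five fields quoted above and an `UNKNOWN` default for values that are not set. `TableInfoSketch` and its builder method names are hypothetical and are not the code that was merged:

```java
/** Hypothetical immutable table description created through a builder. */
final class TableInfoSketch {
  static final String UNKNOWN = "UNKNOWN";

  private final String storagePlugin;
  private final String workspace;
  private final String name;
  private final String type;
  private final String owner;

  // Private: instances are created only through the builder.
  private TableInfoSketch(Builder builder) {
    this.storagePlugin = builder.storagePlugin;
    this.workspace = builder.workspace;
    this.name = builder.name;
    this.type = builder.type;
    this.owner = builder.owner;
  }

  static Builder builder() {
    return new Builder();
  }

  String storagePlugin() { return storagePlugin; }
  String workspace() { return workspace; }
  String name() { return name; }
  String type() { return type; }
  String owner() { return owner; }

  static final class Builder {
    // Every field defaults to UNKNOWN, so callers set only what they know.
    private String storagePlugin = UNKNOWN;
    private String workspace = UNKNOWN;
    private String name = UNKNOWN;
    private String type = UNKNOWN;
    private String owner = UNKNOWN;

    Builder storagePlugin(String storagePlugin) {
      this.storagePlugin = storagePlugin;
      return this;
    }

    Builder workspace(String workspace) {
      this.workspace = workspace;
      return this;
    }

    Builder name(String name) {
      this.name = name;
      return this;
    }

    Builder type(String type) {
      this.type = type;
      return this;
    }

    Builder owner(String owner) {
      this.owner = owner;
      return this;
    }

    TableInfoSketch build() {
      return new TableInfoSketch(this);
    }
  }
}
```

Compared with a five-argument constructor of `String` parameters, named builder calls make it harder to swap two arguments by accident, and a default instance becomes `TableInfoSketch.builder().build()`.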
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871139#comment-16871139
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296690696
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataInfo.java
 ##
 @@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Class which identifies specific metadata.
+ */
+public class MetadataInfo {
+
+  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
+  public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
+  public static final String DEFAULT_COLUMN_PREFIX = "_$SEGMENT_";
 
 Review comment:
   Where will this constant be used?
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871129#comment-16871129
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296622444
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/StatisticsProvider.java
 ##
 @@ -218,89 +201,85 @@ public ColumnStatistics 
visitFunctionHolderExpression(FunctionHolderExpression h
   ValueHolder minFuncHolder = 
InterpreterEvaluator.evaluateFunction(interpreter, args1, holderExpr.getName());
   ValueHolder maxFuncHolder = 
InterpreterEvaluator.evaluateFunction(interpreter, args2, holderExpr.getName());
 
-  MinMaxStatistics statistics;
   switch (destType) {
 case INT:
-  statistics = new MinMaxStatistics<>(((IntHolder) 
minFuncHolder).value, ((IntHolder) maxFuncHolder).value, Integer::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((IntHolder) minFuncHolder).value,
+  ((IntHolder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case BIGINT:
-  statistics = new MinMaxStatistics<>(((BigIntHolder) 
minFuncHolder).value, ((BigIntHolder) maxFuncHolder).value, Long::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((BigIntHolder) minFuncHolder).value,
+  ((BigIntHolder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case FLOAT4:
-  statistics = new MinMaxStatistics<>(((Float4Holder) 
minFuncHolder).value, ((Float4Holder) maxFuncHolder).value, Float::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((Float4Holder) minFuncHolder).value,
+  ((Float4Holder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case FLOAT8:
-  statistics = new MinMaxStatistics<>(((Float8Holder) 
minFuncHolder).value, ((Float8Holder) maxFuncHolder).value, Double::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((Float8Holder) minFuncHolder).value,
+  ((Float8Holder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case TIMESTAMP:
-  statistics = new MinMaxStatistics<>(((TimeStampHolder) 
minFuncHolder).value, ((TimeStampHolder) maxFuncHolder).value, Long::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((TimeStampHolder) minFuncHolder).value,
+  ((TimeStampHolder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 default:
   return null;
   }
-  statistics.setNullsCount((long) 
input.getStatistic(ColumnStatisticsKind.NULLS_COUNT));
-  return statistics;
 } catch (Exception e) {
-  throw new DrillRuntimeException("Error in evaluating function of " + 
holderExpr.getName() );
+  throw new DrillRuntimeException("Error in evaluating function of " + 
holderExpr.getName());
 }
   }
 
-  public static class MinMaxStatistics implements ColumnStatistics {
-private final V minVal;
-private final V maxVal;
-private final Comparator valueComparator;
-private long nullsCount;
-
-public MinMaxStatistics(V minVal, V maxVal, Comparator valueComparator) 
{
-  this.minVal = minVal;
-  this.maxVal = maxVal;
-  this.valueComparator = valueComparator;
-}
-
-@Override
-public Object getStatistic(StatisticsKind statisticsKind) {
-  switch (statisticsKind.getName()) {
-case ExactStatisticsConstants.MIN_VALUE:
-  return minVal;
-case ExactStatisticsConstants.MAX_VALUE:
-  return maxVal;
-case ExactStatisticsConstants.NULLS_COUNT:
-  return nullsCount;
-default:
-  return null;
-  }
-}
-
-@Override
-public boolean containsStatistic(StatisticsKind statisticsKind) {
-  switch (statisticsKind.getName()) {
-case ExactStatisticsConstants.MIN_VALUE:
-case ExactStatisticsConstants.MAX_VALUE:
-case ExactStatisticsConstants.NULLS_COUNT:
-  return true;
-default:
-  return false;
-  }
-}
-
-@Override
-public boolean containsExactStatistics(StatisticsKind statisticsKind) {
-  return true;
-}
-
-@Override
-public Comparator getValueComparator() {
-  return 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871124#comment-16871124
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296614730
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param  Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder {
+
+  public static final ObjectWriter OBJECT_WRITER = new 
ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
 
 Review comment:
   private?
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871151#comment-16871151
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296695117
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+super(builder);
+this.location = builder.location;
+this.partitionKeys = builder.partitionKeys;
+this.interestingColumns = builder.interestingColumns;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map 
columnStatistics, List tableStatistics) {
+Map mergedTableStatistics = new 
HashMap<>(this.statistics);
+
+// overrides statistics value for the case when new statistics is exact or existing one was estimated
+tableStatistics.stream()
+.filter(statisticsHolder -> 
statisticsHolder.getStatisticsKind().isExact()
+  || 
!this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+.forEach(statisticsHolder -> 
mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), 
statisticsHolder));
+
+Map newColumnsStatistics = new 
HashMap<>(this.columnsStatistics);
+this.columnsStatistics.forEach(
+(columnName, value) -> newColumnsStatistics.put(columnName, value.cloneWith(columnStatistics.get(columnName))));
+
+return BaseTableMetadata.builder()
+.withTableInfo(tableInfo)
+.withMetadataInfo(metadataInfo)
+.withLocation(location)
+.withSchema(schema)
+.withColumnsStatistics(newColumnsStatistics)
+.withStatistics(mergedTableStatistics.values())
+.withLastModifiedTime(lastModifiedTime)
+.withPartitionKeys(partitionKeys)
+.withInterestingColumns(interestingColumns)
+.build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends 
BaseMetadataBuilder {
+private Path location;
+private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+private Map partitionKeys;
+private List interestingColumns;
+
+public BaseTableMetadataBuilder withLocation(Path location) {
 
 Review comment:
   Same here.
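
For reference, the override rule used by `cloneWithStats` in the diff above can be sketched in isolation. This is a hedged illustration, not the Drill implementation: `Stat` is a hypothetical stand-in for `StatisticsHolder` that carries only a kind name, a value, and an exactness flag. The `current == null` branch is an addition of this sketch; as quoted, the filter appears to dereference the existing entry without checking whether one is present.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StatMergeSketch {

  // Hypothetical stand-in for StatisticsHolder.
  static final class Stat {
    final String kind;
    final Object value;
    final boolean exact;

    Stat(String kind, Object value, boolean exact) {
      this.kind = kind;
      this.value = value;
      this.exact = exact;
    }
  }

  // Returns a copy of the existing statistics in which an incoming value
  // replaces the stored one when the incoming value is exact or the stored
  // one is only an estimate.
  static Map<String, Stat> merge(Map<String, Stat> existing, List<Stat> incoming) {
    Map<String, Stat> merged = new HashMap<>(existing);
    for (Stat candidate : incoming) {
      Stat current = existing.get(candidate.kind);
      // current == null: nothing stored yet, so accept the candidate
      // (an assumption of this sketch, not taken from the diff).
      if (current == null || candidate.exact || !current.exact) {
        merged.put(candidate.kind, candidate);
      }
    }
    return merged;
  }
}
```

An exact stored value is therefore never replaced by an estimate, while an estimate is always replaced by whatever arrives later.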
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871138#comment-16871138
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296688086
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/BaseParquetMetadataProvider.java
 ##
 @@ -313,18 +328,30 @@ public TableMetadata getTableMetadata() {
   partitionsForValue.asMap().forEach((partitionKey, value) -> {
 Map columnsStatistics = new 
HashMap<>();
 
-Map statistics = new HashMap<>();
+List statistics = new ArrayList<>();
 partitionKey = partitionKey == NULL_VALUE ? null : partitionKey;
-statistics.put(ColumnStatisticsKind.MIN_VALUE, partitionKey);
-statistics.put(ColumnStatisticsKind.MAX_VALUE, partitionKey);
+statistics.add(new StatisticsHolder<>(partitionKey, 
ColumnStatisticsKind.MIN_VALUE));
+statistics.add(new StatisticsHolder<>(partitionKey, 
ColumnStatisticsKind.MAX_VALUE));
 
-statistics.put(ColumnStatisticsKind.NULLS_COUNT, 
Statistic.NO_COLUMN_STATS);
-statistics.put(TableStatisticsKind.ROW_COUNT, 
Statistic.NO_COLUMN_STATS);
+statistics.add(new StatisticsHolder<>(Statistic.NO_COLUMN_STATS, 
ColumnStatisticsKind.NULLS_COUNT));
+statistics.add(new StatisticsHolder<>(Statistic.NO_COLUMN_STATS, 
TableStatisticsKind.ROW_COUNT));
 columnsStatistics.put(partitionColumn,
-new ColumnStatisticsImpl<>(statistics,
-ParquetTableMetadataUtils.getComparator(getParquetGroupScanStatistics().getTypeForColumn(partitionColumn).getMinorType())));
-partitions.add(new PartitionMetadata(partitionColumn, 
getTableMetadata().getSchema(),
-columnsStatistics, statistics, (Set) value, tableName, 
-1));
+new ColumnStatistics<>(statistics,
+
getParquetGroupScanStatistics().getTypeForColumn(partitionColumn).getMinorType()));
+MetadataInfo metadataInfo = new 
MetadataInfo(MetadataType.PARTITION, MetadataInfo.GENERAL_INFO_KEY, null);
+TableMetadata tableMetadata = getTableMetadata();
+PartitionMetadata partitionMetadata = PartitionMetadata.builder()
+.withTableInfo(tableMetadata.getTableInfo())
+.withMetadataInfo(metadataInfo)
+.withColumn(partitionColumn)
+.withSchema(tableMetadata.getSchema())
+.withColumnsStatistics(columnsStatistics)
+.withStatistics(statistics)
+.withPartitionValues(Collections.emptyList())
+.withLocations((Set) value)
 
 Review comment:
   Why is the cast needed here?
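
One possible explanation, offered as a hedged sketch: the declared type of `partitionsForValue` is not visible in the quoted hunk, but if it is a Guava `Multimap`, its `asMap()` view exposes each value as a `Collection`, so a `Set`-typed parameter needs either a cast or a copy. The example below uses only the JDK; `toSet` is a hypothetical helper and `String` as the location type is an assumption.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

final class CollectionToSetSketch {

  // Copies a Collection into a Set instead of casting it.
  // The copy is type-safe, drops duplicates, and keeps encounter order.
  static <T> Set<T> toSet(Collection<T> values) {
    return new LinkedHashSet<>(values);
  }
}
```

The copy costs one pass over the values but removes the raw `(Set)` cast, so the compiler can check the element type passed on to the builder.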
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870955#comment-16870955
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296612824
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -281,31 +265,50 @@ public AbstractGroupScanWithMetadata 
applyFilter(LogicalExpression filterExpr, U
 
   logger.debug("All row groups have been filtered out. Add back one to get 
schema from scanner");
 
+  Map segmentsMap = 
getNextOrEmpty(getSegmentsMetadata().values()).stream()
+  .collect(Collectors.toMap(SegmentMetadata::getPath, 
Function.identity()));
+
   Map filesMap = 
getNextOrEmpty(getFilesMetadata().values()).stream()
-  .collect(Collectors.toMap(FileMetadata::getLocation, 
Function.identity()));
+  .collect(Collectors.toMap(FileMetadata::getPath, 
Function.identity()));
 
   Multimap rowGroupsMap = 
LinkedListMultimap.create();
-  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> 
rowGroupsMap.put(entry.getLocation(), entry));
+  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> 
rowGroupsMap.put(entry.getPath(), entry));
 
-  builder.withRowGroups(rowGroupsMap)
+  filteredMetadata.withRowGroups(rowGroupsMap)
   .withTable(getTableMetadata())
+  .withSegments(segmentsMap)
   .withPartitions(getNextOrEmpty(getPartitionsMetadata()))
   .withNonInterestingColumns(getNonInterestingColumnsMetadata())
   .withFiles(filesMap)
   .withMatching(false);
 }
 
-if (builder.getOverflowLevel() != MetadataLevel.NONE) {
+if (filteredMetadata.getOverflowLevel() != MetadataType.NONE) {
   logger.warn("applyFilter {} wasn't able to do pruning for  all metadata 
levels filter condition, since metadata count for " +
 "{} level exceeds 
`planner.store.parquet.rowgroup.filter.pushdown.threshold` value.\n" +
 "But underlying metadata was pruned without filter expression 
according to the metadata with above level.",
-  ExpressionStringBuilder.toString(filterExpr), 
builder.getOverflowLevel());
+  ExpressionStringBuilder.toString(filterExpr), 
filteredMetadata.getOverflowLevel());
 }
 
 logger.debug("applyFilter {} reduce row groups # from {} to {}",
-ExpressionStringBuilder.toString(filterExpr), 
getRowGroupsMetadata().size(), builder.getRowGroups().size());
+ExpressionStringBuilder.toString(filterExpr), 
getRowGroupsMetadata().size(), filteredMetadata.getRowGroups().size());
 
 Review comment:
   Thanks, done.
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870949#comment-16870949
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

ihuzenko commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296327891
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -281,31 +265,50 @@ public AbstractGroupScanWithMetadata 
applyFilter(LogicalExpression filterExpr, U
 
   logger.debug("All row groups have been filtered out. Add back one to get 
schema from scanner");
 
+  Map segmentsMap = 
getNextOrEmpty(getSegmentsMetadata().values()).stream()
+  .collect(Collectors.toMap(SegmentMetadata::getPath, 
Function.identity()));
+
   Map filesMap = 
getNextOrEmpty(getFilesMetadata().values()).stream()
-  .collect(Collectors.toMap(FileMetadata::getLocation, 
Function.identity()));
+  .collect(Collectors.toMap(FileMetadata::getPath, 
Function.identity()));
 
   Multimap rowGroupsMap = 
LinkedListMultimap.create();
-  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> 
rowGroupsMap.put(entry.getLocation(), entry));
+  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> 
rowGroupsMap.put(entry.getPath(), entry));
 
-  builder.withRowGroups(rowGroupsMap)
+  filteredMetadata.withRowGroups(rowGroupsMap)
   .withTable(getTableMetadata())
+  .withSegments(segmentsMap)
   .withPartitions(getNextOrEmpty(getPartitionsMetadata()))
   .withNonInterestingColumns(getNonInterestingColumnsMetadata())
   .withFiles(filesMap)
   .withMatching(false);
 }
 
-if (builder.getOverflowLevel() != MetadataLevel.NONE) {
+if (filteredMetadata.getOverflowLevel() != MetadataType.NONE) {
   logger.warn("applyFilter {} wasn't able to do pruning for  all metadata 
levels filter condition, since metadata count for " +
 "{} level exceeds 
`planner.store.parquet.rowgroup.filter.pushdown.threshold` value.\n" +
 "But underlying metadata was pruned without filter expression 
according to the metadata with above level.",
-  ExpressionStringBuilder.toString(filterExpr), 
builder.getOverflowLevel());
+  ExpressionStringBuilder.toString(filterExpr), 
filteredMetadata.getOverflowLevel());
 }
 
 logger.debug("applyFilter {} reduce row groups # from {} to {}",
-ExpressionStringBuilder.toString(filterExpr), 
getRowGroupsMetadata().size(), builder.getRowGroups().size());
+ExpressionStringBuilder.toString(filterExpr), 
getRowGroupsMetadata().size(), filteredMetadata.getRowGroups().size());
 
 Review comment:
   Add an ```isDebugEnabled()``` check before this call.
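
The arguments of a `logger.debug(...)` call are evaluated before the logger decides whether to print anything, so a call such as `ExpressionStringBuilder.toString(filterExpr)` is paid for even when debug output is off. The sketch below illustrates the guard under stated assumptions: `SketchLogger` and `expensiveToString` are hypothetical stand-ins written for this example, not SLF4J or Drill classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DebugGuardSketch {

  /** Hypothetical stand-in for a logger; not the SLF4J API. */
  static final class SketchLogger {
    private final boolean debugEnabled;

    SketchLogger(boolean debugEnabled) {
      this.debugEnabled = debugEnabled;
    }

    boolean isDebugEnabled() {
      return debugEnabled;
    }

    void debug(String message) {
      if (debugEnabled) {
        System.out.println("DEBUG " + message);
      }
    }
  }

  /** Counts how many times the expensive rendering was performed. */
  static final AtomicInteger RENDER_COUNT = new AtomicInteger();

  /** Hypothetical stand-in for a costly call such as rendering a filter expression. */
  static String expensiveToString(Object expr) {
    RENDER_COUNT.incrementAndGet();
    return String.valueOf(expr);
  }

  /** Unguarded: the argument is built even if the logger discards it. */
  static void logUnguarded(SketchLogger logger, Object expr) {
    logger.debug("applyFilter " + expensiveToString(expr));
  }

  /** Guarded: the argument is built only when debug output is enabled. */
  static void logGuarded(SketchLogger logger, Object expr) {
    if (logger.isDebugEnabled()) {
      logger.debug("applyFilter " + expensiveToString(expr));
    }
  }

  public static void main(String[] args) {
    SketchLogger quiet = new SketchLogger(false);
    logUnguarded(quiet, "a > 1");
    System.out.println("renders after unguarded call: " + RENDER_COUNT.get());
    logGuarded(quiet, "a > 1");
    System.out.println("renders after guarded call: " + RENDER_COUNT.get());
  }
}
```

With debug disabled, only the unguarded variant performs the rendering, which is the cost the guard is meant to avoid.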
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869603#comment-16869603
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296246726
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/TableMetadataProvider.java
 ##
 @@ -71,6 +73,27 @@
*/
   List getFilesMetadata();
 
 Review comment:
   Agree, removed.
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
> getNonInterestingColumnsMetadata
> 13. Introduce segment-level metadata class:
> {noformat}
> class SegmentMetadata {
>   TableInfo tableInfo;
>   MetadataInfo metadataInfo;
>   SchemaPath column;
>   TupleMetadata schema;
>   String location;
>   Map columnsStatistics;
>   Map statistics;
>   List partitionValues;
>   List locations;
>   long 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869601#comment-16869601
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296243362
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillStatsTable.java
 ##
 @@ -461,14 +465,14 @@ public static ObjectMapper getMapper() {
* @param statsProvider the source of statistics
* @return map of {@link StatisticsKind} and statistics values
*/
-  public static Map 
getEstimatedTableStats(DrillStatsTable statsProvider) {
+  public static List getEstimatedTableStats(DrillStatsTable 
statsProvider) {
 
 Review comment:
   Thanks, done.
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869612#comment-16869612
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296251507
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/BaseParquetMetadataProvider.java
 ##
 @@ -342,17 +350,13 @@ public TableMetadata getTableMetadata() {
 
   @Override
   public FileMetadata getFileMetadata(Path location) {
-return getFilesMetadata().stream()
-.filter(Objects::nonNull)
-.filter(fileMetadata -> location.equals(fileMetadata.getLocation()))
-.findAny()
-.orElse(null);
+return getFilesMetadataMap().get(location);
   }
 
   @Override
   public List getFilesForPartition(PartitionMetadata partition) {
-return getFilesMetadata().stream()
-.filter(file -> partition.getLocations().contains(file.getLocation()))
+return partition.getLocations().stream()
+.map(location -> getFilesMetadataMap().get(location))
 
 Review comment:
   Thanks, done.
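
The hunk above swaps a linear scan over the file metadata list for a lookup in a map keyed by location. A minimal sketch of that idea follows, assuming a hypothetical `FileEntry` class and plain `String` locations in place of Drill's `FileMetadata` and Hadoop `Path`. The quoted diff is cut off after the `.map(...)` step, so the null filter and the final collector here are assumptions, not the author's code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FileLookupSketch {

  /** Hypothetical stand-in for file-level metadata. */
  static final class FileEntry {
    final String location;
    final long rowCount;

    FileEntry(String location, long rowCount) {
      this.location = location;
      this.rowCount = rowCount;
    }
  }

  /**
   * Builds the index once; each later lookup is a single hash probe.
   * Collectors.toMap throws IllegalStateException on duplicate keys,
   * so locations are assumed to be unique.
   */
  static Map<String, FileEntry> indexByLocation(List<FileEntry> files) {
    return files.stream()
        .collect(Collectors.toMap(file -> file.location, Function.identity()));
  }

  /** Resolves locations through the index, skipping locations with no entry. */
  static List<FileEntry> filesFor(List<String> locations, Map<String, FileEntry> index) {
    return locations.stream()
        .map(index::get)
        .filter(Objects::nonNull)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<FileEntry> files = Arrays.asList(
        new FileEntry("/tbl/a.parquet", 10),
        new FileEntry("/tbl/b.parquet", 20));
    Map<String, FileEntry> index = indexByLocation(files);
    List<FileEntry> found =
        filesFor(Arrays.asList("/tbl/b.parquet", "/tbl/missing.parquet"), index);
    System.out.println("matched files: " + found.size());
  }
}
```

The trade-off is the usual one: the index costs one pass and some memory up front, and pays off when many lookups follow.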
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869605#comment-16869605
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296244575
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -247,6 +250,13 @@ public AbstractGroupScanWithMetadata 
applyFilter(LogicalExpression filterExpr, U
 matchAllMetadata = builder.isMatchAllMetadata();
 return null;
   }
+} else if (!getSegmentsMetadata().isEmpty()) {
+  if (!builder.getSegments().isEmpty() && getSegmentsMetadata().size() == 
builder.getSegments().size()) {
 
 Review comment:
   Done.
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869606#comment-16869606
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296248847
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/TableMetadataUtils.java
 ##
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.metastore.metadata.BaseMetadata;
+import org.apache.drill.metastore.metadata.TableMetadata;
+import org.apache.drill.metastore.statistics.CollectableColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.ColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.TableStatisticsKind;
+import 
org.apache.drill.shaded.guava.com.google.common.primitives.UnsignedBytes;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class TableMetadataUtils {
+
+  private TableMetadataUtils() {
+throw new IllegalStateException("Utility class");
+  }
+
+  /**
+   * Returns {@link Comparator} instance considering specified {@code type}.
+   *
+   * @param type type of the column
+   * @return {@link Comparator} instance
+   */
+  public static Comparator getComparator(TypeProtos.MinorType type) {
+switch (type) {
+  case INTERVALDAY:
+  case INTERVAL:
+  case INTERVALYEAR:
+return 
Comparator.nullsFirst(UnsignedBytes.lexicographicalComparator());
+  case UINT1:
+return Comparator.nullsFirst(UnsignedBytes::compare);
+  case UINT2:
+  case UINT4:
+return Comparator.nullsFirst(Integer::compareUnsigned);
+  case UINT8:
+return Comparator.nullsFirst(Long::compareUnsigned);
+  default:
+return getNaturalNullsFirstComparator();
+}
+  }
+
+  /**
+   * Returns "natural order" comparator which treats nulls as min values.
+   *
+   * @param <T> type to compare
+   * @return "natural order" comparator
+   */
+  public static <T extends Comparable<T>> Comparator<T> 
getNaturalNullsFirstComparator() {
+return Comparator.nullsFirst(Comparator.naturalOrder());
+  }
+
+  /**
+   * Merges list of specified metadata into the map of {@link 
ColumnStatistics} with columns as keys.
+   *
+   * @param <T>                 type of metadata to collect
+   * @param metadataList        list of metadata to be merged
+   * @param columns             set of columns whose statistics should be 
merged
+   * @param statisticsToCollect kinds of statistics that should be collected
+   * @return list of merged metadata
+   */
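+  public static <T extends BaseMetadata> Map<SchemaPath, ColumnStatistics> 
mergeColumnsStatistics(
+      Collection<T> metadataList, Set<SchemaPath> columns, 
List<CollectableColumnStatisticsKind> statisticsToCollect) {
+    Map<SchemaPath, ColumnStatistics> columnsStatistics = new HashMap<>();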
+
+for (SchemaPath column : columns) {
+  List statisticsList = new ArrayList<>();
+  for (T metadata : metadataList) {
+ColumnStatistics statistics = 
metadata.getColumnsStatistics().get(column);
+if (statistics == null) {
+  // schema change happened, set statistics which represents all nulls
+  statistics = new ColumnStatistics(
+  Collections.singletonList(
+  new 
StatisticsHolder<>(TableStatisticsKind.ROW_COUNT.getValue(metadata), 
ColumnStatisticsKind.NULLS_COUNT)));
+}
+statisticsList.add(statistics);
+  }
+  List statisticsHolders = new ArrayList<>();
+  for (CollectableColumnStatisticsKind statisticsKind : 
statisticsToCollect) {
+Object mergedStatistic = 
statisticsKind.mergeStatistics(statisticsList);
+ 
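
The `getComparator` method in the diff above selects unsigned comparison for the `UINT*` types. Java stores such values in signed primitives, so natural ordering would rank a large unsigned value (negative bit pattern) below zero. A minimal sketch of the difference, using only JDK methods; the class name and sample values are illustrative and not part of the pull request:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class UnsignedComparatorSketch {

  /** Nulls sort first, then values in unsigned 32-bit order. */
  static Comparator<Integer> unsignedNullsFirst() {
    return Comparator.nullsFirst(Integer::compareUnsigned);
  }

  /** Nulls sort first, then values in their natural (signed) order. */
  static <T extends Comparable<T>> Comparator<T> naturalNullsFirst() {
    return Comparator.nullsFirst(Comparator.naturalOrder());
  }

  public static void main(String[] args) {
    // -1 holds the bit pattern 0xFFFFFFFF, the largest unsigned 32-bit value.
    List<Integer> values = new ArrayList<>(Arrays.asList(5, -1, null, 0));

    values.sort(unsignedNullsFirst());
    System.out.println("unsigned order: " + values);

    values.sort(UnsignedComparatorSketch.<Integer>naturalNullsFirst());
    System.out.println("signed order:   " + values);
  }
}
```

In both orderings `null` comes first, which matches the "nulls as min values" convention described in the Javadoc of the diff.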

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869599#comment-16869599
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296243874
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillStatsTable.java
 ##
 @@ -461,14 +465,14 @@ public static ObjectMapper getMapper() {
* @param statsProvider the source of statistics
* @return map of {@link StatisticsKind} and statistics values
*/
-  public static Map 
getEstimatedTableStats(DrillStatsTable statsProvider) {
+  public static List getEstimatedTableStats(DrillStatsTable 
statsProvider) {
 if (statsProvider != null && statsProvider.isMaterialized()) {
-  Map tableStatistics = new HashMap<>();
-  tableStatistics.put(TableStatisticsKind.EST_ROW_COUNT, 
statsProvider.getRowCount());
-  tableStatistics.put(TableStatisticsKind.HAS_STATISTICS, Boolean.TRUE);
+  List tableStatistics = new ArrayList<>();
 
 Review comment:
   Thanks, done.
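
The change above follows item 1 of the issue description: instead of a map from statistics kind to value, each value travels together with its kind in a holder, and the method returns a list of holders. A rough sketch of that shape follows. `Kind` and `Holder` are hypothetical simplifications written for this example; only the argument order (value first, kind second) is taken from the `new StatisticsHolder<>(...)` call quoted elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class StatisticsHolderSketch {

  /** Hypothetical kinds of table statistics. */
  enum Kind { EST_ROW_COUNT, HAS_STATISTICS }

  /** Hypothetical holder pairing a statistics value with its kind. */
  static final class Holder<T> {
    final T value;
    final Kind kind;

    Holder(T value, Kind kind) {
      this.value = value;
      this.kind = kind;
    }
  }

  /** Builds the list form: one holder per statistic instead of one map entry. */
  static List<Holder<?>> estimatedTableStats(double rowCount) {
    List<Holder<?>> stats = new ArrayList<>();
    stats.add(new Holder<Double>(rowCount, Kind.EST_ROW_COUNT));
    stats.add(new Holder<Boolean>(Boolean.TRUE, Kind.HAS_STATISTICS));
    return stats;
  }

  public static void main(String[] args) {
    for (Holder<?> holder : estimatedTableStats(1000.0)) {
      System.out.println(holder.kind + " = " + holder.value);
    }
  }
}
```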
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869609#comment-16869609
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296244854
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -414,6 +431,14 @@ public void modifyFileSelection(FileSelection selection) {
 }
 partitions = newPartitions;
 
+Map newSegments = new HashMap<>();
+if (!getSegmentsMetadata().isEmpty()) {
+  this.segments = getSegmentsMetadata().entrySet().stream()
+  .filter(entry -> fileSet.contains(entry.getKey()))
+  .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
+}
+segments = newSegments;
 
 Review comment:
   Thanks for pointing this out, done.
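
The review comment being answered is not quoted in this message. One plausible reading of the hunk, offered here as an assumption, is that the filtered map is assigned to `this.segments` inside the `if` block and then immediately overwritten by the still-empty `newSegments`. A minimal sketch of returning the filtered result directly, with `String` keys and values standing in for Drill's `Path` and `SegmentMetadata`:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class SegmentFilterSketch {

  /** Keeps only the entries whose key is present in {@code fileSet}. */
  static Map<String, String> retainSelected(Map<String, String> segments, Set<String> fileSet) {
    if (segments.isEmpty()) {
      return Collections.emptyMap();
    }
    return segments.entrySet().stream()
        .filter(entry -> fileSet.contains(entry.getKey()))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
  }

  public static void main(String[] args) {
    Map<String, String> segments = new HashMap<>();
    segments.put("/tbl/2018", "segment for 2018");
    segments.put("/tbl/2019", "segment for 2019");

    Set<String> fileSet = new HashSet<>();
    fileSet.add("/tbl/2019");

    // The caller assigns the result exactly once, so nothing can overwrite it.
    Map<String, String> filtered = retainSelected(segments, fileSet);
    System.out.println("segments kept: " + filtered.keySet());
  }
}
```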
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869608#comment-16869608
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296253667
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/BaseParquetMetadataProvider.java
 ##
 @@ -361,6 +365,102 @@ public FileMetadata getFileMetadata(Path location) {
 return new ArrayList<>(getFilesMetadataMap().values());
   }
 
+  @Override
+  public List getSegmentsMetadata() {
+return new ArrayList<>(getSegmentsMetadataMap().values());
+  }
+
+  @Override
+  public Map getSegmentsMetadataMap() {
+if (segments == null) {
+  if (entries.isEmpty() || !collectMetadata) {
+return Collections.emptyMap();
+  }
+
+  segments = new LinkedHashMap<>();
+
+  Path fileLocation = getFilesMetadata().iterator().next().getPath();
+  int levelsCount = fileLocation.depth() - tableLocation.depth();
+
+  Map filesMetadata = getFilesMetadataMap();
+  int segmentsIndex = levelsCount - 1;
+  Map segmentMetadata = 
getSegmentsForMetadata(filesMetadata,
+  SchemaPath.getSimplePath(MetadataInfo.DEFAULT_COLUMN_PREFIX + 
segmentsIndex));
+  segments.putAll(segmentMetadata);
+  for (int i = segmentsIndex - 1; i >= 0; i--) {
+String segmentColumn = MetadataInfo.DEFAULT_COLUMN_PREFIX + i;
+segmentMetadata = getMetadataForSegments(segmentMetadata,
+SchemaPath.getSimplePath(segmentColumn));
+segments.putAll(segmentMetadata);
+  }
+
+}
+return segments;
+  }
+
+  private static  Map getSegmentsForMetadata(
+  Map metadata, SchemaPath column) {
+Multimap metadataMultimap = LinkedListMultimap.create();
+metadata.forEach((key, value) -> metadataMultimap.put(key.getParent(), 
value));
+
+Map result = new HashMap<>();
+metadataMultimap.asMap().forEach((key, value) -> result.put(key, 
combineToSegmentMetadata(value, column)));
+
+return result;
+  }
+
+  private static Map getMetadataForSegments(Map metadata, SchemaPath column) {
+Multimap metadataMultimap = 
LinkedListMultimap.create();
+metadata.forEach((key, value) -> metadataMultimap.put(key.getParent(), 
value));
+
+Map result = new HashMap<>();
+metadataMultimap.asMap().forEach((key, value) -> result.put(key, 
combineSegmentMetadata(value, column)));
+
+return result;
+  }
+
+  private static  SegmentMetadata 
combineToSegmentMetadata(Collection metadataList, SchemaPath column) {
+List metadataLocations = metadataList.stream()
+.map(metadata -> metadata.getPath()) // used lambda instead of method 
reference due to JDK-8141508
+.collect(Collectors.toList());
+return combineToSegmentMetadata(metadataList, column, metadataLocations);
+  }
+
+  private static SegmentMetadata 
combineSegmentMetadata(Collection metadataList, SchemaPath 
column) {
+List metadataLocations = metadataList.stream()
+.flatMap(metadata -> metadata.getLocations().stream())
+.collect(Collectors.toList());
+
+return combineToSegmentMetadata(metadataList, column, metadataLocations);
+  }
+
+  /**
+   * Returns {@link SegmentMetadata} which is combined metadata of list of specified metadata
+   *
+   * @param metadataList      metadata to combine
+   * @param column            segment column
+   * @param metadataLocations locations of metadata combined in resulting segment
+   * @param <T>               type of metadata to combine
+   * @return {@link SegmentMetadata} from combined metadata
+   */
+  private static  SegmentMetadata 
combineToSegmentMetadata(Collection metadataList,
+  SchemaPath column, List metadataLocations) {
+List segmentStatistics =
+Collections.singletonList(
+new StatisticsHolder<>(
+TableStatisticsKind.ROW_COUNT.mergeStatistics(metadataList),
+TableStatisticsKind.ROW_COUNT));
+// this code is used only to collect segment metadata to be used only 
during filtering,
+// so metadata identifier is not required here and in other places in this 
class
+MetadataInfo metadataInfo = new MetadataInfo(MetadataType.SEGMENT, 
MetadataInfo.GENERAL_INFO_KEY, null);
+T firstMetadata = metadataList.iterator().next();
+
+return new SegmentMetadata(firstMetadata.getTableInfo(), metadataInfo, 
column, firstMetadata.getSchema(),
+metadataList.iterator().next().getPath().getParent(),
 
 Review comment:
   Done.
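
   The hunk above derives segment metadata by grouping file metadata under its parent directory and then repeating the grouping one directory level at a time. A rough standalone sketch of that roll-up idea follows. It is not the Drill implementation: it uses java.nio.file.Path rather than the Path type in the diff, it tracks only a row count per directory, it assumes all files sit at the same depth, and it stops one level below the table root. The names SegmentRollupSketch, rollUpToParents and collectSegments are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a per-directory roll-up, not part of Drill.
final class SegmentRollupSketch {

  // Sums the values of all entries that share the same parent directory.
  static Map<Path, Long> rollUpToParents(Map<Path, Long> rowCounts) {
    Map<Path, Long> parents = new LinkedHashMap<>();
    rowCounts.forEach((path, count) -> parents.merge(path.getParent(), count, Long::sum));
    return parents;
  }

  // Collects row counts for every directory level between the files and the table root.
  static Map<Path, Long> collectSegments(Path tableRoot, Map<Path, Long> fileRowCounts) {
    Map<Path, Long> segments = new LinkedHashMap<>();
    if (fileRowCounts.isEmpty()) {
      return segments;
    }
    // Assumption: every file is nested at the same depth as the first one.
    Path anyFile = fileRowCounts.keySet().iterator().next();
    int levels = anyFile.getNameCount() - tableRoot.getNameCount() - 1;

    Map<Path, Long> current = fileRowCounts;
    for (int i = 0; i < levels; i++) {
      current = rollUpToParents(current);
      segments.putAll(current);
    }
    return segments;
  }

  public static void main(String[] args) {
    Map<Path, Long> files = new LinkedHashMap<>();
    files.put(Paths.get("/t/2019/01/a.parquet"), 10L);
    files.put(Paths.get("/t/2019/01/b.parquet"), 5L);
    files.put(Paths.get("/t/2019/02/c.parquet"), 7L);
    System.out.println(collectSegments(Paths.get("/t"), files));
  }
}
```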
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869607#comment-16869607
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296249262
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/Metadata.java
 ##
 @@ -86,4 +88,7 @@
* @return {@link ColumnMetadata} schema description of the column
*/
   ColumnMetadata getColumn(SchemaPath name);
+
+  TableInfo getTableInfo();
+  MetadataInfo getMetadataInfo();
 
 Review comment:
   Done.
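
   The two accessors added above return the info classes listed in item 6 of the issue description. A minimal sketch of how such value classes could look is given below. The field names and the two key constants come from the issue text, and SEGMENT and ROW_GROUP are the only MetadataType values that appear in this thread. Everything else (the wrapper class MetadataInfoSketch, the constructors, immutability) is an assumption for illustration and may differ from the classes in the pull request.

```java
// Hypothetical sketch based on the field lists in the issue description.
final class MetadataInfoSketch {

  // Only the values mentioned in this thread; the real enum may have more.
  enum MetadataType { SEGMENT, ROW_GROUP }

  static final class TableInfo {
    final String storagePlugin;
    final String workspace;
    final String name;
    final String type;
    final String owner;

    TableInfo(String storagePlugin, String workspace, String name, String type, String owner) {
      this.storagePlugin = storagePlugin;
      this.workspace = workspace;
      this.name = name;
      this.type = type;
      this.owner = owner;
    }
  }

  static final class MetadataInfo {
    static final String GENERAL_INFO_KEY = "GENERAL_INFO";
    static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";

    final MetadataType type;
    final String key;
    final String identifier; // may be null when no identifier is needed

    MetadataInfo(MetadataType type, String key, String identifier) {
      this.type = type;
      this.key = key;
      this.identifier = identifier;
    }
  }

  // Mirrors the two accessors added to the Metadata interface in the hunk above.
  interface Metadata {
    TableInfo getTableInfo();
    MetadataInfo getMetadataInfo();
  }
}
```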
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
> getNonInterestingColumnsMetadata
> 13. Introduce segment-level metadata class:
> {noformat}
> class SegmentMetadata {
>   TableInfo tableInfo;
>   MetadataInfo metadataInfo;
>   SchemaPath column;
>   TupleMetadata 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869604#comment-16869604
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296246995
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/TableMetadataProvider.java
 ##
 @@ -71,6 +73,27 @@
*/
   List getFilesMetadata();
 
+  /**
+   * Returns list of {@link SegmentMetadata} instances which provides metadata 
for specific segment and its columns.
+   *
+   * @return list of {@link SegmentMetadata} instances
+   */
+  List getSegmentsMetadata();
 
 Review comment:
   Done.
 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869610#comment-16869610
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296288223
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetTableMetadataUtils.java
 ##
 @@ -148,112 +142,71 @@ private ParquetTableMetadataUtils() {
   public static RowGroupMetadata 
getRowGroupMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata,
   MetadataBase.RowGroupMetadata rowGroupMetadata, int rgIndexInFile, Path 
location) {
 Map columnsStatistics = 
getRowGroupColumnStatistics(tableMetadata, rowGroupMetadata);
-Map rowGroupStatistics = new HashMap<>();
-rowGroupStatistics.put(TableStatisticsKind.ROW_COUNT, 
rowGroupMetadata.getRowCount());
-rowGroupStatistics.put(() -> ExactStatisticsConstants.START, 
rowGroupMetadata.getStart());
-rowGroupStatistics.put(() -> ExactStatisticsConstants.LENGTH, 
rowGroupMetadata.getLength());
+List rowGroupStatistics = new ArrayList<>();
+rowGroupStatistics.add(new 
StatisticsHolder<>(rowGroupMetadata.getRowCount(), 
TableStatisticsKind.ROW_COUNT));
+rowGroupStatistics.add(new StatisticsHolder<>(rowGroupMetadata.getStart(), 
new BaseStatisticsKind(ExactStatisticsConstants.START, true)));
+rowGroupStatistics.add(new 
StatisticsHolder<>(rowGroupMetadata.getLength(), new 
BaseStatisticsKind(ExactStatisticsConstants.LENGTH, true)));
 
 Map columns = 
getRowGroupFields(tableMetadata, rowGroupMetadata);
 
 TupleSchema schema = new TupleSchema();
 columns.forEach((schemaPath, majorType) -> 
MetadataUtils.addColumnMetadata(schema, schemaPath, majorType));
 
-return new RowGroupMetadata(
-schema, columnsStatistics, rowGroupStatistics, 
rowGroupMetadata.getHostAffinity(), rgIndexInFile, location);
-  }
+MetadataInfo metadataInfo = new MetadataInfo(MetadataType.ROW_GROUP, 
MetadataInfo.GENERAL_INFO_KEY, null);
 
-  /**
-   * Merges list of specified metadata into the map of {@link ColumnStatistics} with columns as keys.
-   *
-   * @param <T>                  type of metadata to collect
-   * @param metadataList         list of metadata to be merged
-   * @param columns              set of columns whose statistics should be merged
-   * @param statisticsToCollect  kinds of statistics that should be collected
-   * @param parquetTableMetadata ParquetTableMetadata object to fetch the non-interesting columns
-   * @return list of merged metadata
-   */
-  @SuppressWarnings("unchecked")
-  public static  Map 
mergeColumnsStatistics(
-  Collection metadataList, Set columns, 
List statisticsToCollect, 
MetadataBase.ParquetTableMetadataBase parquetTableMetadata) {
-Map columnsStatistics = new HashMap<>();
-
-for (SchemaPath column : columns) {
-  List statisticsList = new ArrayList<>();
-  for (T metadata : metadataList) {
-ColumnStatistics statistics = 
metadata.getColumnsStatistics().get(column);
-if (statistics == null) {
-  // schema change happened, set statistics which represents all nulls
-  statistics = new ColumnStatisticsImpl(
-  ImmutableMap.of(ColumnStatisticsKind.NULLS_COUNT, 
metadata.getStatistic(TableStatisticsKind.ROW_COUNT)),
-  getNaturalNullsFirstComparator());
-}
-statisticsList.add(statistics);
-  }
-  Map statisticsMap = new HashMap<>();
-  for (CollectableColumnStatisticsKind statisticsKind : 
statisticsToCollect) {
-Object mergedStatistic = 
statisticsKind.mergeStatistics(statisticsList);
-statisticsMap.put(statisticsKind, mergedStatistic);
-  }
-  columnsStatistics.put(column, new ColumnStatisticsImpl(statisticsMap, 
statisticsList.iterator().next().getValueComparator()));
-}
-return columnsStatistics;
+return new RowGroupMetadata(TableInfo.UNKNOWN_TABLE_INFO, metadataInfo,
+schema, columnsStatistics, rowGroupStatistics, 
rowGroupMetadata.getHostAffinity(), rgIndexInFile, location);
   }
 
   /**
* Returns {@link FileMetadata} instance received by merging specified 
{@link RowGroupMetadata} list.
*
* @param rowGroups list of {@link RowGroupMetadata} to be merged
-   * @param tableName name of the table
-   * @param parquetTableMetadata the source of column metadata for 
non-interesting column's statistics
* @return {@link FileMetadata} instance
*/
-  public static FileMetadata getFileMetadata(List rowGroups, 
String tableName,
-  MetadataBase.ParquetTableMetadataBase parquetTableMetadata) {
+  public static FileMetadata getFileMetadata(List rowGroups) 
{
 if (rowGroups.isEmpty()) {
   return 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869600#comment-16869600
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296245208
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ##
 @@ -594,50 +594,66 @@ public static long getDateTimeValueFromBinary(Binary 
binaryTimeStampValue, boole
* @return major type
*/
   public static TypeProtos.MajorType getType(PrimitiveTypeName type, 
OriginalType originalType, int scale, int precision) {
+TypeProtos.MinorType minorType = getMinorType(type, originalType);
+if (originalType == OriginalType.DECIMAL) {
+  return Types.withScaleAndPrecision(minorType, 
TypeProtos.DataMode.OPTIONAL, scale, precision);
+}
+
+return Types.optional(minorType);
+  }
+
+  /**
+   * Builds minor type using given {@code OriginalType originalType} or {@code 
PrimitiveTypeName type}.
+   *
+   * @param type parquet primitive type
+   * @param originalType parquet original type
+   * @return major type
 
 Review comment:
   Thanks, done.
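
   The hunk above splits type resolution in two: one method picks the minor type, and getType wraps it, attaching scale and precision only for decimals. A simplified standalone sketch of that split is shown below. The enums Primitive, Annotation and Minor, the Major class, and the mapping rules are hypothetical stand-ins chosen for illustration. They are not Parquet's or Drill's actual types or mappings.

```java
// Hypothetical illustration of splitting minor-type lookup from major-type construction.
final class ParquetTypeSketch {

  enum Primitive { INT32, INT64, BINARY }        // stand-in for the Parquet primitive type
  enum Annotation { NONE, UTF8, DECIMAL }        // stand-in for the Parquet original type
  enum Minor { INT, BIGINT, VARBINARY, VARCHAR, VARDECIMAL }

  static final class Major {
    final Minor minor;
    final int scale;
    final int precision;

    Major(Minor minor, int scale, int precision) {
      this.minor = minor;
      this.scale = scale;
      this.precision = precision;
    }
  }

  // Picks the minor type from the annotation first, then from the primitive type.
  static Minor minorTypeOf(Primitive primitive, Annotation annotation) {
    if (annotation == Annotation.DECIMAL) {
      return Minor.VARDECIMAL;
    }
    switch (primitive) {
      case INT32:
        return Minor.INT;
      case INT64:
        return Minor.BIGINT;
      default:
        return annotation == Annotation.UTF8 ? Minor.VARCHAR : Minor.VARBINARY;
    }
  }

  // Builds the major type; scale and precision are kept only for decimals.
  static Major typeOf(Primitive primitive, Annotation annotation, int scale, int precision) {
    Minor minor = minorTypeOf(primitive, annotation);
    if (annotation == Annotation.DECIMAL) {
      return new Major(minor, scale, precision);
    }
    return new Major(minor, 0, 0);
  }
}
```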
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869611#comment-16869611
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296287677
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/BaseParquetMetadataProvider.java
 ##
 @@ -361,6 +365,102 @@ public FileMetadata getFileMetadata(Path location) {
 return new ArrayList<>(getFilesMetadataMap().values());
   }
 
+  @Override
+  public List getSegmentsMetadata() {
+return new ArrayList<>(getSegmentsMetadataMap().values());
+  }
+
+  @Override
+  public Map getSegmentsMetadataMap() {
+if (segments == null) {
+  if (entries.isEmpty() || !collectMetadata) {
+return Collections.emptyMap();
+  }
+
+  segments = new LinkedHashMap<>();
+
+  Path fileLocation = getFilesMetadata().iterator().next().getPath();
+  int levelsCount = fileLocation.depth() - tableLocation.depth();
+
+  Map filesMetadata = getFilesMetadataMap();
+  int segmentsIndex = levelsCount - 1;
+  Map segmentMetadata = 
getSegmentsForMetadata(filesMetadata,
+  SchemaPath.getSimplePath(MetadataInfo.DEFAULT_COLUMN_PREFIX + 
segmentsIndex));
+  segments.putAll(segmentMetadata);
+  for (int i = segmentsIndex - 1; i >= 0; i--) {
+String segmentColumn = MetadataInfo.DEFAULT_COLUMN_PREFIX + i;
+segmentMetadata = getMetadataForSegments(segmentMetadata,
+SchemaPath.getSimplePath(segmentColumn));
+segments.putAll(segmentMetadata);
+  }
+
+}
+return segments;
+  }
+
+  private static  Map getSegmentsForMetadata(
+  Map metadata, SchemaPath column) {
+Multimap metadataMultimap = LinkedListMultimap.create();
+metadata.forEach((key, value) -> metadataMultimap.put(key.getParent(), 
value));
+
+Map result = new HashMap<>();
+metadataMultimap.asMap().forEach((key, value) -> result.put(key, 
combineToSegmentMetadata(value, column)));
+
+return result;
+  }
+
+  private static Map getMetadataForSegments(Map metadata, SchemaPath column) {
+Multimap metadataMultimap = 
LinkedListMultimap.create();
+metadata.forEach((key, value) -> metadataMultimap.put(key.getParent(), 
value));
+
+Map result = new HashMap<>();
+metadataMultimap.asMap().forEach((key, value) -> result.put(key, 
combineSegmentMetadata(value, column)));
+
+return result;
+  }
+
+  private static  SegmentMetadata 
combineToSegmentMetadata(Collection metadataList, SchemaPath column) {
+List metadataLocations = metadataList.stream()
+.map(metadata -> metadata.getPath()) // used lambda instead of method 
reference due to JDK-8141508
+.collect(Collectors.toList());
+return combineToSegmentMetadata(metadataList, column, metadataLocations);
+  }
+
+  private static SegmentMetadata 
combineSegmentMetadata(Collection metadataList, SchemaPath 
column) {
+List metadataLocations = metadataList.stream()
+.flatMap(metadata -> metadata.getLocations().stream())
+.collect(Collectors.toList());
+
+return combineToSegmentMetadata(metadataList, column, metadataLocations);
+  }
+
+  /**
+   * Returns {@link SegmentMetadata} which is combined metadata of list of specified metadata
+   *
+   * @param metadataList      metadata to combine
+   * @param column            segment column
+   * @param metadataLocations locations of metadata combined in resulting segment
+   * @param <T>               type of metadata to combine
+   * @return {@link SegmentMetadata} from combined metadata
+   */
+  private static  SegmentMetadata 
combineToSegmentMetadata(Collection metadataList,
+  SchemaPath column, List metadataLocations) {
+List segmentStatistics =
+Collections.singletonList(
+new StatisticsHolder<>(
+TableStatisticsKind.ROW_COUNT.mergeStatistics(metadataList),
+TableStatisticsKind.ROW_COUNT));
+// this code is used only to collect segment metadata to be used only 
during filtering,
+// so metadata identifier is not required here and in other places in this 
class
+MetadataInfo metadataInfo = new MetadataInfo(MetadataType.SEGMENT, 
MetadataInfo.GENERAL_INFO_KEY, null);
+T firstMetadata = metadataList.iterator().next();
+
+return new SegmentMetadata(firstMetadata.getTableInfo(), metadataInfo, 
column, firstMetadata.getSchema(),
 
 Review comment:
   Agree, done.
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869602#comment-16869602
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296243925
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillStatsTable.java
 ##
 @@ -478,27 +482,27 @@ public static ObjectMapper getMapper() {
    * @param fieldName name of the columns whose statistics should be obtained
    * @return map of {@link StatisticsKind} and statistics values
    */
-  public static Map getEstimatedColumnStats(DrillStatsTable statsProvider, SchemaPath fieldName) {
+  public static List getEstimatedColumnStats(DrillStatsTable statsProvider, SchemaPath fieldName) {
 
 Review comment:
   Done.
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869491#comment-16869491
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

ihuzenko commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296234781
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/TableMetadataProvider.java
 ##
 @@ -71,6 +73,27 @@
    */
   List getFilesMetadata();
 
 Review comment:
   This may be removed in favor of the method which returns a map.
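
One way to read this suggestion: if the provider exposes the map, the list can be derived from it and need not be a separate abstract method. A minimal sketch, assuming a hypothetical `FilesProviderSketch` interface in which `String` stands in for both `Path` and `FileMetadata`; it is not Drill's `TableMetadataProvider`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical interface; String stands in for both Path keys and FileMetadata values.
interface FilesProviderSketch {

  // The single source of truth: file metadata keyed by file location.
  Map<String, String> getFilesMetadataMap();

  // Derived view, so implementations do not have to keep a list and a map in sync.
  default List<String> getFilesMetadata() {
    return new ArrayList<>(getFilesMetadataMap().values());
  }
}
```

Whether the list method is dropped entirely or kept as a derived default is a design choice for the PR author; the sketch only shows that the map alone is sufficient.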
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
> getNonInterestingColumnsMetadata
> 13. Introduce segment-level metadata class:
> {noformat}
> class SegmentMetadata {
>   TableInfo tableInfo;
>   MetadataInfo metadataInfo;
>   SchemaPath column;
>   TupleMetadata schema;
>   String location;
>   Map columnsStatistics;
>   Map statistics;
>   List partitionValues;
>   List 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869490#comment-16869490
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

ihuzenko commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296240291
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/Metadata.java
 ##
 @@ -86,4 +88,7 @@
    * @return {@link ColumnMetadata} schema description of the column
    */
   ColumnMetadata getColumn(SchemaPath name);
+
+  TableInfo getTableInfo();
+  MetadataInfo getMetadataInfo();
 
 Review comment:
   Add a new line between the methods for consistency.
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869485#comment-16869485
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

ihuzenko commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296227707
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ##
 @@ -594,50 +594,66 @@ public static long getDateTimeValueFromBinary(Binary binaryTimeStampValue, boole
    * @return major type
    */
   public static TypeProtos.MajorType getType(PrimitiveTypeName type, OriginalType originalType, int scale, int precision) {
+    TypeProtos.MinorType minorType = getMinorType(type, originalType);
+    if (originalType == OriginalType.DECIMAL) {
+      return Types.withScaleAndPrecision(minorType, TypeProtos.DataMode.OPTIONAL, scale, precision);
+    }
+
+    return Types.optional(minorType);
+  }
+
+  /**
+   * Builds minor type using given {@code OriginalType originalType} or {@code PrimitiveTypeName type}.
+   *
+   * @param type parquet primitive type
+   * @param originalType parquet original type
+   * @return major type
 
 Review comment:
   ```suggestion
  * @return minor type
   ```
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869493#comment-16869493
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

ihuzenko commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296189471
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillStatsTable.java
 ##
 @@ -461,14 +465,14 @@ public static ObjectMapper getMapper() {
    * @param statsProvider the source of statistics
    * @return map of {@link StatisticsKind} and statistics values
    */
-  public static Map getEstimatedTableStats(DrillStatsTable statsProvider) {
+  public static List getEstimatedTableStats(DrillStatsTable statsProvider) {
     if (statsProvider != null && statsProvider.isMaterialized()) {
-      Map tableStatistics = new HashMap<>();
-      tableStatistics.put(TableStatisticsKind.EST_ROW_COUNT, statsProvider.getRowCount());
-      tableStatistics.put(TableStatisticsKind.HAS_STATISTICS, Boolean.TRUE);
+      List tableStatistics = new ArrayList<>();
 
 Review comment:
   ```java
   return Arrays.asList(
       new StatisticsHolder<>(statsProvider.getRowCount(), TableStatisticsKind.EST_ROW_COUNT),
       new StatisticsHolder<>(Boolean.TRUE, TableStatisticsKind.HAS_DESCRIPTIVE_STATISTICS)
   );
   ```
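
The suggestion above builds the whole list in one expression instead of filling a mutable `ArrayList`. A self-contained sketch of that shape, assuming a simplified `HolderSketch` class and plain strings in place of `TableStatisticsKind`; both are hypothetical stand-ins, and the empty-list fallback is an assumption, since the quoted diff does not show what the real method returns when statistics are not materialized:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for StatisticsHolder: a statistics value paired with its kind.
final class HolderSketch<T> {
  final T value;
  final String kind; // the real code uses a StatisticsKind; a String keeps the sketch small

  HolderSketch(T value, String kind) {
    this.value = value;
    this.kind = kind;
  }
}

final class EstimatedStatsSketch {
  // Returns a fixed-size list built in a single expression,
  // or an empty list (assumed fallback) when nothing is materialized.
  static List<HolderSketch<?>> estimatedTableStats(boolean materialized, double rowCount) {
    if (!materialized) {
      return Collections.emptyList();
    }
    return Arrays.<HolderSketch<?>>asList(
        new HolderSketch<Double>(rowCount, "EST_ROW_COUNT"),
        new HolderSketch<Boolean>(Boolean.TRUE, "HAS_DESCRIPTIVE_STATISTICS"));
  }
}
```

Note that `Arrays.asList` returns a fixed-size list, so a caller that needs to append further statistics would have to copy it into a new `ArrayList` first.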
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869486#comment-16869486
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

ihuzenko commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296225077
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/BaseParquetMetadataProvider.java
 ##
 @@ -361,6 +365,102 @@ public FileMetadata getFileMetadata(Path location) {
     return new ArrayList<>(getFilesMetadataMap().values());
   }
 
+  @Override
+  public List getSegmentsMetadata() {
+    return new ArrayList<>(getSegmentsMetadataMap().values());
+  }
+
+  @Override
+  public Map getSegmentsMetadataMap() {
+    if (segments == null) {
+      if (entries.isEmpty() || !collectMetadata) {
+        return Collections.emptyMap();
+      }
+
+      segments = new LinkedHashMap<>();
+
+      Path fileLocation = getFilesMetadata().iterator().next().getPath();
+      int levelsCount = fileLocation.depth() - tableLocation.depth();
+
+      Map filesMetadata = getFilesMetadataMap();
+      int segmentsIndex = levelsCount - 1;
+      Map segmentMetadata = getSegmentsForMetadata(filesMetadata,
+          SchemaPath.getSimplePath(MetadataInfo.DEFAULT_COLUMN_PREFIX + segmentsIndex));
+      segments.putAll(segmentMetadata);
+      for (int i = segmentsIndex - 1; i >= 0; i--) {
+        String segmentColumn = MetadataInfo.DEFAULT_COLUMN_PREFIX + i;
+        segmentMetadata = getMetadataForSegments(segmentMetadata,
+            SchemaPath.getSimplePath(segmentColumn));
+        segments.putAll(segmentMetadata);
+      }
+
+    }
+    return segments;
+  }
+
+  private static  Map getSegmentsForMetadata(
+      Map metadata, SchemaPath column) {
+    Multimap metadataMultimap = LinkedListMultimap.create();
+    metadata.forEach((key, value) -> metadataMultimap.put(key.getParent(), value));
+
+    Map result = new HashMap<>();
+    metadataMultimap.asMap().forEach((key, value) -> result.put(key, combineToSegmentMetadata(value, column)));
+
+    return result;
+  }
+
+  private static Map getMetadataForSegments(Map metadata, SchemaPath column) {
+    Multimap metadataMultimap = LinkedListMultimap.create();
+    metadata.forEach((key, value) -> metadataMultimap.put(key.getParent(), value));
+
+    Map result = new HashMap<>();
+    metadataMultimap.asMap().forEach((key, value) -> result.put(key, combineSegmentMetadata(value, column)));
+
+    return result;
+  }
+
+  private static  SegmentMetadata combineToSegmentMetadata(Collection metadataList, SchemaPath column) {
+    List metadataLocations = metadataList.stream()
+        .map(metadata -> metadata.getPath()) // used lambda instead of method reference due to JDK-8141508
+        .collect(Collectors.toList());
+    return combineToSegmentMetadata(metadataList, column, metadataLocations);
+  }
+
+  private static SegmentMetadata combineSegmentMetadata(Collection metadataList, SchemaPath column) {
+    List metadataLocations = metadataList.stream()
+        .flatMap(metadata -> metadata.getLocations().stream())
+        .collect(Collectors.toList());
+
+    return combineToSegmentMetadata(metadataList, column, metadataLocations);
+  }
+
+  /**
+   * Returns {@link SegmentMetadata} which is combined metadata of list of specified metadata
+   *
+   * @param metadataList      metadata to combine
+   * @param column            segment column
+   * @param metadataLocations locations of metadata combined in resulting segment
+   * @param <T>               type of metadata to combine
+   * @return {@link SegmentMetadata} from combined metadata
+   */
+  private static  SegmentMetadata combineToSegmentMetadata(Collection metadataList,
+      SchemaPath column, List metadataLocations) {
+    List segmentStatistics =
+        Collections.singletonList(
+            new StatisticsHolder<>(
+                TableStatisticsKind.ROW_COUNT.mergeStatistics(metadataList),
+                TableStatisticsKind.ROW_COUNT));
+    // this code is used only to collect segment metadata to be used only during filtering,
+    // so metadata identifier is not required here and in other places in this class
+    MetadataInfo metadataInfo = new MetadataInfo(MetadataType.SEGMENT, MetadataInfo.GENERAL_INFO_KEY, null);
+    T firstMetadata = metadataList.iterator().next();
+
+    return new SegmentMetadata(firstMetadata.getTableInfo(), metadataInfo, column, firstMetadata.getSchema(),
 
 Review comment:
   It's a 10-argument constructor. Maybe it's time for a builder? :)
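
A builder along the lines hinted at here could look like the sketch below. It assumes a heavily simplified `SegmentSketch` class with a handful of fields loosely modelled on the `SegmentMetadata` outline in the issue description; the class, its field types, the mandatory-field rule, and the `-1` default are all hypothetical and not taken from the PR:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified stand-in for a metadata class with many constructor arguments.
final class SegmentSketch {
  final String tableName;
  final String column;
  final String location;
  final List<String> locations;
  final long rowCount;

  private SegmentSketch(Builder builder) {
    // Assumption for this sketch: only the table name is mandatory.
    this.tableName = Objects.requireNonNull(builder.tableName, "tableName");
    this.column = builder.column;
    this.location = builder.location;
    this.locations = Collections.unmodifiableList(new ArrayList<>(builder.locations));
    this.rowCount = builder.rowCount;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    private String tableName;
    private String column;
    private String location;
    private List<String> locations = Collections.emptyList();
    private long rowCount = -1L; // assumption: -1 marks an unknown row count

    Builder tableName(String tableName) { this.tableName = tableName; return this; }
    Builder column(String column) { this.column = column; return this; }
    Builder location(String location) { this.location = location; return this; }
    Builder locations(List<String> locations) { this.locations = locations; return this; }
    Builder rowCount(long rowCount) { this.rowCount = rowCount; return this; }

    SegmentSketch build() {
      return new SegmentSketch(this);
    }
  }
}
```

With named setters, a call site such as `SegmentSketch.builder().tableName("orders").column("dir0").rowCount(10L).build()` states which value goes where, and optional arguments can simply be left out instead of being passed as `null`.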
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869482#comment-16869482
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

ihuzenko commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296219843
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/BaseParquetMetadataProvider.java
 ##
 @@ -342,17 +350,13 @@ public TableMetadata getTableMetadata() {
 
   @Override
   public FileMetadata getFileMetadata(Path location) {
-    return getFilesMetadata().stream()
-        .filter(Objects::nonNull)
-        .filter(fileMetadata -> location.equals(fileMetadata.getLocation()))
-        .findAny()
-        .orElse(null);
+    return getFilesMetadataMap().get(location);
   }
 
   @Override
   public List getFilesForPartition(PartitionMetadata partition) {
-    return getFilesMetadata().stream()
-        .filter(file -> partition.getLocations().contains(file.getLocation()))
+    return partition.getLocations().stream()
+        .map(location -> getFilesMetadataMap().get(location))
 
 Review comment:
   Consider adding a null check before collecting to the list.
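
The concern is that `Map.get` returns `null` for a location with no entry, so the mapped stream can carry `null` elements into the resulting list. A minimal sketch of the check, assuming `String` stand-ins for `Path` and `FileMetadata`; the class and method names are hypothetical:

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

final class FilesForPartitionSketch {
  // Hypothetical stand-in for getFilesForPartition: String replaces Path and FileMetadata.
  static List<String> filesFor(Collection<String> partitionLocations,
                               Map<String, String> filesByLocation) {
    return partitionLocations.stream()
        .map(filesByLocation::get)   // Map.get returns null when a location has no entry
        .filter(Objects::nonNull)    // drop misses so callers never see null elements
        .collect(Collectors.toList());
  }
}
```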
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869483#comment-16869483
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

ihuzenko commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296215883
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -414,6 +431,14 @@ public void modifyFileSelection(FileSelection selection) {
 }
 partitions = newPartitions;
 
+Map newSegments = new HashMap<>();
+if (!getSegmentsMetadata().isEmpty()) {
+  this.segments = getSegmentsMetadata().entrySet().stream()
+  .filter(entry -> fileSet.contains(entry.getKey()))
+  .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
+}
+segments = newSegments;
 
 Review comment:
   ```suggestion
   ```
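
 A note on the quoted hunk: as posted, the filtered map appears to be assigned to `this.segments` inside the `if`, and the following `segments = newSegments;` then replaces it with the empty `HashMap`. A minimal sketch of collecting into the local variable instead, assuming plain `String` keys and values in place of Drill's `Path` and `SegmentMetadata` (the class and method names are hypothetical, not part of the patch):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class SegmentFilterSketch {

  // Keeps only the entries whose key is present in fileSet.
  static Map<String, String> filterSegments(Map<String, String> segments, Set<String> fileSet) {
    Map<String, String> newSegments = new HashMap<>();
    if (!segments.isEmpty()) {
      // Collect into the local variable so the caller's final assignment
      // keeps the filtered result instead of an empty map.
      newSegments = segments.entrySet().stream()
          .filter(entry -> fileSet.contains(entry.getKey()))
          .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
    return newSegments;
  }

  public static void main(String[] args) {
    Map<String, String> segments = new HashMap<>();
    segments.put("/table/2018", "segment-2018");
    segments.put("/table/2019", "segment-2019");
    System.out.println(filterSegments(segments, Collections.singleton("/table/2019")));
  }
}
```

 In the real method the result would then be assigned to the field once, after the `if`.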
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869492#comment-16869492
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

ihuzenko commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296238818
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/TableMetadataUtils.java
 ##
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.metastore.metadata.BaseMetadata;
+import org.apache.drill.metastore.metadata.TableMetadata;
+import org.apache.drill.metastore.statistics.CollectableColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.ColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.TableStatisticsKind;
+import org.apache.drill.shaded.guava.com.google.common.primitives.UnsignedBytes;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class TableMetadataUtils {
+
+  private TableMetadataUtils() {
+throw new IllegalStateException("Utility class");
+  }
+
+  /**
+   * Returns {@link Comparator} instance considering specified {@code type}.
+   *
+   * @param type type of the column
+   * @return {@link Comparator} instance
+   */
+  public static Comparator getComparator(TypeProtos.MinorType type) {
+    switch (type) {
+      case INTERVALDAY:
+      case INTERVAL:
+      case INTERVALYEAR:
+        return Comparator.nullsFirst(UnsignedBytes.lexicographicalComparator());
+      case UINT1:
+        return Comparator.nullsFirst(UnsignedBytes::compare);
+      case UINT2:
+      case UINT4:
+        return Comparator.nullsFirst(Integer::compareUnsigned);
+      case UINT8:
+        return Comparator.nullsFirst(Long::compareUnsigned);
+      default:
+        return getNaturalNullsFirstComparator();
+    }
+  }
+
+  /**
+   * Returns "natural order" comparator which treats nulls as min values.
+   *
+   * @param <T> type to compare
+   * @return "natural order" comparator
+   */
+  public static <T extends Comparable<T>> Comparator<T> getNaturalNullsFirstComparator() {
+    return Comparator.nullsFirst(Comparator.naturalOrder());
+  }
+
+  /**
+   * Merges list of specified metadata into the map of {@link ColumnStatistics} with columns as keys.
+   *
+   * @param <T>                 type of metadata to collect
+   * @param metadataList        list of metadata to be merged
+   * @param columns             set of columns whose statistics should be merged
+   * @param statisticsToCollect kinds of statistics that should be collected
+   * @return list of merged metadata
+   */
+  public static <T extends BaseMetadata> Map<SchemaPath, ColumnStatistics> mergeColumnsStatistics(
+      Collection<T> metadataList, Set<SchemaPath> columns, List<CollectableColumnStatisticsKind> statisticsToCollect) {
+    Map<SchemaPath, ColumnStatistics> columnsStatistics = new HashMap<>();
+
+    for (SchemaPath column : columns) {
+      List<ColumnStatistics> statisticsList = new ArrayList<>();
+      for (T metadata : metadataList) {
+        ColumnStatistics statistics = metadata.getColumnsStatistics().get(column);
+        if (statistics == null) {
+          // schema change happened, set statistics which represents all nulls
+          statistics = new ColumnStatistics(
+              Collections.singletonList(
+                  new StatisticsHolder<>(TableStatisticsKind.ROW_COUNT.getValue(metadata), ColumnStatisticsKind.NULLS_COUNT)));
+        }
+        statisticsList.add(statistics);
+      }
+      List<StatisticsHolder> statisticsHolders = new ArrayList<>();
+      for (CollectableColumnStatisticsKind statisticsKind : statisticsToCollect) {
+        Object mergedStatistic = statisticsKind.mergeStatistics(statisticsList);
+   
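
 The comparator selection in `getComparator` matters because Java has no unsigned integer types: a UINT4 value above 2^31 - 1 is stored in an `int` with the sign bit set, so natural ordering would sort it before zero. A small sketch of the difference, using only JDK calls (the class and method names are hypothetical and not part of the patch):

```java
import java.util.Arrays;
import java.util.Comparator;

final class UnsignedComparatorSketch {

  // Same shape as the UINT2/UINT4 branch in the hunk above: nulls sort first,
  // non-null values are compared as unsigned 32-bit integers.
  static Comparator<Integer> unsignedNullsFirst() {
    return Comparator.nullsFirst(Integer::compareUnsigned);
  }

  // Natural (signed) ordering with nulls first, as in the default branch.
  static Comparator<Integer> naturalNullsFirst() {
    return Comparator.nullsFirst(Comparator.naturalOrder());
  }

  public static void main(String[] args) {
    // -1 holds the bit pattern 0xFFFFFFFF, i.e. 4294967295 when read as unsigned.
    Integer[] values = {1, -1, null, 0};

    Integer[] signed = values.clone();
    Arrays.sort(signed, naturalNullsFirst());
    System.out.println(Arrays.toString(signed));

    Integer[] unsigned = values.clone();
    Arrays.sort(unsigned, unsignedNullsFirst());
    System.out.println(Arrays.toString(unsigned));
  }
}
```

 With signed ordering, -1 comes first among the non-null values; with the unsigned comparator it comes last, which is the ordering min/max statistics for unsigned columns need.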

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869481#comment-16869481
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

ihuzenko commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296213539
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -247,6 +250,13 @@ public AbstractGroupScanWithMetadata 
applyFilter(LogicalExpression filterExpr, U
 matchAllMetadata = builder.isMatchAllMetadata();
 return null;
   }
+} else if (!getSegmentsMetadata().isEmpty()) {
+  if (!builder.getSegments().isEmpty() && getSegmentsMetadata().size() == builder.getSegments().size()) {
 
 Review comment:
   ```suggestion
 if (getSegmentsMetadata().size() == builder.getSegments().size()) {
   ```
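
 The suggestion is safe because this branch is only entered when `getSegmentsMetadata()` is non-empty: if the two sizes are equal, `builder.getSegments()` cannot be empty, so the extra `isEmpty()` check adds nothing. A brute-force sketch of that equivalence over small sizes (class and method names are hypothetical):

```java
final class SegmentConditionSketch {

  // Condition as originally written in the hunk, expressed over collection sizes.
  static boolean original(int metadataSize, int builderSize) {
    return builderSize != 0 && metadataSize == builderSize;
  }

  // Condition from the review suggestion.
  static boolean simplified(int metadataSize, int builderSize) {
    return metadataSize == builderSize;
  }

  // Checks that both conditions agree whenever the metadata collection is
  // non-empty, which is what the enclosing else-if guarantees.
  static boolean equivalentUpTo(int maxSize) {
    for (int metadataSize = 1; metadataSize <= maxSize; metadataSize++) {
      for (int builderSize = 0; builderSize <= maxSize; builderSize++) {
        if (original(metadataSize, builderSize) != simplified(metadataSize, builderSize)) {
          return false;
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(equivalentUpTo(100));
  }
}
```

 The two forms differ only when both collections are empty, a case the enclosing `else if` already excludes.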
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869484#comment-16869484
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

ihuzenko commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296223414
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/BaseParquetMetadataProvider.java
 ##
 @@ -361,6 +365,102 @@ public FileMetadata getFileMetadata(Path location) {
 return new ArrayList<>(getFilesMetadataMap().values());
   }
 
+  @Override
+  public List getSegmentsMetadata() {
+return new ArrayList<>(getSegmentsMetadataMap().values());
+  }
+
+  @Override
+  public Map getSegmentsMetadataMap() {
+if (segments == null) {
+  if (entries.isEmpty() || !collectMetadata) {
+return Collections.emptyMap();
+  }
+
+  segments = new LinkedHashMap<>();
+
+  Path fileLocation = getFilesMetadata().iterator().next().getPath();
+  int levelsCount = fileLocation.depth() - tableLocation.depth();
+
+  Map filesMetadata = getFilesMetadataMap();
+  int segmentsIndex = levelsCount - 1;
+  Map segmentMetadata = 
getSegmentsForMetadata(filesMetadata,
+  SchemaPath.getSimplePath(MetadataInfo.DEFAULT_COLUMN_PREFIX + 
segmentsIndex));
+  segments.putAll(segmentMetadata);
+  for (int i = segmentsIndex - 1; i >= 0; i--) {
+String segmentColumn = MetadataInfo.DEFAULT_COLUMN_PREFIX + i;
+segmentMetadata = getMetadataForSegments(segmentMetadata,
+SchemaPath.getSimplePath(segmentColumn));
+segments.putAll(segmentMetadata);
+  }
+
+}
+return segments;
+  }
+
+  private static  Map getSegmentsForMetadata(
+  Map metadata, SchemaPath column) {
+Multimap metadataMultimap = LinkedListMultimap.create();
+metadata.forEach((key, value) -> metadataMultimap.put(key.getParent(), 
value));
+
+Map result = new HashMap<>();
+metadataMultimap.asMap().forEach((key, value) -> result.put(key, 
combineToSegmentMetadata(value, column)));
+
+return result;
+  }
+
+  private static Map getMetadataForSegments(Map metadata, SchemaPath column) {
+Multimap metadataMultimap = 
LinkedListMultimap.create();
+metadata.forEach((key, value) -> metadataMultimap.put(key.getParent(), 
value));
+
+Map result = new HashMap<>();
+metadataMultimap.asMap().forEach((key, value) -> result.put(key, 
combineSegmentMetadata(value, column)));
+
+return result;
+  }
+
+  private static  SegmentMetadata 
combineToSegmentMetadata(Collection metadataList, SchemaPath column) {
+List metadataLocations = metadataList.stream()
+.map(metadata -> metadata.getPath()) // used lambda instead of method 
reference due to JDK-8141508
+.collect(Collectors.toList());
+return combineToSegmentMetadata(metadataList, column, metadataLocations);
+  }
+
+  private static SegmentMetadata 
combineSegmentMetadata(Collection metadataList, SchemaPath 
column) {
+List metadataLocations = metadataList.stream()
+.flatMap(metadata -> metadata.getLocations().stream())
+.collect(Collectors.toList());
+
+return combineToSegmentMetadata(metadataList, column, metadataLocations);
+  }
+
+  /**
+   * Returns {@link SegmentMetadata} which is combined metadata of list of specified metadata
+   *
+   * @param metadataList      metadata to combine
+   * @param column            segment column
+   * @param metadataLocations locations of metadata combined in resulting segment
+   * @param <T>               type of metadata to combine
+   * @return {@link SegmentMetadata} from combined metadata
+   */
+  private static  SegmentMetadata 
combineToSegmentMetadata(Collection metadataList,
+  SchemaPath column, List metadataLocations) {
+List segmentStatistics =
+Collections.singletonList(
+new StatisticsHolder<>(
+TableStatisticsKind.ROW_COUNT.mergeStatistics(metadataList),
+TableStatisticsKind.ROW_COUNT));
+// this code is used only to collect segment metadata to be used only 
during filtering,
+// so metadata identifier is not required here and in other places in this 
class
+MetadataInfo metadataInfo = new MetadataInfo(MetadataType.SEGMENT, 
MetadataInfo.GENERAL_INFO_KEY, null);
+T firstMetadata = metadataList.iterator().next();
+
+return new SegmentMetadata(firstMetadata.getTableInfo(), metadataInfo, 
column, firstMetadata.getSchema(),
+metadataList.iterator().next().getPath().getParent(),
 
 Review comment:
   ```suggestion
   firstMetadata.getPath().getParent(),
   ```
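
 The segment construction in this hunk works bottom-up: file metadata is grouped by parent directory to form the deepest segment level, and each level is then grouped by its own parent. A simplified sketch of that grouping, assuming `java.nio.file.Path` in place of Hadoop's `Path`, a bare row count in place of `SegmentMetadata`, and all files at the same depth below the table location (all names here are hypothetical; unlike the patch, the sketch stops before the table root):

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

final class SegmentRollupSketch {

  // Groups entries by the parent of their key and sums the row counts per parent.
  static Map<Path, Long> rollUp(Map<Path, Long> rowCounts) {
    Map<Path, Long> result = new LinkedHashMap<>();
    rowCounts.forEach((path, count) -> result.merge(path.getParent(), count, Long::sum));
    return result;
  }

  // Collects one entry per directory between the files and the table location.
  static Map<Path, Long> segments(Map<Path, Long> fileRowCounts, Path tableLocation) {
    Map<Path, Long> result = new LinkedHashMap<>();
    if (fileRowCounts.isEmpty()) {
      return result;
    }
    // Number of directory levels strictly between the table location and the files,
    // derived from the first file as the hunk above does with depth().
    Path firstFile = fileRowCounts.keySet().iterator().next();
    int levels = firstFile.getNameCount() - tableLocation.getNameCount() - 1;

    Map<Path, Long> level = rollUp(fileRowCounts);
    for (int i = 0; i < levels; i++) {
      result.putAll(level);
      level = rollUp(level);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<Path, Long> files = new LinkedHashMap<>();
    files.put(Paths.get("/table/2019/q1/a.parquet"), 10L);
    files.put(Paths.get("/table/2019/q1/b.parquet"), 5L);
    files.put(Paths.get("/table/2019/q2/c.parquet"), 7L);
    System.out.println(segments(files, Paths.get("/table")));
  }
}
```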
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869487#comment-16869487
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

ihuzenko commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296229552
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetTableMetadataUtils.java
 ##
 @@ -148,112 +142,71 @@ private ParquetTableMetadataUtils() {
   public static RowGroupMetadata 
getRowGroupMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata,
   MetadataBase.RowGroupMetadata rowGroupMetadata, int rgIndexInFile, Path 
location) {
 Map columnsStatistics = 
getRowGroupColumnStatistics(tableMetadata, rowGroupMetadata);
-Map rowGroupStatistics = new HashMap<>();
-rowGroupStatistics.put(TableStatisticsKind.ROW_COUNT, 
rowGroupMetadata.getRowCount());
-rowGroupStatistics.put(() -> ExactStatisticsConstants.START, 
rowGroupMetadata.getStart());
-rowGroupStatistics.put(() -> ExactStatisticsConstants.LENGTH, 
rowGroupMetadata.getLength());
+List rowGroupStatistics = new ArrayList<>();
+rowGroupStatistics.add(new 
StatisticsHolder<>(rowGroupMetadata.getRowCount(), 
TableStatisticsKind.ROW_COUNT));
+rowGroupStatistics.add(new StatisticsHolder<>(rowGroupMetadata.getStart(), 
new BaseStatisticsKind(ExactStatisticsConstants.START, true)));
+rowGroupStatistics.add(new 
StatisticsHolder<>(rowGroupMetadata.getLength(), new 
BaseStatisticsKind(ExactStatisticsConstants.LENGTH, true)));
 
 Map columns = 
getRowGroupFields(tableMetadata, rowGroupMetadata);
 
 TupleSchema schema = new TupleSchema();
 columns.forEach((schemaPath, majorType) -> 
MetadataUtils.addColumnMetadata(schema, schemaPath, majorType));
 
-return new RowGroupMetadata(
-schema, columnsStatistics, rowGroupStatistics, 
rowGroupMetadata.getHostAffinity(), rgIndexInFile, location);
-  }
+MetadataInfo metadataInfo = new MetadataInfo(MetadataType.ROW_GROUP, 
MetadataInfo.GENERAL_INFO_KEY, null);
 
-  /**
-   * Merges list of specified metadata into the map of {@link 
ColumnStatistics} with columns as keys.
-   *
-   * @param  type of metadata to collect
-   * @param metadataListlist of metadata to be merged
-   * @param columns set of columns whose statistics should be 
merged
-   * @param statisticsToCollect kinds of statistics that should be collected
-   * @param parquetTableMetadata ParquetTableMetadata object to fetch the 
non-interesting columns
-   * @return list of merged metadata
-   */
-  @SuppressWarnings("unchecked")
-  public static  Map 
mergeColumnsStatistics(
-  Collection metadataList, Set columns, 
List statisticsToCollect, 
MetadataBase.ParquetTableMetadataBase parquetTableMetadata) {
-Map columnsStatistics = new HashMap<>();
-
-for (SchemaPath column : columns) {
-  List statisticsList = new ArrayList<>();
-  for (T metadata : metadataList) {
-ColumnStatistics statistics = 
metadata.getColumnsStatistics().get(column);
-if (statistics == null) {
-  // schema change happened, set statistics which represents all nulls
-  statistics = new ColumnStatisticsImpl(
-  ImmutableMap.of(ColumnStatisticsKind.NULLS_COUNT, 
metadata.getStatistic(TableStatisticsKind.ROW_COUNT)),
-  getNaturalNullsFirstComparator());
-}
-statisticsList.add(statistics);
-  }
-  Map statisticsMap = new HashMap<>();
-  for (CollectableColumnStatisticsKind statisticsKind : 
statisticsToCollect) {
-Object mergedStatistic = 
statisticsKind.mergeStatistics(statisticsList);
-statisticsMap.put(statisticsKind, mergedStatistic);
-  }
-  columnsStatistics.put(column, new ColumnStatisticsImpl(statisticsMap, 
statisticsList.iterator().next().getValueComparator()));
-}
-return columnsStatistics;
+return new RowGroupMetadata(TableInfo.UNKNOWN_TABLE_INFO, metadataInfo,
+schema, columnsStatistics, rowGroupStatistics, 
rowGroupMetadata.getHostAffinity(), rgIndexInFile, location);
   }
 
   /**
* Returns {@link FileMetadata} instance received by merging specified 
{@link RowGroupMetadata} list.
*
* @param rowGroups list of {@link RowGroupMetadata} to be merged
-   * @param tableName name of the table
-   * @param parquetTableMetadata the source of column metadata for 
non-interesting column's statistics
* @return {@link FileMetadata} instance
*/
-  public static FileMetadata getFileMetadata(List rowGroups, 
String tableName,
-  MetadataBase.ParquetTableMetadataBase parquetTableMetadata) {
+  public static FileMetadata getFileMetadata(List rowGroups) 
{
 if (rowGroups.isEmpty()) {
   return 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869494#comment-16869494
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

ihuzenko commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296188699
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillStatsTable.java
 ##
 @@ -461,14 +465,14 @@ public static ObjectMapper getMapper() {
* @param statsProvider the source of statistics
* @return map of {@link StatisticsKind} and statistics values
*/
-  public static Map 
getEstimatedTableStats(DrillStatsTable statsProvider) {
+  public static List getEstimatedTableStats(DrillStatsTable 
statsProvider) {
 
 Review comment:
   Please fix javadoc according to changes in the method. 
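
 For context on what the Javadoc needs to reflect: the method previously returned a map keyed by statistics kind, and now returns a list in which each element carries its own kind. A self-contained sketch of that shape with a matching `@return` description; `Holder`, `Kind`, and `estimatedTableStats` are hypothetical stand-ins, not Drill's `StatisticsHolder` or `StatisticsKind` API:

```java
import java.util.ArrayList;
import java.util.List;

final class StatsJavadocSketch {

  // Hypothetical stand-in for a statistics kind: a name plus an exactness flag.
  static final class Kind {
    final String name;
    final boolean exact;

    Kind(String name, boolean exact) {
      this.name = name;
      this.exact = exact;
    }
  }

  // Hypothetical stand-in for a holder that pairs a value with its kind.
  static final class Holder<T> {
    final T value;
    final Kind kind;

    Holder(T value, Kind kind) {
      this.value = value;
      this.kind = kind;
    }
  }

  /**
   * Returns estimated table statistics as a list of holders.
   *
   * @param rowCount estimated number of rows
   * @return list of holders, each pairing a statistics value with its kind
   */
  static List<Holder<?>> estimatedTableStats(double rowCount) {
    List<Holder<?>> stats = new ArrayList<>();
    stats.add(new Holder<Double>(rowCount, new Kind("rowCount", false)));
    return stats;
  }

  public static void main(String[] args) {
    for (Holder<?> holder : estimatedTableStats(1000.0)) {
      System.out.println(holder.kind.name + " = " + holder.value);
    }
  }
}
```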
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
> getNonInterestingColumnsMetadata
> 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869495#comment-16869495
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

ihuzenko commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296189635
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillStatsTable.java
 ##
 @@ -478,27 +482,27 @@ public static ObjectMapper getMapper() {
* @param fieldName name of the columns whose statistics should be 
obtained
* @return map of {@link StatisticsKind} and statistics values
*/
-  public static Map 
getEstimatedColumnStats(DrillStatsTable statsProvider, SchemaPath fieldName) {
+  public static List getEstimatedColumnStats(DrillStatsTable 
statsProvider, SchemaPath fieldName) {
 
 Review comment:
   Please update javadoc
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869489#comment-16869489
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

ihuzenko commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296238254
 
 

 ##
 File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/TableMetadataUtils.java
 ##
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.metastore.metadata.BaseMetadata;
+import org.apache.drill.metastore.metadata.TableMetadata;
+import org.apache.drill.metastore.statistics.CollectableColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.ColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.TableStatisticsKind;
+import org.apache.drill.shaded.guava.com.google.common.primitives.UnsignedBytes;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class TableMetadataUtils {
+
+  private TableMetadataUtils() {
+    throw new IllegalStateException("Utility class");
+  }
+
+  /**
+   * Returns {@link Comparator} instance considering specified {@code type}.
+   *
+   * @param type type of the column
+   * @return {@link Comparator} instance
+   */
+  public static Comparator getComparator(TypeProtos.MinorType type) {
+    switch (type) {
+      case INTERVALDAY:
+      case INTERVAL:
+      case INTERVALYEAR:
+        return Comparator.nullsFirst(UnsignedBytes.lexicographicalComparator());
+      case UINT1:
+        return Comparator.nullsFirst(UnsignedBytes::compare);
+      case UINT2:
+      case UINT4:
+        return Comparator.nullsFirst(Integer::compareUnsigned);
+      case UINT8:
+        return Comparator.nullsFirst(Long::compareUnsigned);
+      default:
+        return getNaturalNullsFirstComparator();
+    }
+  }
+
+  /**
+   * Returns "natural order" comparator which treats nulls as min values.
+   *
+   * @param <T> type to compare
+   * @return "natural order" comparator
+   */
+  public static <T extends Comparable<T>> Comparator<T> getNaturalNullsFirstComparator() {
+    return Comparator.nullsFirst(Comparator.naturalOrder());
+  }
+
+  /**
+   * Merges list of specified metadata into the map of {@link ColumnStatistics} with columns as keys.
+   *
+   * @param <T>                 type of metadata to collect
+   * @param metadataList        list of metadata to be merged
+   * @param columns             set of columns whose statistics should be merged
+   * @param statisticsToCollect kinds of statistics that should be collected
+   * @return list of merged metadata
+   */
+  public static <T extends BaseMetadata> Map<SchemaPath, ColumnStatistics> mergeColumnsStatistics(
+      Collection<T> metadataList, Set<SchemaPath> columns, List<CollectableColumnStatisticsKind> statisticsToCollect) {
+    Map<SchemaPath, ColumnStatistics> columnsStatistics = new HashMap<>();
+
+    for (SchemaPath column : columns) {
+      List<ColumnStatistics> statisticsList = new ArrayList<>();
+      for (T metadata : metadataList) {
+        ColumnStatistics statistics = metadata.getColumnsStatistics().get(column);
+        if (statistics == null) {
+          // schema change happened, set statistics which represents all nulls
+          statistics = new ColumnStatistics(
+              Collections.singletonList(
+                  new StatisticsHolder<>(TableStatisticsKind.ROW_COUNT.getValue(metadata), ColumnStatisticsKind.NULLS_COUNT)));
+        }
+        statisticsList.add(statistics);
+      }
+      List<StatisticsHolder> statisticsHolders = new ArrayList<>();
+      for (CollectableColumnStatisticsKind statisticsKind : statisticsToCollect) {
+        Object mergedStatistic = statisticsKind.mergeStatistics(statisticsList);
+   
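The comparator selection in `getComparator` above can be illustrated with plain JDK calls. The sketch below is not Drill code: the class name `UnsignedNullsFirstDemo` and the `sorted` helper are hypothetical, and it only shows how `Comparator.nullsFirst` combined with `Integer::compareUnsigned` orders values differently from natural (signed) ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class UnsignedNullsFirstDemo {

  // Orders Integers as unsigned 32-bit values, with nulls treated as the minimum.
  static final Comparator<Integer> UNSIGNED_NULLS_FIRST =
      Comparator.nullsFirst(Integer::compareUnsigned);

  // Natural (signed) ordering, with nulls treated as the minimum.
  static final Comparator<Integer> NATURAL_NULLS_FIRST =
      Comparator.nullsFirst(Comparator.<Integer>naturalOrder());

  // Hypothetical helper: returns a sorted copy and leaves the input untouched.
  static List<Integer> sorted(List<Integer> values, Comparator<Integer> comparator) {
    List<Integer> copy = new ArrayList<>(values);
    copy.sort(comparator);
    return copy;
  }

  public static void main(String[] args) {
    // -1 is 0xFFFFFFFF, the largest value when interpreted as unsigned.
    List<Integer> values = Arrays.asList(5, -1, null, 0);
    System.out.println(sorted(values, NATURAL_NULLS_FIRST));
    System.out.println(sorted(values, UNSIGNED_NULLS_FIRST));
  }
}
```

With signed ordering `-1` sorts before `0`; with unsigned ordering it sorts last, which is why unsigned column types need their own comparator when min/max statistics are compared.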

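The schema-change branch in `mergeColumnsStatistics` substitutes an all-nulls statistic (nulls count equal to row count) when a column is absent from a metadata unit. A minimal, self-contained sketch of that idea follows. `FileStats` and `mergeNullsCount` are hypothetical stand-ins, not Drill's `BaseMetadata` or `CollectableColumnStatisticsKind` API, and summation is assumed as the merge rule for nulls counts.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MergeNullsCountDemo {

  // Hypothetical stand-in for one unit of metadata (for example, a file).
  static final class FileStats {
    final long rowCount;
    final Map<String, Long> nullsCountByColumn;

    FileStats(long rowCount, Map<String, Long> nullsCountByColumn) {
      this.rowCount = rowCount;
      this.nullsCountByColumn = nullsCountByColumn;
    }
  }

  // Sums nulls counts per column; a column missing from a unit counts as all nulls.
  static Map<String, Long> mergeNullsCount(List<FileStats> units, Set<String> columns) {
    Map<String, Long> merged = new HashMap<>();
    for (String column : columns) {
      long total = 0;
      for (FileStats unit : units) {
        Long nulls = unit.nullsCountByColumn.get(column);
        // Schema change: the column is absent here, so every row is null for it.
        total += (nulls != null) ? nulls : unit.rowCount;
      }
      merged.put(column, total);
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Long> first = new HashMap<>();
    first.put("a", 1L);
    first.put("b", 2L);
    FileStats file1 = new FileStats(10, first);
    // The second file was written before column "b" existed.
    FileStats file2 = new FileStats(4, Collections.singletonMap("a", 0L));

    Map<String, Long> merged =
        mergeNullsCount(Arrays.asList(file1, file2), new TreeSet<>(Arrays.asList("a", "b")));
    System.out.println(merged);
  }
}
```

Here column `b` ends up with 2 nulls from the first file plus all 4 rows of the second file, so the merged nulls count stays consistent with the merged row count.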
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868325#comment-16868325
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810
 
 
   For details please see [DRILL-7271](https://issues.apache.org/jira/browse/DRILL-7271).
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> getNonInterestingColumnsMetadata
> 13. Introduce segment-level metadata class:
> {noformat}
> class SegmentMetadata {
>   TableInfo tableInfo;
>   MetadataInfo metadataInfo;
>   SchemaPath column;
>   TupleMetadata schema;
>   String location;
>   Map columnsStatistics;
>   Map statistics;
>   List partitionValues;
>   List locations;
>   long lastModifiedTime;
> }
> {noformat}
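
Items 4 and 6 of the description could look roughly like the sketch below. The field names and the two key constants come from the issue text; the enum values other than SEGMENT, the constructors, the wrapper class `MetadataInfoSketch`, and the `describe` helper are assumptions made for illustration and are not taken from the pull request.

```java
public class MetadataInfoSketch {

  // SEGMENT is the value added by the issue; the remaining values are assumed.
  enum MetadataType { TABLE, SEGMENT, PARTITION, FILE, ROW_GROUP }

  // Identifies the table a metadata unit belongs to.
  static final class TableInfo {
    final String storagePlugin;
    final String workspace;
    final String name;
    final String type;
    final String owner;

    TableInfo(String storagePlugin, String workspace, String name, String type, String owner) {
      this.storagePlugin = storagePlugin;
      this.workspace = workspace;
      this.name = name;
      this.type = type;
      this.owner = owner;
    }
  }

  // Identifies a metadata unit within a table.
  static final class MetadataInfo {
    static final String GENERAL_INFO_KEY = "GENERAL_INFO";
    static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";

    final MetadataType type;
    final String key;
    final String identifier;

    MetadataInfo(MetadataType type, String key, String identifier) {
      this.type = type;
      this.key = key;
      this.identifier = identifier;
    }
  }

  // Hypothetical helper: renders a readable label for a metadata unit.
  static String describe(TableInfo table, MetadataInfo info) {
    return table.storagePlugin + "." + table.workspace + "." + table.name
        + " [" + info.type + ", key=" + info.key + "]";
  }

  public static void main(String[] args) {
    TableInfo table = new TableInfo("dfs", "tmp", "nation", "PARQUET", "user");
    MetadataInfo info =
        new MetadataInfo(MetadataType.SEGMENT, MetadataInfo.DEFAULT_SEGMENT_KEY, "part_1");
    System.out.println(describe(table, info));
  }
}
```

Grouping the shared fields into two small holder classes lets every metadata class (table, segment, partition, file, row group) carry the same identification data without repeating the fields.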



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)