[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5470: Derive numDocsPerChunk for var byte raw index from metadata only if config is enabled.

GitBox Sat, 30 May 2020 17:00:57 -0700


mayankshriv commented on a change in pull request #5470:
URL: https://github.com/apache/incubator-pinot/pull/5470#discussion_r432896660




##########
File path: 
pinot-core/src/main/java/org/apache/pinot/core/indexsegment/mutable/MutableSegmentImpl.java
##########
@@ -330,8 +330,32 @@ public long getLatestIngestionTimestamp() {
    */
   private boolean isNoDictionaryColumn(Set<String> noDictionaryColumns, 
Set<String> invertedIndexColumns,
       Set<String> textIndexColumns, FieldSpec fieldSpec, String column) {
-    return textIndexColumns.contains(column) || 
(noDictionaryColumns.contains(column) && fieldSpec.isSingleValueField()
-        && !invertedIndexColumns.contains(column));
+    if (textIndexColumns.contains(column)) {
+      // text column is no dictionary currently
+      return true;
+    }
+    FieldSpec.DataType dataType = fieldSpec.getDataType();
+    if (noDictionaryColumns.contains(column)) {
+      // Earlier we didn't support noDict in consuming segments for STRING and 
BYTES columns.
+      // So even if the user had the column in noDictionaryColumns set in 
table config, we still
+      // created dictionary in consuming segments.
+      // Later on we added this support. There is a particular impact of this 
change on the use cases
+      // that have set noDict on their STRING dimension columns for other 
performance
+      // reasons and also want metricsAggregation. These use cases don't get to
+      // aggregateMetrics because the new implementation is able to honor 
their table config setting
+      // of noDict on STRING/BYTES. Without metrics aggregation, memory 
pressure increases.
+      // So to continue aggregating metrics for such cases, we will create 
dictionary even
+      // if the column is part of noDictionary set from table config
+      if (fieldSpec instanceof DimensionFieldSpec && _aggregateMetrics && 
(dataType == FieldSpec.DataType.STRING ||

Review comment:
       Log message for this?

##########
File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -213,6 +215,14 @@ public void init(SegmentGeneratorConfig 
segmentCreationSpec, SegmentIndexCreatio
     }
   }
 
+  public static boolean shouldDeriveNumDocsPerChunk(String columnName, 
Map<String, Map<String, String>> columnProperties) {
+    if (columnProperties != null) {
+      Map<String, String> properties = columnProperties.get(columnName);
+      return properties != null && 
Boolean.parseBoolean(properties.get(FieldConfig.DERIVE_NUM_DOCS_PER_CHUNK_RAW_INDEX_KEY));

Review comment:
       Defaults to false, right?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5470: Derive numDocsPerChunk for var byte raw index from metadata only if config is enabled.

Reply via email to