[GitHub] bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement

2018-07-11 Thread GitBox
bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced 
the batch statistics logging enablement
URL: https://github.com/apache/drill/pull/1355#discussion_r201807628
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java
 ##
 @@ -100,6 +108,119 @@ private String getQueryId(FragmentContext _context) {
       }
       return "NA";
     }
+
+    private boolean isBatchStatsEnabledForOperator(FragmentContext context, OperatorContext oContext) {
+      // The configuration can select what operators should log batch statistics
+      final String statsLoggingOperator = context.getOptions().getString(ExecConstants.STATS_LOGGING_BATCH_OPERATOR_OPTION).toUpperCase();
+      final String allOperatorsStr = "ALL";
+
+      // All operators are allowed to log batch statistics
+      if (allOperatorsStr.equals(statsLoggingOperator)) {
+        return true;
+      }
+
+      // No, only a select few are allowed; syntax: operator-id-1,operator-id-2,..
+      final String[] operators = statsLoggingOperator.split(",");
+      final String operatorId = oContext.getStats().getId().toUpperCase();
+
+      for (int idx = 0; idx < operators.length; idx++) {
+        // We use "contains" because the operator identifier is a composite string; e.g., 3:[PARQUET_ROW_GROUP_SCAN]
+        if (operatorId.contains(operators[idx])) {
+          return true;
+        }
+      }
+
+      return false;
+    }
+  }
+
+  /**
+   * @see {@link RecordBatchStats#logRecordBatchStats(String, RecordBatch, RecordBatchStatsContext)}
+   */
+  public static void logRecordBatchStats(RecordBatch recordBatch,
+    RecordBatchStatsContext batchStatsContext) {
+
+    logRecordBatchStats(null, recordBatch, batchStatsContext);
+  }
+
+  /**
+   * Logs record batch statistics for the input record batch (logging happens only
+   * when record statistics logging is enabled).
+   *
+   * @param sourceId optional source identifier for scanners
+   * @param recordBatch a set of records
+   * @param batchStatsContext batch stats context object
+   */
+  public static void logRecordBatchStats(String sourceId,
 
 Review comment:
   Also, in a mid-stream operator, how do I print stats so that the incoming batch stats can be distinguished from the outgoing batch stats in the log file?
   
   Should there be a *String msg* parameter to allow the caller to tag the output?
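
   A minimal sketch of what such a tag could look like. Everything here is hypothetical and not part of the PR: the class `BatchStatsTagSketch`, the method `formatStatsLine`, the tag values, and the layout of the log line are assumptions made only to illustrate the idea.

```java
public class BatchStatsTagSketch {

  /**
   * Builds one log line (hypothetical layout). The optional tag is placed
   * right after the operator id, so searching the log for "[INCOMING]" or
   * "[OUTGOING]" separates the two kinds of batches. A null or empty tag
   * leaves the line untagged.
   */
  public static String formatStatsLine(String operatorId, String tag, String stats) {
    final String prefix = (tag == null || tag.isEmpty()) ? "" : "[" + tag + "] ";
    return operatorId + " " + prefix + stats;
  }

  public static void main(String[] args) {
    // Hypothetical operator id and stats payload, for illustration only.
    System.out.println(formatStatsLine("5:[PROJECT]", "INCOMING", "records=4096"));
    System.out.println(formatStatsLine("5:[PROJECT]", "OUTGOING", "records=1024"));
    System.out.println(formatStatsLine("5:[PROJECT]", null, "records=1024"));
  }
}
```

   With a caller-supplied tag like this, a mid-stream operator could log the same batch statistics twice, once per direction, without overloading the meaning of sourceId.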


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement

2018-07-11 Thread GitBox
bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced 
the batch statistics logging enablement
URL: https://github.com/apache/drill/pull/1355#discussion_r201806499
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java
 ##
 @@ -100,6 +108,119 @@ private String getQueryId(FragmentContext _context) {
       }
       return "NA";
     }
+
+    private boolean isBatchStatsEnabledForOperator(FragmentContext context, OperatorContext oContext) {
+      // The configuration can select what operators should log batch statistics
+      final String statsLoggingOperator = context.getOptions().getString(ExecConstants.STATS_LOGGING_BATCH_OPERATOR_OPTION).toUpperCase();
+      final String allOperatorsStr = "ALL";
+
+      // All operators are allowed to log batch statistics
+      if (allOperatorsStr.equals(statsLoggingOperator)) {
+        return true;
+      }
+
+      // No, only a select few are allowed; syntax: operator-id-1,operator-id-2,..
+      final String[] operators = statsLoggingOperator.split(",");
+      final String operatorId = oContext.getStats().getId().toUpperCase();
+
+      for (int idx = 0; idx < operators.length; idx++) {
+        // We use "contains" because the operator identifier is a composite string; e.g., 3:[PARQUET_ROW_GROUP_SCAN]
+        if (operatorId.contains(operators[idx])) {
+          return true;
+        }
+      }
+
+      return false;
+    }
+  }
+
+  /**
+   * @see {@link RecordBatchStats#logRecordBatchStats(String, RecordBatch, RecordBatchStatsContext)}
+   */
+  public static void logRecordBatchStats(RecordBatch recordBatch,
+    RecordBatchStatsContext batchStatsContext) {
+
+    logRecordBatchStats(null, recordBatch, batchStatsContext);
+  }
+
+  /**
+   * Logs record batch statistics for the input record batch (logging happens only
+   * when record statistics logging is enabled).
+   *
+   * @param sourceId optional source identifier for scanners
+   * @param recordBatch a set of records
+   * @param batchStatsContext batch stats context object
+   */
+  public static void logRecordBatchStats(String sourceId,
 
 Review comment:
   Leaving the name as sourceId is a bit confusing for mid-stream operators: there will always be a question of whether the sourceId string should be the name/id of the incoming record batch.




[GitHub] bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement

2018-07-11 Thread GitBox
bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced 
the batch statistics logging enablement
URL: https://github.com/apache/drill/pull/1355#discussion_r201804507
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java
 ##
 @@ -100,6 +108,119 @@ private String getQueryId(FragmentContext _context) {
       }
       return "NA";
     }
+
+    private boolean isBatchStatsEnabledForOperator(FragmentContext context, OperatorContext oContext) {
+      // The configuration can select what operators should log batch statistics
+      final String statsLoggingOperator = context.getOptions().getString(ExecConstants.STATS_LOGGING_BATCH_OPERATOR_OPTION).toUpperCase();
+      final String allOperatorsStr = "ALL";
+
+      // All operators are allowed to log batch statistics
+      if (allOperatorsStr.equals(statsLoggingOperator)) {
+        return true;
+      }
+
+      // No, only a select few are allowed; syntax: operator-id-1,operator-id-2,..
 
 Review comment:
   As a user setting *enabled_operators*, where do I find the format in which the operator name should be specified? For example, if Project has to be enabled, should the name be "Project", "Project_Record_Batch", or "ProjectRecordBatch"? Where does one look to find the correct string?




[GitHub] bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement

2018-07-10 Thread GitBox
bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced 
the batch statistics logging enablement
URL: https://github.com/apache/drill/pull/1355#discussion_r201458543
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java
 ##
 @@ -100,6 +108,119 @@ private String getQueryId(FragmentContext _context) {
       }
       return "NA";
     }
+
+    private boolean isBatchStatsEnabledForOperator(FragmentContext context, OperatorContext oContext) {
+      // The configuration can select what operators should log batch statistics
+      final String statsLoggingOperator = context.getOptions().getString(ExecConstants.STATS_LOGGING_BATCH_OPERATOR_OPTION).toUpperCase();
+      final String allOperatorsStr = "ALL";
+
+      // All operators are allowed to log batch statistics
+      if (allOperatorsStr.equals(statsLoggingOperator)) {
+        return true;
+      }
+
+      // No, only a select few are allowed; syntax: operator-id-1,operator-id-2,..
+      final String[] operators = statsLoggingOperator.split(",");
+      final String operatorId = oContext.getStats().getId().toUpperCase();
+
+      for (int idx = 0; idx < operators.length; idx++) {
+        // We use "contains" because the operator identifier is a composite string; e.g., 3:[PARQUET_ROW_GROUP_SCAN]
+        if (operatorId.contains(operators[idx])) {
+          return true;
+        }
+      }
+
+      return false;
+    }
+  }
+
+  /**
+   * @see {@link RecordBatchStats#logRecordBatchStats(String, RecordBatch, RecordBatchStatsContext)}
+   */
+  public static void logRecordBatchStats(RecordBatch recordBatch,
 
 Review comment:
   There seem to be no callers of this function. Is it meant for operators that don't have a sourceId?




[GitHub] bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement

2018-07-10 Thread GitBox
bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced 
the batch statistics logging enablement
URL: https://github.com/apache/drill/pull/1355#discussion_r201459812
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java
 ##
 @@ -100,6 +108,119 @@ private String getQueryId(FragmentContext _context) {
       }
       return "NA";
     }
+
+    private boolean isBatchStatsEnabledForOperator(FragmentContext context, OperatorContext oContext) {
+      // The configuration can select what operators should log batch statistics
+      final String statsLoggingOperator = context.getOptions().getString(ExecConstants.STATS_LOGGING_BATCH_OPERATOR_OPTION).toUpperCase();
+      final String allOperatorsStr = "ALL";
+
+      // All operators are allowed to log batch statistics
+      if (allOperatorsStr.equals(statsLoggingOperator)) {
+        return true;
+      }
+
+      // No, only a select few are allowed; syntax: operator-id-1,operator-id-2,..
+      final String[] operators = statsLoggingOperator.split(",");
+      final String operatorId = oContext.getStats().getId().toUpperCase();
+
+      for (int idx = 0; idx < operators.length; idx++) {
+        // We use "contains" because the operator identifier is a composite string; e.g., 3:[PARQUET_ROW_GROUP_SCAN]
+        if (operatorId.contains(operators[idx])) {
+          return true;
+        }
+      }
+
+      return false;
+    }
+  }
+
+  /**
+   * @see {@link RecordBatchStats#logRecordBatchStats(String, RecordBatch, RecordBatchStatsContext)}
+   */
+  public static void logRecordBatchStats(RecordBatch recordBatch,
+    RecordBatchStatsContext batchStatsContext) {
+
+    logRecordBatchStats(null, recordBatch, batchStatsContext);
+  }
+
+  /**
+   * Logs record batch statistics for the input record batch (logging happens only
+   * when record statistics logging is enabled).
+   *
+   * @param sourceId optional source identifier for scanners
+   * @param recordBatch a set of records
+   * @param batchStatsContext batch stats context object
+   */
+  public static void logRecordBatchStats(String sourceId,
+    RecordBatch recordBatch,
+    RecordBatchStatsContext batchStatsContext) {
+
+    if (!batchStatsContext.isEnableBatchSzLogging()) {
+      return; // NOOP
+    }
+
+    final String statsId = batchStatsContext.getContextOperatorId();
+    final boolean verbose = batchStatsContext.isEnableFgBatchSzLogging();
+    final String msg = printRecordBatchStats(statsId, sourceId, recordBatch, verbose);
+
+    logBatchStatsMsg(batchStatsContext, msg, false);
+  }
+
+  /**
+   * Logs a generic batch statistics message
+   *
+   * @param message log message
+   * @param batchStatsLogging
+   * @param batchStatsContext batch stats context object
+   */
+  public static void logRecordBatchStats(String message,
+    RecordBatchStatsContext batchStatsContext) {
+
+    if (!batchStatsContext.isEnableBatchSzLogging()) {
+      return; // NOOP
+    }
+
+    logBatchStatsMsg(batchStatsContext, message, true);
+  }
+
+  /**
+   * Prints a materialized field type
+   * @param field materialized field
+   * @param msg string builder where to append the field type
+   */
+  /*
+  public static void printFieldType(MaterializedField field, StringBuilder msg) {
 
 Review comment:
   This function is commented out. Please remove it.




[GitHub] bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement

2018-07-10 Thread GitBox
bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced 
the batch statistics logging enablement
URL: https://github.com/apache/drill/pull/1355#discussion_r201167369
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java
 ##
 @@ -704,5 +704,8 @@ public static String bootDefaultFor(String name) {
   public static final String STATS_LOGGING_FG_BATCH_SIZE_OPTION = "drill.exec.stats.logging.fine_grained.batch_size";
   public static final BooleanValidator STATS_LOGGING_BATCH_FG_SIZE_VALIDATOR = new BooleanValidator(STATS_LOGGING_FG_BATCH_SIZE_OPTION);
 
+  /** Controls the list of operators for which batch sizing stats should be enabled */
 
 Review comment:
   Can you please explain the motivation for the naming hierarchy that you have chosen for this option? I would suggest "drill.exec.stats.logging.batch_size.enabled_operators".




[GitHub] bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement

2018-07-10 Thread GitBox
bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced 
the batch statistics logging enablement
URL: https://github.com/apache/drill/pull/1355#discussion_r201441390
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java
 ##
 @@ -100,6 +108,119 @@ private String getQueryId(FragmentContext _context) {
       }
       return "NA";
     }
+
+    private boolean isBatchStatsEnabledForOperator(FragmentContext context, OperatorContext oContext) {
+      // The configuration can select what operators should log batch statistics
+      final String statsLoggingOperator = context.getOptions().getString(ExecConstants.STATS_LOGGING_BATCH_OPERATOR_OPTION).toUpperCase();
+      final String allOperatorsStr = "ALL";
+
+      // All operators are allowed to log batch statistics
+      if (allOperatorsStr.equals(statsLoggingOperator)) {
+        return true;
+      }
+
+      // No, only a select few are allowed; syntax: operator-id-1,operator-id-2,..
 
 Review comment:
   Are the operator-ids in statsLoggingOperator supposed to have "_"s in them? Otherwise, it looks like they will not match something like "3:[PARQUET_ROW_GROUP_SCAN]".
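
   A minimal sketch of the matching behaviour in question. The class `OperatorMatchSketch` and the method `isEnabled` are hypothetical; the method takes plain strings in place of `FragmentContext` and `OperatorContext`, but is meant to mirror the upper-case, split-on-comma, `contains` logic quoted in the diff above. The only operator id used is the example from the code comment.

```java
public class OperatorMatchSketch {

  /**
   * Hypothetical stand-in for isBatchStatsEnabledForOperator: upper-cases
   * both sides, accepts "ALL", otherwise splits the option on "," and
   * checks each entry with String.contains.
   */
  public static boolean isEnabled(String optionValue, String operatorId) {
    final String option = optionValue.toUpperCase();
    if ("ALL".equals(option)) {
      return true;
    }
    final String id = operatorId.toUpperCase();
    for (String operator : option.split(",")) {
      if (id.contains(operator)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    final String id = "3:[PARQUET_ROW_GROUP_SCAN]"; // example id from the code comment

    // Matches: case is normalized and the underscores are present.
    System.out.println(isEnabled("parquet_row_group_scan", id)); // true

    // Does not match: spaces instead of underscores.
    System.out.println(isEnabled("parquet row group scan", id)); // false

    // Does not match: split(",") keeps the space after the comma,
    // so the second entry is " PARQUET_ROW_GROUP_SCAN".
    System.out.println(isEnabled("project, parquet_row_group_scan", id)); // false

    // Matches: "contains" accepts any substring of the id.
    System.out.println(isEnabled("scan", id)); // true
  }
}
```

   Under these assumptions the entries do need the underscores, and the list must not contain whitespace around the commas unless the entries are trimmed before matching.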




[GitHub] bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement

2018-07-10 Thread GitBox
bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced 
the batch statistics logging enablement
URL: https://github.com/apache/drill/pull/1355#discussion_r201465077
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java
 ##
 @@ -100,6 +108,119 @@ private String getQueryId(FragmentContext _context) {
       }
       return "NA";
     }
+
+    private boolean isBatchStatsEnabledForOperator(FragmentContext context, OperatorContext oContext) {
+      // The configuration can select what operators should log batch statistics
+      final String statsLoggingOperator = context.getOptions().getString(ExecConstants.STATS_LOGGING_BATCH_OPERATOR_OPTION).toUpperCase();
+      final String allOperatorsStr = "ALL";
+
+      // All operators are allowed to log batch statistics
+      if (allOperatorsStr.equals(statsLoggingOperator)) {
+        return true;
+      }
+
+      // No, only a select few are allowed; syntax: operator-id-1,operator-id-2,..
+      final String[] operators = statsLoggingOperator.split(",");
+      final String operatorId = oContext.getStats().getId().toUpperCase();
+
+      for (int idx = 0; idx < operators.length; idx++) {
+        // We use "contains" because the operator identifier is a composite string; e.g., 3:[PARQUET_ROW_GROUP_SCAN]
+        if (operatorId.contains(operators[idx])) {
+          return true;
+        }
+      }
+
+      return false;
+    }
+  }
+
+  /**
+   * @see {@link RecordBatchStats#logRecordBatchStats(String, RecordBatch, RecordBatchStatsContext)}
+   */
+  public static void logRecordBatchStats(RecordBatch recordBatch,
+    RecordBatchStatsContext batchStatsContext) {
+
+    logRecordBatchStats(null, recordBatch, batchStatsContext);
+  }
+
+  /**
+   * Logs record batch statistics for the input record batch (logging happens only
+   * when record statistics logging is enabled).
+   *
+   * @param sourceId optional source identifier for scanners
+   * @param recordBatch a set of records
+   * @param batchStatsContext batch stats context object
+   */
+  public static void logRecordBatchStats(String sourceId,
 
 Review comment:
   Can 'sourceId' be renamed to scanSourceId?
   
   If an operator wants to print both incoming and outgoing batches, how can that be done in this logging framework in a way that lets you distinguish between the two?
   




[GitHub] bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement

2018-07-10 Thread GitBox
bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced 
the batch statistics logging enablement
URL: https://github.com/apache/drill/pull/1355#discussion_r201186631
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/VarLenBinaryReader.java
 ##
 @@ -34,11 +34,12 @@
 import org.apache.drill.exec.store.parquet.columnreaders.batchsizing.RecordBatchSizerManager.FieldOverflowState;
 import org.apache.drill.exec.store.parquet.columnreaders.batchsizing.RecordBatchSizerManager.FieldOverflowStateContainer;
 import org.apache.drill.exec.store.parquet.columnreaders.batchsizing.RecordBatchSizerManager.VarLenColumnBatchStats;
+import org.apache.drill.exec.util.record.RecordBatchStats;
 import org.apache.drill.exec.vector.ValueVector;
 
 /** Class which handles reading a batch of rows from a set of variable columns */
 public class VarLenBinaryReader {
-  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(VarLenBinaryReader.class);
+//  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(VarLenBinaryReader.class);
 
 Review comment:
   Please remove the commented-out line.

