[GitHub] bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement
bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement URL: https://github.com/apache/drill/pull/1355#discussion_r201807628 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java ## @@ -100,6 +108,119 @@ private String getQueryId(FragmentContext _context) { } return "NA"; } + +private boolean isBatchStatsEnabledForOperator(FragmentContext context, OperatorContext oContext) { + // The configuration can select what operators should log batch statistics + final String statsLoggingOperator = context.getOptions().getString(ExecConstants.STATS_LOGGING_BATCH_OPERATOR_OPTION).toUpperCase(); + final String allOperatorsStr = "ALL"; + + // All operators are allowed to log batch statistics + if (allOperatorsStr.equals(statsLoggingOperator)) { +return true; + } + + // No, only a select few are allowed; syntax: operator-id-1,operator-id-2,.. + final String[] operators = statsLoggingOperator.split(","); + final String operatorId = oContext.getStats().getId().toUpperCase(); + + for (int idx = 0; idx < operators.length; idx++) { +// We use "contains" because the operator identifier is a composite string; e.g., 3:[PARQUET_ROW_GROUP_SCAN] +if (operatorId.contains(operators[idx])) { + return true; +} + } + + return false; +} + } + + /** + * @see {@link RecordBatchStats#logRecordBatchStats(String, RecordBatch, RecordBatchStatsContext)} + */ + public static void logRecordBatchStats(RecordBatch recordBatch, +RecordBatchStatsContext batchStatsContext) { + +logRecordBatchStats(null, recordBatch, batchStatsContext); + } + + /** + * Logs record batch statistics for the input record batch (logging happens only + * when record statistics logging is enabled). 
+ * + * @param sourceId optional source identifier for scanners + * @param recordBatch a set of records + * @param batchStatsContext batch stats context object + */ + public static void logRecordBatchStats(String sourceId, Review comment: Also, in a mid-stream operator, how do I print stats in a way that distinguishes the incoming batch stats from the outgoing batch stats in the log file? Should there be a *String msg* parameter to allow the caller to tag the output? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement
bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement URL: https://github.com/apache/drill/pull/1355#discussion_r201806499 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java ## @@ -100,6 +108,119 @@ private String getQueryId(FragmentContext _context) { } return "NA"; } + +private boolean isBatchStatsEnabledForOperator(FragmentContext context, OperatorContext oContext) { + // The configuration can select what operators should log batch statistics + final String statsLoggingOperator = context.getOptions().getString(ExecConstants.STATS_LOGGING_BATCH_OPERATOR_OPTION).toUpperCase(); + final String allOperatorsStr = "ALL"; + + // All operators are allowed to log batch statistics + if (allOperatorsStr.equals(statsLoggingOperator)) { +return true; + } + + // No, only a select few are allowed; syntax: operator-id-1,operator-id-2,.. + final String[] operators = statsLoggingOperator.split(","); + final String operatorId = oContext.getStats().getId().toUpperCase(); + + for (int idx = 0; idx < operators.length; idx++) { +// We use "contains" because the operator identifier is a composite string; e.g., 3:[PARQUET_ROW_GROUP_SCAN] +if (operatorId.contains(operators[idx])) { + return true; +} + } + + return false; +} + } + + /** + * @see {@link RecordBatchStats#logRecordBatchStats(String, RecordBatch, RecordBatchStatsContext)} + */ + public static void logRecordBatchStats(RecordBatch recordBatch, +RecordBatchStatsContext batchStatsContext) { + +logRecordBatchStats(null, recordBatch, batchStatsContext); + } + + /** + * Logs record batch statistics for the input record batch (logging happens only + * when record statistics logging is enabled). 
+ * + * @param sourceId optional source identifier for scanners + * @param recordBatch a set of records + * @param batchStatsContext batch stats context object + */ + public static void logRecordBatchStats(String sourceId, Review comment: Leaving it as sourceId is a bit confusing for mid-stream operators: there will always be a question of whether the sourceId string should be the name/id of the incoming record batch.
[GitHub] bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement
bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement URL: https://github.com/apache/drill/pull/1355#discussion_r201804507 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java ## @@ -100,6 +108,119 @@ private String getQueryId(FragmentContext _context) { } return "NA"; } + +private boolean isBatchStatsEnabledForOperator(FragmentContext context, OperatorContext oContext) { + // The configuration can select what operators should log batch statistics + final String statsLoggingOperator = context.getOptions().getString(ExecConstants.STATS_LOGGING_BATCH_OPERATOR_OPTION).toUpperCase(); + final String allOperatorsStr = "ALL"; + + // All operators are allowed to log batch statistics + if (allOperatorsStr.equals(statsLoggingOperator)) { +return true; + } + + // No, only a select few are allowed; syntax: operator-id-1,operator-id-2,.. Review comment: As a user setting *enabled_operators*, where do I find the format in which the name should be specified? For example, if Project has to be enabled, should the name be "Project", "Project_Record_Batch", or "ProjectRecordBatch"? Where does one look to find the correct string?
[GitHub] bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement
bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement URL: https://github.com/apache/drill/pull/1355#discussion_r201458543 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java ## @@ -100,6 +108,119 @@ private String getQueryId(FragmentContext _context) { } return "NA"; } + +private boolean isBatchStatsEnabledForOperator(FragmentContext context, OperatorContext oContext) { + // The configuration can select what operators should log batch statistics + final String statsLoggingOperator = context.getOptions().getString(ExecConstants.STATS_LOGGING_BATCH_OPERATOR_OPTION).toUpperCase(); + final String allOperatorsStr = "ALL"; + + // All operators are allowed to log batch statistics + if (allOperatorsStr.equals(statsLoggingOperator)) { +return true; + } + + // No, only a select few are allowed; syntax: operator-id-1,operator-id-2,.. + final String[] operators = statsLoggingOperator.split(","); + final String operatorId = oContext.getStats().getId().toUpperCase(); + + for (int idx = 0; idx < operators.length; idx++) { +// We use "contains" because the operator identifier is a composite string; e.g., 3:[PARQUET_ROW_GROUP_SCAN] +if (operatorId.contains(operators[idx])) { + return true; +} + } + + return false; +} + } + + /** + * @see {@link RecordBatchStats#logRecordBatchStats(String, RecordBatch, RecordBatchStatsContext)} + */ + public static void logRecordBatchStats(RecordBatch recordBatch, Review comment: There seem to be no callers of this function. Is this meant for operators which don't have a sourceId?
[GitHub] bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement
bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement URL: https://github.com/apache/drill/pull/1355#discussion_r201459812 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java ## @@ -100,6 +108,119 @@ private String getQueryId(FragmentContext _context) { } return "NA"; } + +private boolean isBatchStatsEnabledForOperator(FragmentContext context, OperatorContext oContext) { + // The configuration can select what operators should log batch statistics + final String statsLoggingOperator = context.getOptions().getString(ExecConstants.STATS_LOGGING_BATCH_OPERATOR_OPTION).toUpperCase(); + final String allOperatorsStr = "ALL"; + + // All operators are allowed to log batch statistics + if (allOperatorsStr.equals(statsLoggingOperator)) { +return true; + } + + // No, only a select few are allowed; syntax: operator-id-1,operator-id-2,.. + final String[] operators = statsLoggingOperator.split(","); + final String operatorId = oContext.getStats().getId().toUpperCase(); + + for (int idx = 0; idx < operators.length; idx++) { +// We use "contains" because the operator identifier is a composite string; e.g., 3:[PARQUET_ROW_GROUP_SCAN] +if (operatorId.contains(operators[idx])) { + return true; +} + } + + return false; +} + } + + /** + * @see {@link RecordBatchStats#logRecordBatchStats(String, RecordBatch, RecordBatchStatsContext)} + */ + public static void logRecordBatchStats(RecordBatch recordBatch, +RecordBatchStatsContext batchStatsContext) { + +logRecordBatchStats(null, recordBatch, batchStatsContext); + } + + /** + * Logs record batch statistics for the input record batch (logging happens only + * when record statistics logging is enabled). 
+ * + * @param sourceId optional source identifier for scanners + * @param recordBatch a set of records + * @param batchStatsContext batch stats context object + */ + public static void logRecordBatchStats(String sourceId, +RecordBatch recordBatch, +RecordBatchStatsContext batchStatsContext) { + +if (!batchStatsContext.isEnableBatchSzLogging()) { + return; // NOOP +} + +final String statsId = batchStatsContext.getContextOperatorId(); +final boolean verbose = batchStatsContext.isEnableFgBatchSzLogging(); +final String msg = printRecordBatchStats(statsId, sourceId, recordBatch, verbose); + +logBatchStatsMsg(batchStatsContext, msg, false); + } + + /** + * Logs a generic batch statistics message + * + * @param message log message + * @param batchStatsLogging + * @param batchStatsContext batch stats context object + */ + public static void logRecordBatchStats(String message, +RecordBatchStatsContext batchStatsContext) { + +if (!batchStatsContext.isEnableBatchSzLogging()) { + return; // NOOP +} + +logBatchStatsMsg(batchStatsContext, message, true); + } + + /** + * Prints a materialized field type + * @param field materialized field + * @param msg string builder where to append the field type + */ + /* + public static void printFieldType(MaterializedField field, StringBuilder msg) { Review comment: Commented-out function. Please remove.
[GitHub] bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement
bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement URL: https://github.com/apache/drill/pull/1355#discussion_r201167369 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ## @@ -704,5 +704,8 @@ public static String bootDefaultFor(String name) { public static final String STATS_LOGGING_FG_BATCH_SIZE_OPTION = "drill.exec.stats.logging.fine_grained.batch_size"; public static final BooleanValidator STATS_LOGGING_BATCH_FG_SIZE_VALIDATOR = new BooleanValidator(STATS_LOGGING_FG_BATCH_SIZE_OPTION); + /** Controls the list of operators for which batch sizing stats should be enabled */ Review comment: Can you please explain the motivation for the naming hierarchy that you have chosen for this option? I would suggest "drill.exec.stats.logging.batch_size.enabled_operators".
[GitHub] bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement
bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement URL: https://github.com/apache/drill/pull/1355#discussion_r201441390 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java ## @@ -100,6 +108,119 @@ private String getQueryId(FragmentContext _context) { } return "NA"; } + +private boolean isBatchStatsEnabledForOperator(FragmentContext context, OperatorContext oContext) { + // The configuration can select what operators should log batch statistics + final String statsLoggingOperator = context.getOptions().getString(ExecConstants.STATS_LOGGING_BATCH_OPERATOR_OPTION).toUpperCase(); + final String allOperatorsStr = "ALL"; + + // All operators are allowed to log batch statistics + if (allOperatorsStr.equals(statsLoggingOperator)) { +return true; + } + + // No, only a select few are allowed; syntax: operator-id-1,operator-id-2,.. Review comment: Are the operator-ids in statsLoggingOperator supposed to have "_"s in them? Otherwise, it looks like they will not match something like "3:[PARQUET_ROW_GROUP_SCAN]".
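The matching concern raised above can be reproduced outside Drill. The following is a minimal standalone sketch, not the PR's code: `OperatorMatchSketch` and `isEnabled` are hypothetical names, the option value and operator id are passed in as plain strings, and `Locale.ROOT` is added to `toUpperCase` (the diff calls it without a locale). Otherwise it follows the upper-case, split-on-comma, `contains` logic shown in the diff.

```java
import java.util.Locale;

public class OperatorMatchSketch {

  /**
   * Hypothetical stand-in for isBatchStatsEnabledForOperator: upper-cases the
   * option value, accepts "ALL", otherwise does a substring match of each
   * comma-separated entry against the upper-cased operator id.
   */
  static boolean isEnabled(String optionValue, String operatorId) {
    final String configured = optionValue.toUpperCase(Locale.ROOT);
    if ("ALL".equals(configured)) {
      return true;
    }
    final String id = operatorId.toUpperCase(Locale.ROOT);
    for (String name : configured.split(",")) {
      // Entries are not trimmed, as in the diff under review
      if (id.contains(name)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Operator id format taken from the comment in the diff
    final String id = "3:[PARQUET_ROW_GROUP_SCAN]";

    System.out.println(isEnabled("parquet_row_group_scan", id));          // true
    System.out.println(isEnabled("ParquetRowGroupScan", id));             // false: no underscores
    System.out.println(isEnabled("project, parquet_row_group_scan", id)); // false: leading space is kept
  }
}
```

Under these assumptions, only the underscore-separated spelling matches, and a space after a comma in the option value silently disables the entry that follows it.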
[GitHub] bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement
bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement URL: https://github.com/apache/drill/pull/1355#discussion_r201465077 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/record/RecordBatchStats.java ## @@ -100,6 +108,119 @@ private String getQueryId(FragmentContext _context) { } return "NA"; } + +private boolean isBatchStatsEnabledForOperator(FragmentContext context, OperatorContext oContext) { + // The configuration can select what operators should log batch statistics + final String statsLoggingOperator = context.getOptions().getString(ExecConstants.STATS_LOGGING_BATCH_OPERATOR_OPTION).toUpperCase(); + final String allOperatorsStr = "ALL"; + + // All operators are allowed to log batch statistics + if (allOperatorsStr.equals(statsLoggingOperator)) { +return true; + } + + // No, only a select few are allowed; syntax: operator-id-1,operator-id-2,.. + final String[] operators = statsLoggingOperator.split(","); + final String operatorId = oContext.getStats().getId().toUpperCase(); + + for (int idx = 0; idx < operators.length; idx++) { +// We use "contains" because the operator identifier is a composite string; e.g., 3:[PARQUET_ROW_GROUP_SCAN] +if (operatorId.contains(operators[idx])) { + return true; +} + } + + return false; +} + } + + /** + * @see {@link RecordBatchStats#logRecordBatchStats(String, RecordBatch, RecordBatchStatsContext)} + */ + public static void logRecordBatchStats(RecordBatch recordBatch, +RecordBatchStatsContext batchStatsContext) { + +logRecordBatchStats(null, recordBatch, batchStatsContext); + } + + /** + * Logs record batch statistics for the input record batch (logging happens only + * when record statistics logging is enabled). 
+ * + * @param sourceId optional source identifier for scanners + * @param recordBatch a set of records + * @param batchStatsContext batch stats context object + */ + public static void logRecordBatchStats(String sourceId, Review comment: Can 'sourceId' be renamed to scanSourceId? If an operator wants to print both incoming and outgoing batches, how can that be done in this logging framework in a way that distinguishes between the two?
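One way the incoming/outgoing distinction could be expressed is with an explicit direction tag supplied by the caller. The sketch below is purely illustrative and is not part of the pull request: `BatchTagSketch`, `BatchDirection`, `formatBatchStats`, the log line layout, and the operator id `4:[PROJECT]` are all hypothetical, and plain numbers stand in for a real `RecordBatch`.

```java
public class BatchTagSketch {

  /** Hypothetical tag a mid-stream operator would pass when logging a batch. */
  enum BatchDirection { INCOMING, OUTGOING }

  /**
   * Builds one log line that carries the direction tag next to the operator id,
   * so incoming and outgoing batches can be told apart when reading the log.
   */
  static String formatBatchStats(String operatorId, BatchDirection direction,
                                 int recordCount, long batchBytes) {
    return String.format("BATCH_STATS, %s, %s: records=%d, bytes=%d",
        direction, operatorId, recordCount, batchBytes);
  }

  public static void main(String[] args) {
    // Illustrative operator id, modeled on the composite format quoted in the diff
    final String operatorId = "4:[PROJECT]";

    System.out.println(formatBatchStats(operatorId, BatchDirection.INCOMING, 1024, 65536L));
    System.out.println(formatBatchStats(operatorId, BatchDirection.OUTGOING, 1024, 32768L));
  }
}
```

An enum rather than a free-form *String msg* keeps the tags uniform across operators, at the cost of flexibility; either choice would answer the question raised in the comment.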
[GitHub] bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement
bitblender commented on a change in pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement URL: https://github.com/apache/drill/pull/1355#discussion_r201186631 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/VarLenBinaryReader.java ## @@ -34,11 +34,12 @@ import org.apache.drill.exec.store.parquet.columnreaders.batchsizing.RecordBatchSizerManager.FieldOverflowState; import org.apache.drill.exec.store.parquet.columnreaders.batchsizing.RecordBatchSizerManager.FieldOverflowStateContainer; import org.apache.drill.exec.store.parquet.columnreaders.batchsizing.RecordBatchSizerManager.VarLenColumnBatchStats; +import org.apache.drill.exec.util.record.RecordBatchStats; import org.apache.drill.exec.vector.ValueVector; /** Class which handles reading a batch of rows from a set of variable columns */ public class VarLenBinaryReader { - private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(VarLenBinaryReader.class); +// private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(VarLenBinaryReader.class); Review comment: Please remove the commented-out line.