[jira] [Commented] (IMPALA-8825) Add additional counters to PlanRootSink
[ https://issues.apache.org/jira/browse/IMPALA-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930806#comment-16930806 ]

ASF subversion and git services commented on IMPALA-8825:
---------------------------------------------------------

Commit 34d132c513bfe5dc46478d9cb780a93200301b91 in impala's branch refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=34d132c ]

IMPALA-8825: Add additional counters to PlanRootSink

Adds the counters RowsSent and RowsSentRate to the PLAN_ROOT_SINK section
of the profile:

PLAN_ROOT_SINK:
   - PeakMemoryUsage: 4.01 MB (4202496)
   - RowBatchGetWaitTime: 0.000ns
   - RowBatchSendWaitTime: 0.000ns
   - RowsSent: 10 (10)
   - RowsSentRate: 416.00 /sec

RowsSent tracks the number of rows sent to the PlanRootSink via
PlanRootSink::Send. RowsSentRate tracks the rate at which rows are sent to
the PlanRootSink.

Adds the counters NumRowsFetched, NumRowsFetchedFromCache, and
RowMaterializationRate to the ImpalaServer section of the profile:

ImpalaServer:
   - ClientFetchWaitTimer: 11.999ms
   - NumRowsFetched: 10 (10)
   - NumRowsFetchedFromCache: 10 (10)
   - RowMaterializationRate: 9.00 /sec
   - RowMaterializationTimer: 1s007ms

NumRowsFetched tracks the total number of rows fetched by the query, but
does not include rows fetched from the cache. NumRowsFetchedFromCache
tracks the total number of rows fetched from the query results cache.
RowMaterializationRate tracks the rate at which rows are materialized.
RowMaterializationTimer already existed and tracks how much time is spent
materializing rows.
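The counter-plus-rate pairs above (RowsSent / RowsSentRate, NumRowsFetched / RowMaterializationRate) follow a common profiling pattern: keep a monotonically increasing count and derive the rate by dividing it by elapsed wall-clock time. A minimal Python sketch of that pattern (illustrative only; Impala's counters are implemented in C++ in its RuntimeProfile machinery, and the class and method names below are invented for this example):

```python
import time

class RateCounter:
    """Toy model of a count + derived-rate counter pair, in the style of
    RowsSent / RowsSentRate. Not Impala's implementation."""

    def __init__(self):
        self.rows_sent = 0               # RowsSent analogue: total rows seen
        self._start = time.monotonic()   # when the sink started accepting rows

    def add_rows(self, n):
        # Called once per row batch handed to the sink.
        self.rows_sent += n

    def rows_sent_rate(self):
        # RowsSentRate analogue: rows per second since the counter started.
        elapsed = time.monotonic() - self._start
        return self.rows_sent / elapsed if elapsed > 0 else 0.0
```

A derived rate like this needs no extra bookkeeping per batch, which is presumably why the profile exposes both forms: the raw count for totals, the rate for spotting slow producers or consumers.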
Testing:
* Added tests to test_fetch_first.py and query_test/test_fetch.py
* Enabled some tests in test_fetch_first.py that were pending the
  completion of IMPALA-8819
* Ran core tests

Change-Id: Id9e101e2f3e2bf8324e149c780d35825ceecc036
Reviewed-on: http://gerrit.cloudera.org:8080/14180
Tested-by: Impala Public Jenkins
Reviewed-by: Sahil Takiar

> Add additional counters to PlanRootSink
> ---------------------------------------
>
>          Key: IMPALA-8825
>          URL: https://issues.apache.org/jira/browse/IMPALA-8825
>      Project: IMPALA
>   Issue Type: Sub-task
>   Components: Backend
>     Reporter: Sahil Takiar
>     Assignee: Sahil Takiar
>     Priority: Major
>
> The current entry in the runtime profile for {{PLAN_ROOT_SINK}} does not
> contain much useful information:
> {code:java}
> PLAN_ROOT_SINK:(Total: 234.996ms, non-child: 234.996ms, % non-child: 100.00%)
>    - PeakMemoryUsage: 0{code}
> There are several additional counters we could add to the {{PlanRootSink}}
> (either the {{BufferedPlanRootSink}} or {{BlockingPlanRootSink}}):
> * Amount of time spent blocking inside the {{PlanRootSink}} - both the time
> spent by the client thread waiting for rows to become available and the time
> spent by the impala thread waiting for the client to consume rows
> ** Similar to the {{RowBatchQueueGetWaitTime}} and
> {{RowBatchQueuePutWaitTime}} counters inside the scan nodes
> ** The difference between these counters and the ones in
> {{ClientRequestState}} (e.g. {{ClientFetchWaitTimer}} and
> {{RowMaterializationTimer}}) should be documented
> * For {{BufferedPlanRootSink}} there are already several {{Buffer pool}}
> counters; we should make sure they are exposed in the {{PLAN_ROOT_SINK}}
> section
> * Track the number of rows sent (e.g. rows sent to {{PlanRootSink::Send}})
> and the number of rows fetched (might need to be tracked in the
> {{ClientRequestState}})
> ** For {{BlockingPlanRootSink}} the sent and fetched values should be pretty
> much the same, but for {{BufferedPlanRootSink}} this is more useful
> ** Similar to {{RowsReturned}} in each exec node
> * The rate at which rows are sent and fetched
> ** Should be useful when attempting to debug the performance of fetching
> rows (e.g. if the send rate is much higher than the fetch rate, then maybe
> there is something wrong with the client)
> ** Similar to {{RowsReturnedRate}} in each exec node
> Open to other suggestions for counters that folks think are useful.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
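The issue description's rate-comparison heuristic (a send rate far above the fetch rate points at a slow client) can be sketched as a small diagnostic helper. This is purely illustrative; the function name, parameters, and the 2x slack threshold are assumptions, not anything in Impala:

```python
def diagnose_fetch_perf(rows_sent, send_secs, rows_fetched, fetch_secs,
                        slack=2.0):
    """Hypothetical helper applying the heuristic from IMPALA-8825:
    compare the rate rows enter the PlanRootSink against the rate the
    client drains them. 'slack' is an arbitrary tolerance factor."""
    send_rate = rows_sent / send_secs
    fetch_rate = rows_fetched / fetch_secs
    if send_rate > slack * fetch_rate:
        # Rows pile up in the sink faster than the client consumes them.
        return "client-bound: rows are produced faster than they are fetched"
    return "fetch keeping up with send"
```

With a BlockingPlanRootSink the two rates converge because the producer blocks on the consumer; with a BufferedPlanRootSink they can diverge, which is exactly when a comparison like this becomes informative.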
[jira] [Commented] (IMPALA-8825) Add additional counters to PlanRootSink
[ https://issues.apache.org/jira/browse/IMPALA-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914749#comment-16914749 ]

ASF subversion and git services commented on IMPALA-8825:
---------------------------------------------------------

Commit d037ac8304b43f6e4bb4c6ba2eb1910a9e921c24 in impala's branch refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d037ac8 ]

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were necessary
for it to work with the spillable queue; however, all the synchronization
logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and a
ReservationManager. It takes in a TBackendResourceProfile that specifies
the max / min memory reservation the BufferedTupleStream can use to buffer
rows. The 'max_unpinned_bytes' parameter limits the max number of bytes
that can be unpinned in the BufferedTupleStream. The limit is a 'soft'
limit because calls to AddBatch may push the amount of unpinned memory over
the limit. The queue is non-blocking and not thread safe. It provides
AddBatch and GetBatch methods. Calls to AddBatch spill if the
BufferedTupleStream does not have enough reservation to fit the entire
RowBatch.

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned and
unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned data
in the BufferedTupleStream to be unpinned. This is enforced in a new method
in QueryOptions called 'ValidateQueryOptions'.

Planner Changes:
PlanRootSink.java now computes a full ResourceProfile if result spooling is
enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:
For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. To fix this,
this patch introduces the concept of an ExecNode 'label', which is a more
generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift; it
was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes are
unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time is
spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext' methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics, such as the number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not handle
requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.
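The 'soft' unpinned-memory limit described in the commit message can be illustrated with a toy model. This Python sketch is not the C++ SpillableRowBatchQueue (which wraps a BufferedTupleStream and a ReservationManager); it only demonstrates the stated behavior, namely that AddBatch spills older batches when pinned reservation runs out and may push the unpinned total past the limit, which is only enforced as a precondition on the next add:

```python
class SpillableQueueSketch:
    """Toy model of a spillable row-batch queue with a soft unpinned limit.
    All fields and sizes are illustrative; not Impala's implementation."""

    def __init__(self, max_pinned_bytes, max_unpinned_bytes):
        self.max_pinned = max_pinned_bytes      # pinned reservation cap
        self.max_unpinned = max_unpinned_bytes  # soft cap on spilled bytes
        self.pinned = []        # (batch, size) pairs held in reservation
        self.unpinned = []      # (batch, size) pairs spilled to disk
        self.pinned_bytes = 0
        self.unpinned_bytes = 0

    def is_full(self):
        # Callers must stop adding once the soft limit has been reached.
        return self.unpinned_bytes >= self.max_unpinned

    def add_batch(self, batch, size):
        assert not self.is_full(), "caller must check is_full() first"
        # Not enough pinned reservation: spill (unpin) the oldest batches.
        while self.pinned and self.pinned_bytes + size > self.max_pinned:
            old, old_size = self.pinned.pop(0)
            self.unpinned.append((old, old_size))
            self.pinned_bytes -= old_size
            # This may overshoot max_unpinned -- that is the 'soft' part.
            self.unpinned_bytes += old_size
        self.pinned.append((batch, size))
        self.pinned_bytes += size
```

For example, with a 100-byte pinned cap and 100-byte soft unpinned cap, adding three 60-byte batches leaves 120 unpinned bytes: the third add was legal because the limit had not yet been reached, and only afterwards does is_full() report the queue as full.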
Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Reviewed-on: http://gerrit.cloudera.org:8080/14039
Reviewed-by: Sahil Takiar
Tested-by: Impala Public Jenkins
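The option constraint enforced by the commit above (MAX_PINNED_RESULT_SPOOLING_MEMORY must not exceed MAX_UNPINNED_RESULT_SPOOLING_MEMORY, so that everything pinned in the BufferedTupleStream can be unpinned) amounts to a simple validation check. A sketch of that kind of check; the function signature and error text are assumptions, not the actual ValidateQueryOptions code:

```python
def validate_spooling_options(max_pinned, max_unpinned):
    """Hypothetical stand-in for the ValidateQueryOptions check described
    above: reject option combinations where pinned spooling memory could
    never be fully unpinned."""
    if max_pinned > max_unpinned:
        raise ValueError(
            "MAX_PINNED_RESULT_SPOOLING_MEMORY (%d) must be <= "
            "MAX_UNPINNED_RESULT_SPOOLING_MEMORY (%d)"
            % (max_pinned, max_unpinned))
```

Validating the pair up front keeps the failure at query-submission time instead of surfacing later as a spill that cannot make progress.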
[jira] [Commented] (IMPALA-8825) Add additional counters to PlanRootSink
[ https://issues.apache.org/jira/browse/IMPALA-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899020#comment-16899020 ]

Sahil Takiar commented on IMPALA-8825:
--------------------------------------

Would be nice to have IMPALA-7551 fixed as part of this as well, so linking
the two.