[jira] [Commented] (IMPALA-8825) Add additional counters to PlanRootSink

2019-09-16, ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930806#comment-16930806
 ] 

ASF subversion and git services commented on IMPALA-8825:
----------------------------------------------------------

Commit 34d132c513bfe5dc46478d9cb780a93200301b91 in impala's branch 
refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=34d132c ]

IMPALA-8825: Add additional counters to PlanRootSink

Adds the counters RowsSent and RowsSentRate to the PLAN_ROOT_SINK
section of the profile:

  PLAN_ROOT_SINK:
 - PeakMemoryUsage: 4.01 MB (4202496)
 - RowBatchGetWaitTime: 0.000ns
 - RowBatchSendWaitTime: 0.000ns
 - RowsSent: 10 (10)
 - RowsSentRate: 416.00 /sec

RowsSent tracks the number of rows sent to the PlanRootSink via
PlanRootSink::Send. RowsSentRate tracks the rate at which rows are sent
to the PlanRootSink.
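
For context, a rate counter such as RowsSentRate is typically derived from
a monotonically increasing base counter sampled against elapsed time. A
minimal sketch, assuming only standard C++ (the class and names are
illustrative, not Impala's RuntimeProfile API):

{code:cpp}
#include <atomic>
#include <chrono>
#include <cstdint>

// Illustrative sketch: a rate counter derived from a monotonic counter,
// analogous to deriving RowsSentRate from RowsSent. Not Impala's actual
// RuntimeProfile API.
class RateCounterSketch {
 public:
  explicit RateCounterSketch(const std::atomic<int64_t>& src)
      : src_(src), start_(std::chrono::steady_clock::now()) {}

  // Units per second since construction, e.g. rows/sec for RowsSentRate.
  double PerSecond() const {
    double elapsed = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start_).count();
    return elapsed > 0 ? static_cast<double>(src_.load()) / elapsed : 0.0;
  }

 private:
  const std::atomic<int64_t>& src_;  // the base counter, e.g. RowsSent
  const std::chrono::steady_clock::time_point start_;
};
{code}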

Adds the counters NumRowsFetched, NumRowsFetchedFromCache, and
RowMaterializationRate to the ImpalaServer section of the profile.

  ImpalaServer:
 - ClientFetchWaitTimer: 11.999ms
 - NumRowsFetched: 10 (10)
 - NumRowsFetchedFromCache: 10 (10)
 - RowMaterializationRate: 9.00 /sec
 - RowMaterializationTimer: 1s007ms

NumRowsFetched tracks the total number of rows fetched by the query,
excluding rows served from the query results cache. NumRowsFetchedFromCache
tracks the total number of rows fetched from the query results cache.
RowMaterializationRate tracks the rate at which rows are materialized.
RowMaterializationTimer already existed and tracks how much time is
spent materializing rows.
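
The split between the two fetch counters can be sketched as follows; the
struct and function names here are hypothetical, not the actual
ClientRequestState code:

{code:cpp}
#include <cstdint>

// Hypothetical sketch of the fetch accounting described above: rows served
// from the query results cache and rows materialized by the query are
// counted separately, so the two counters never overlap.
struct FetchCountersSketch {
  int64_t num_rows_fetched = 0;             // excludes cache hits
  int64_t num_rows_fetched_from_cache = 0;  // cache hits only
};

void RecordFetch(FetchCountersSketch& c, int64_t num_rows, bool from_cache) {
  if (from_cache) {
    c.num_rows_fetched_from_cache += num_rows;
  } else {
    c.num_rows_fetched += num_rows;
  }
}
{code}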

Testing:
* Added tests to test_fetch_first.py and query_test/test_fetch.py
* Enabled some tests in test_fetch_first.py that were pending
the completion of IMPALA-8819
* Ran core tests

Change-Id: Id9e101e2f3e2bf8324e149c780d35825ceecc036
Reviewed-on: http://gerrit.cloudera.org:8080/14180
Tested-by: Impala Public Jenkins 
Reviewed-by: Sahil Takiar 


> Add additional counters to PlanRootSink
> ----------------------------------------
>
> Key: IMPALA-8825
> URL: https://issues.apache.org/jira/browse/IMPALA-8825
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> The current entry in the runtime profile for {{PLAN_ROOT_SINK}} does not 
> contain much useful information:
> {code:java}
> PLAN_ROOT_SINK:(Total: 234.996ms, non-child: 234.996ms, % non-child: 100.00%)
> - PeakMemoryUsage: 0{code}
> There are several additional counters we could add to the {{PlanRootSink}} 
> (either the {{BufferedPlanRootSink}} or {{BlockingPlanRootSink}}):
>  * Amount of time spent blocking inside the {{PlanRootSink}} - both the time 
> spent by the client thread waiting for rows to become available and the time 
> spent by the Impala thread waiting for the client to consume rows
>  ** So similar to the {{RowBatchQueueGetWaitTime}} and 
> {{RowBatchQueuePutWaitTime}} inside the scan nodes
>  ** The difference between these counters and the ones in 
> {{ClientRequestState}} (e.g. {{ClientFetchWaitTimer}} and 
> {{RowMaterializationTimer}}) should be documented
>  * For {{BufferedPlanRootSink}} there are already several {{Buffer pool}} 
> counters; we should make sure they are exposed in the {{PLAN_ROOT_SINK}} 
> section
>  * Track the number of rows sent (e.g. rows sent to {{PlanRootSink::Send}}) 
> and the number of rows fetched (might need to be tracked in the 
> {{ClientRequestState}})
>  ** For {{BlockingPlanRootSink}} the sent and fetched values should be pretty 
> much the same, but for {{BufferedPlanRootSink}} this is more useful
>  ** Similar to {{RowsReturned}} in each exec node
>  * The rate at which rows are sent and fetched
>  ** Should be useful when attempting to debug the performance of fetching rows (e.g. 
> if the send rate is much higher than the fetch rate, then maybe there is 
> something wrong with the client)
>  ** Similar to {{RowsReturnedRate}} in each exec node
> Open to other suggestions for counters that folks think are useful.






[jira] [Commented] (IMPALA-8825) Add additional counters to PlanRootSink

2019-08-23, ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914749#comment-16914749
 ] 

ASF subversion and git services commented on IMPALA-8825:
----------------------------------------------------------

Commit d037ac8304b43f6e4bb4c6ba2eb1910a9e921c24 in impala's branch 
refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d037ac8 ]

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue; however, all the
synchronization logic is unchanged.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not
thread-safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.
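
To make the 'soft' limit concrete, here is a minimal sketch under assumed
names (this is not the real SpillableRowBatchQueue interface): the
unpinned-byte total is checked before a batch is appended, so a single
spilled batch can push the total past the limit.

{code:cpp}
#include <cstdint>

// Sketch of the soft unpinned-byte limit (hypothetical names). The limit
// is checked before a batch is counted, so a call that spills can push
// unpinned_bytes_ past max_unpinned_bytes_; hence the limit is "soft".
class SpillableQueueSketch {
 public:
  explicit SpillableQueueSketch(int64_t max_unpinned_bytes)
      : max_unpinned_bytes_(max_unpinned_bytes) {}

  // Returns false if the soft limit had already been exceeded.
  bool AddBatch(int64_t batch_bytes, int64_t free_reservation) {
    if (unpinned_bytes_ > max_unpinned_bytes_) return false;
    if (batch_bytes > free_reservation) {
      // Spill: unpin pages to fit the batch. Any overshoot of the limit
      // is only detected on the next call.
      unpinned_bytes_ += batch_bytes;
    }
    // ... append the batch to the underlying stream (non-blocking) ...
    return true;
  }

 private:
  const int64_t max_unpinned_bytes_;
  int64_t unpinned_bytes_ = 0;
};
{code}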

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.
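
The invariant can be expressed as a simple check; the free-function
signature below is an assumption for illustration (the commit implements
it inside the new ValidateQueryOptions method):

{code:cpp}
#include <cstdint>
#include <string>

// Sketch of the invariant described above: the pinned limit may not exceed
// the unpinned limit, otherwise the pinned data could never be fully
// unpinned. The signature is assumed for illustration.
bool ValidateResultSpoolingLimits(int64_t max_pinned_bytes,
                                  int64_t max_unpinned_bytes,
                                  std::string* err_msg) {
  if (max_pinned_bytes > max_unpinned_bytes) {
    *err_msg = "MAX_PINNED_RESULT_SPOOLING_MEMORY must be <= "
               "MAX_UNPINNED_RESULT_SPOOLING_MEMORY";
    return false;
  }
  return true;
}
{code}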

Planner Changes:

PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.
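
The bounds above can be summarized in a short sketch (C++ for consistency
with the other sketches here, though the real logic lives in
PlanRootSink.java; all names, and the clamping of the estimate to the
reservation bounds, are assumptions):

{code:cpp}
#include <algorithm>
#include <cstdint>

// Assumed-name sketch of the resource profile bounds described above.
struct ResourceProfileSketch {
  int64_t min_reservation;  // one read page + one write page for the stream
  int64_t max_reservation;  // MAX_PINNED_RESULT_SPOOLING_MEMORY
  int64_t mem_estimate;     // stats-based estimate of the result set size
};

ResourceProfileSketch ComputeSpoolingProfile(int64_t read_page_size,
                                             int64_t write_page_size,
                                             int64_t max_pinned_bytes,
                                             int64_t estimated_result_bytes) {
  ResourceProfileSketch p;
  p.min_reservation = read_page_size + write_page_size;
  p.max_reservation = max_pinned_bytes;
  // Assumed for illustration: clamp the estimate into the valid range.
  p.mem_estimate = std::min(
      std::max(estimated_result_bytes, p.min_reservation), p.max_reservation);
  return p;
}
{code}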

BufferedTupleStream Refactoring:

For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode-specific. To fix
this, the patch introduces the concept of an ExecNode 'label', which is
a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift;
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods (a generic sketch of such a timer follows this list).
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics, such as the number of reserved and unpinned bytes.
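
A generic scoped wait timer of the kind such metrics typically rely on
(an illustrative sketch, not Impala's actual timer utilities):

{code:cpp}
#include <chrono>
#include <cstdint>

// Illustrative sketch: accumulates the wall time of a scope, e.g. a
// blocking wait in Send or GetNext, into a caller-owned nanosecond total.
class ScopedWaitTimerSketch {
 public:
  explicit ScopedWaitTimerSketch(int64_t* total_ns)
      : total_ns_(total_ns), start_(std::chrono::steady_clock::now()) {}

  ~ScopedWaitTimerSketch() {
    *total_ns_ += std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start_).count();
  }

 private:
  int64_t* total_ns_;
  const std::chrono::steady_clock::time_point start_;
};

// Usage sketch: time a blocking wait.
//   int64_t send_wait_ns = 0;
//   { ScopedWaitTimerSketch t(&send_wait_ns); /* cv.wait(lock); */ }
{code}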

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Reviewed-on: http://gerrit.cloudera.org:8080/14039
Reviewed-by: Sahil Takiar 
Tested-by: Impala Public Jenkins 



[jira] [Commented] (IMPALA-8825) Add additional counters to PlanRootSink

2019-08-02, Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899020#comment-16899020
 ] 

Sahil Takiar commented on IMPALA-8825:
----------------------------------------------------------

It would be nice to have IMPALA-7551 fixed as part of this as well, so I'm
linking the two.



