
[jira] [Updated] (CASSANDRA-8518) Impose In-Flight Data Limit

2015-01-27 Thread Benedict (JIRA)


 [ https://issues.apache.org/jira/browse/CASSANDRA-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict updated CASSANDRA-8518:

Summary: Impose In-Flight Data Limit  (was: Cassandra Query Request Size Estimator)

 Impose In-Flight Data Limit
 ---

 Key: CASSANDRA-8518
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8518
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Cheng Ren

 For a long time, our Cassandra nodes have been crashing with out-of-memory 
 errors. The heap dump from the most recent crash shows 22 native transport 
 request threads, each consuming 3.3% of the heap, for more than 70% of the 
 heap in total.
 Heap dump:
 !https://dl-web.dropbox.com/get/attach1.png?_subject_uid=303980955w=AAAVOoncBoZ5aOPbDg2TpRkUss7B-2wlrnhUAv19b27OUA|height=400,width=600!
 Expanded view of one thread:
 !https://dl-web.dropbox.com/get/Screen%20Shot%202014-12-18%20at%204.06.29%20PM.png?_subject_uid=303980955w=AACUO4wrbxheRUxv8fwQ9P52T6gBOm5_g9zeIe8odu3V3w|height=400,width=600!
 The Cassandra version we are currently running (2.0.4) uses 
 MemoryAwareThreadPoolExecutor as the request executor, with a default request 
 size estimator that always returns 1. This means it limits only the number of 
 requests pushed to the pool, not how much memory they occupy. To get 
 finer-grained control over request handling and better protect our nodes from 
 OOM errors, we propose implementing a more precise estimator.
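 Counting requests and weighting them by estimated size behave differently. 
 The minimal Java sketch below shows an in-flight budget where each request 
 reserves as many permits as its estimated size. It is not the actual API of 
 MemoryAwareThreadPoolExecutor. The class InFlightLimiter and its methods are 
 hypothetical, and the size estimator is simply a function supplied by the 
 caller.

```java
import java.util.concurrent.Semaphore;
import java.util.function.ToIntFunction;

/**
 * Hypothetical sketch: admit a request only while the estimated in-flight
 * total stays within a fixed budget. With an estimator that always returns 1,
 * the budget degenerates into a cap on the number of requests.
 */
class InFlightLimiter<T> {
    private final Semaphore budget;
    private final ToIntFunction<T> estimator;
    private final int maxPermits;

    InFlightLimiter(int maxPermits, ToIntFunction<T> estimator) {
        this.budget = new Semaphore(maxPermits);
        this.estimator = estimator;
        this.maxPermits = maxPermits;
    }

    /** Reserves budget for a request; returns the amount reserved, or -1 if rejected. */
    int tryAdmit(T request) {
        // Clamp to [1, maxPermits] so one oversized request can still run alone.
        int cost = Math.min(Math.max(1, estimator.applyAsInt(request)), maxPermits);
        return budget.tryAcquire(cost) ? cost : -1;
    }

    /** Gives the reserved amount back once the request and its response are done. */
    void release(int cost) {
        budget.release(cost);
    }

    public static void main(String[] args) {
        byte[] big = new byte[2000];
        // Count-based, like an estimator that always returns 1: three slots.
        InFlightLimiter<byte[]> byCount = new InFlightLimiter<>(3, r -> 1);
        // Size-based: a 3000-byte budget.
        InFlightLimiter<byte[]> bySize = new InFlightLimiter<>(3000, r -> r.length);
        System.out.println(byCount.tryAdmit(big)); // 1
        System.out.println(byCount.tryAdmit(big)); // 1
        System.out.println(bySize.tryAdmit(big));  // 2000
        System.out.println(bySize.tryAdmit(big));  // -1 (only 1000 bytes left)
    }
}
```

 With the constant estimator, both 2000-byte requests fit into the three 
 slots. The size-based budget rejects the second one because only 1000 bytes 
 remain. Callers are expected to call release() once the response has been 
 written.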
 Here are our two cents:
 For update/delete/insert requests: the size could be estimated by summing the 
 sizes of all of the request's class members.
 For scan queries, most of the request's memory footprint is the response, 
 which can be estimated from historical data. For example, when we receive a 
 scan query on a column family for a certain token range, we record its 
 response size and use it as the estimated response size for later scan 
 queries on the same column family. For such future requests, the response 
 size could be calculated as 
   estimated response size = token range * recorded size / recorded token range 
 and the request size estimated as (query size + estimated response size).
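 Both estimation rules above can be sketched in Java. This is a hypothetical 
 illustration only. The following are all assumptions:
 - the class RequestSizeEstimator and its method names;
 - the per-column-family history map;
 - keeping only the most recent observation.
 Tokens are modelled as long values, as with Murmur3Partitioner. Token ranges 
 are assumed not to wrap around the ring.

```java
import java.math.BigInteger;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of the proposed estimator. Tokens are longs and
 * token ranges are assumed not to wrap around the ring (start < end).
 */
class RequestSizeEstimator {
    /** Most recent scan seen on a column family: range width and response size. */
    private static final class Observation {
        final BigInteger rangeWidth;
        final long responseBytes;

        Observation(BigInteger rangeWidth, long responseBytes) {
            this.rangeWidth = rangeWidth;
            this.responseBytes = responseBytes;
        }
    }

    private final Map<String, Observation> history = new ConcurrentHashMap<>();

    private static BigInteger width(long start, long end) {
        // BigInteger avoids overflow for ranges spanning most of the long space.
        return BigInteger.valueOf(end).subtract(BigInteger.valueOf(start));
    }

    /** Update/delete/insert: sum of the sizes of the request's members. */
    static long estimateMutation(long... memberSizes) {
        long total = 0;
        for (long size : memberSizes) {
            total += size;
        }
        return total;
    }

    /** Called after a scan completes, with the actual response size. */
    void recordScan(String columnFamily, long start, long end, long responseBytes) {
        history.put(columnFamily, new Observation(width(start, end), responseBytes));
    }

    /** Scan: query size + token range * recorded size / recorded token range. */
    long estimateScan(String columnFamily, long start, long end, long querySize) {
        Observation obs = history.get(columnFamily);
        if (obs == null || obs.rangeWidth.signum() <= 0) {
            return querySize; // no usable history yet: count only the query itself
        }
        BigInteger response = width(start, end)
                .multiply(BigInteger.valueOf(obs.responseBytes))
                .divide(obs.rangeWidth);
        return querySize + response.longValue();
    }
}
```

 For example, after a scan over a token range of width 1000 returned 4000 
 bytes, a 50-byte query over a range of width 500 would be estimated at 
 50 + 500 * 4000 / 1000 = 2050 bytes. Keeping only the latest observation keeps 
 the sketch short. Averaging several past scans would smooth out the effect of 
 unevenly populated token ranges.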
 We believe this proposal could be useful to others in the Cassandra community 
 as well. Could you give us your feedback? Please let us know if you have any 
 concerns or suggestions regarding this proposal.
 Thanks,
 Cheng



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-8518) Impose In-Flight Data Limit

2015-01-27 Thread Benedict (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict updated CASSANDRA-8518:

Labels: performance (was: )

Impose In-Flight Data Limit
---

Key: CASSANDRA-8518
URL: https://issues.apache.org/jira/browse/CASSANDRA-8518
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Cheng Ren
Labels: performance

