[jira] [Comment Edited] (CASSANDRA-6572) Workload recording / playback

2014-07-23 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071759#comment-14071759
 ] 

Benedict edited comment on CASSANDRA-6572 at 7/23/14 2:13 PM:
--

It looks to me like you need some way to share statement preparation across 
threads, as a statement can be used by any thread (and across log segments) once 
prepared. Probably the easiest place to do this is during parsing of the log file; 
a rough sketch follows.
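
A minimal sketch of what such a shared prepared-statement cache could look like on the replay side, assuming the replay tool drives a DataStax Java driver {{Session}}; the {{SharedPreparedCache}} name and its use during log parsing are illustrative, not part of the patch:

{code}
import java.util.concurrent.ConcurrentHashMap;

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// Illustrative only: one PreparedStatement per query string, shared by every
// replay thread and across all log segments.
final class SharedPreparedCache
{
    private final ConcurrentHashMap<String, PreparedStatement> cache = new ConcurrentHashMap<>();
    private final Session session;

    SharedPreparedCache(Session session)
    {
        this.session = session;
    }

    PreparedStatement prepare(String query)
    {
        PreparedStatement existing = cache.get(query);
        if (existing != null)
            return existing;
        PreparedStatement fresh = session.prepare(query);   // prepare once per distinct query
        existing = cache.putIfAbsent(query, fresh);          // first writer wins on a race
        return existing != null ? existing : fresh;
    }
}
{code}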

We also have an issue with replay potentially over-parallelizing, and potentially 
OOMing, because you're submitting straight to a thread pool after parsing each 
file. Nothing stops us racing ahead and reading all of the log files (you have an 
unbounded queue), and since you submit each file separately you will spawn a 
thread/executor per thread/segment combination, rather than per thread.

Probably we want to create some separate state to represent a thread, which we 
create the first time we see a thread id, insert into a map, and then place work 
directly onto its queue while parsing all segments. We can immediately submit a 
runnable that processes this queue, to represent the thread. There is a potential 
problem here, though: we do not know if a thread died, so we can fill up the 
executor pool; for now let's use an unbounded executor pool and leave tackling 
this properly until we have everything else in place. We should then limit how 
many queries ahead we can read, to prevent OOM (see the sketch below).
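
A rough sketch of that per-thread state under the assumptions above (one worker per recorded thread id, an unbounded executor for now, and a bounded per-thread queue to cap read-ahead); all class and method names here are hypothetical:

{code}
import java.util.concurrent.*;

// Hypothetical per-recorded-thread replay state: one bounded queue per thread id,
// with a worker submitted the first time we see that id.
final class ThreadReplayState
{
    // bound read-ahead per recorded thread to avoid OOM while parsing segments
    final BlockingQueue<Runnable> pending = new ArrayBlockingQueue<>(1024);

    final Runnable worker = new Runnable()
    {
        public void run()
        {
            try
            {
                while (true)
                    pending.take().run();   // replay queries in recorded order
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
            }
        }
    };
}

final class ReplayDispatcher
{
    private final ExecutorService executor = Executors.newCachedThreadPool(); // unbounded, for now
    private final ConcurrentHashMap<Long, ThreadReplayState> states = new ConcurrentHashMap<>();

    // Called while parsing any segment; routes the task to the recorded thread's queue.
    void dispatch(long recordedThreadId, Runnable replayTask) throws InterruptedException
    {
        ThreadReplayState state = states.get(recordedThreadId);
        if (state == null)
        {
            ThreadReplayState fresh = new ThreadReplayState();
            state = states.putIfAbsent(recordedThreadId, fresh);
            if (state == null)
            {
                state = fresh;
                executor.execute(state.worker);   // one worker per recorded thread
            }
        }
        state.pending.put(replayTask);            // blocks when parsing runs too far ahead
    }
}
{code}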

Also, we're still replaying based on _offset_ from the last query, which means we 
will skew very quickly. We should fix an epoch (in nanos) such that with a log 
epoch of L, a query originally run at T=L+X is re-run at R+X, where R is the 
replay epoch. A sketch of that scheduling follows.
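
A minimal sketch of that scheduling, assuming each log record carries its offset X (in nanos) from the log epoch L; the class name is hypothetical:

{code}
import java.util.concurrent.TimeUnit;

// Hypothetical replay-time scheduling: each record stores its offset X = T - L
// (nanos since the log epoch); at replay we sleep until R + X before executing.
final class EpochScheduler
{
    private final long replayEpochNanos = System.nanoTime();   // R, fixed once at replay start

    void runAtOffset(long offsetNanos, Runnable query) throws InterruptedException
    {
        long target = replayEpochNanos + offsetNanos;           // R + X
        long sleep = target - System.nanoTime();
        if (sleep > 0)
            TimeUnit.NANOSECONDS.sleep(sleep);
        query.run();   // late queries run immediately; skew does not accumulate across queries
    }
}
{code}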


was (Author: benedict):
It looks to me like you need some way to share the statement preparation across 
threads, as it can be used by any thread (and across log segments) once 
prepared. Probably easiest to do it during parsing of the log file. 

We also have an issue with replay potentially over-parallelizing, and also 
potentially OOMing, as you're submitting straight to a thread pool after 
parsing each file. So there's nothing stopping us racing ahead and reading all 
of the log files (you have an unbounded queue), but since you submit each file 
separately you will spawn a thread/executor for each thread/segment 
combination, rather than each thread.

Probably we want to create some separate state to represent a thread, which we 
create once the first time we see a thread id, insert it into a map, and then 
place work directly onto this queue during parsing of all segments. We can 
submit a runnable immediately for processing this queue to represent a thread. 
We have a potential problem here, though, which is that we do not know if a 
thread died, so we can fill up the executor pool; for now let's use an 
unbounded executor pool and leave tackling this properly until we have 
everything else in place. We should then limit how many queries ahead we can 
read to prevent OOM. 

Also, we're still replaying based on _offset_ from last query, which means we 
will skew very quickly. We should be fixing an epoch (in nanos) when writing 
the log, and on replay, and we ensure that each query is replayed at a time as 
close to its offset during replay as it was offset from the log epoch when 
first run (i.e. you have a log epoch of L, and queries are run at T=L+X; when 
re-run we have a replay epoch of R, and we run queries at R+X)

 Workload recording / playback
 -

 Key: CASSANDRA-6572
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572
 Project: Cassandra
  Issue Type: New Feature
  Components: Core, Tools
Reporter: Jonathan Ellis
Assignee: Lyuben Todorov
 Fix For: 2.1.1

 Attachments: 6572-trunk.diff


 Write sample mode gets us part way to testing new versions against a real 
 world workload, but we need an easy way to test the query side as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CASSANDRA-6572) Workload recording / playback

2014-05-22 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006452#comment-14006452
 ] 

Lyuben Todorov edited comment on CASSANDRA-6572 at 5/22/14 8:52 PM:


Commit 
[3bd3ddc|https://github.com/lyubent/cassandra/commit/3bd3ddc5f2b2068895f57a753b2ca2c7431c07b8]:
# {{runFlush()}} is used in SS to force a flush should query recording be 
disabled or Cassandra get shut down, but I renamed it to {{forceFlush}} for 
clarity. 
# Added an interface that gives us access to a statement's keyspace from 
CFMetaData. {{isSystemOrTraceKS}} now uses the keyspace from the CFMetaData. 
Batch statements are special-cased as they contain more than just a single 
keyspace (might change this further along once we get to replaying of 
batches). (separate commit 
[here|https://github.com/lyubent/cassandra/commit/1b10a60de35c2f5b69f8100edc551c93d4bbe027])
 
# Updated QR to recycle QQs which get stored onto the CLQ. 
# Added a small optimisation: BBs used for creating the data array in 
{{QR#allocate(String)}} are now recycled. The largest buffer created so far is 
kept, and when a bigger one is required it gets allocated and replaces the older 
one (see the sketch after this list).
# Added forceFlush to the shutdown process.
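
To illustrate the recycling in point 4, here is a minimal, hypothetical sketch of the idea (single-threaded for brevity; the actual {{QR#allocate(String)}} in the patch may differ):

{code}
import java.nio.ByteBuffer;

// Keep the largest ByteBuffer seen so far; only allocate when a bigger one is needed.
final class RecyclingAllocator
{
    private ByteBuffer largest = ByteBuffer.allocate(0);

    ByteBuffer allocate(int size)
    {
        if (largest.capacity() < size)
            largest = ByteBuffer.allocate(size);   // grow: replace the older, smaller buffer
        largest.clear();
        largest.limit(size);                        // hand out only the requested window
        return largest;
    }
}
{code}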

P.S. The more this develops, the more it seems like QR should be a singleton. I 
think it won't be a big change at all to modify it to be one; not that the 
current design is a problem, just an idea to consider. 


was (Author: lyubent):
Commit 
[3bd3ddc|https://github.com/lyubent/cassandra/commit/3bd3ddc5f2b2068895f57a753b2ca2c7431c07b8]:
# {{runFlush()}} is used in SS to force a flush should the query recording be 
disabled / cassandra gets shut down, but I renamed it to {{forceFlush}} for 
clarity. 
# added an interface that gives us access to a statement's keyspace from 
CFMetaData. {{isSystemOrTraceKS}} now uses the keyspace from the CFMetaData, 
{{BatchStatement}}s are special-cased as they contain more than just a single 
keyspace (might possibly change this further along once we get to replaying of 
batches). 
# Updated QR to recycle QQs which get stored onto the CLQ. 
# Added a small optimisation, BBs used for creating the data array in 
{{QR#allocate(String)}} are now recycled where the largest buffer created is 
stored, when a bigger one is required it gets allocated and replaces the older 
one.
# Added forceFlush to the shutdown process.

P.S. The more this develops, the more it seems like QR should be a singleton, I 
think it won't be a big change at all to modify it to be one, not that the 
current design is a problem, just an idea to consider. 



[jira] [Comment Edited] (CASSANDRA-6572) Workload recording / playback

2014-05-20 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004097#comment-14004097
 ] 

Lyuben Todorov edited comment on CASSANDRA-6572 at 5/20/14 10:58 PM:
-

bq. put any finished flushing QQs onto this CLQ when done so that we quickly 
reach an equilibrium point. 

Why do we want to store completed QQs onto the CLQ used for storing failed 
pre-flush CAS? Also, what would be the best time to poll the CLQ of failed-flush 
QQs? I'd say on CAS success of the current queue, submit multiple flushes in 
separate threads until the CLQ of failed QQs is emptied out (rough sketch below).
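
A rough, hypothetical sketch of that drain-on-CAS-success idea; {{QueryQueue}} and {{flush()}} here are stand-ins, not the types from the patch:

{code}
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

interface QueryQueue { void flush(); }   // stand-in for the patch's QQ type

// After a successful CAS on the current queue, drain the CLQ of queues whose
// flush previously failed, flushing each on its own thread.
final class FailedFlushDrainer
{
    private final ConcurrentLinkedQueue<QueryQueue> failed = new ConcurrentLinkedQueue<>();
    private final ExecutorService flushers = Executors.newCachedThreadPool();

    void onCasSuccess()
    {
        QueryQueue q;
        while ((q = failed.poll()) != null)        // drain until the CLQ is empty
        {
            final QueryQueue toFlush = q;
            flushers.execute(new Runnable()
            {
                public void run() { toFlush.flush(); }
            });
        }
    }

    void onFlushFailed(QueryQueue q)
    {
        failed.add(q);                              // retry later, on the next CAS success
    }
}
{code}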

Other than that, everything else from the comment is in commit 
[c38080f|https://github.com/lyubent/cassandra/commit/c38080fd5e852901db903f59fff450e99f93d358].


was (Author: lyubent):
bq. put any finished flushing QQs onto this CLQ when done so that we quickly 
reach an equilibrium point. 

Why do we want to store completed QQs onto the CLQ used for storing failed 
pre-flush CAS? Also what would be the best time to poll the CLQ of failed flush 
QQs? I'd say on CAS success of the current queue submit multiple flushes in 
separate threads until the CLQ of failed QQs is emptied out.

Other than that commit with everything else in the comment on 
[54e579b|https://github.com/lyubent/cassandra/commit/c38080fd5e852901db903f59fff450e99f93d358].



[jira] [Comment Edited] (CASSANDRA-6572) Workload recording / playback

2014-05-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993673#comment-13993673
 ] 

Benedict edited comment on CASSANDRA-6572 at 5/9/14 4:41 PM:
-

Points 1 and 2 need to be atomic, so you calculate how much room you need, then 
in a loop you read the current value of the queue pointer, check there is room 
in the queue for the record, and then CAS the current value to the new value 
(old + size). If the CAS succeeds you exit the loop and can write to the 
buffer. No need to offload to another thread. The flush needs to make sure that 
all prior appends have finished before flushing, though, and the best way to do 
this is with an OpOrder. See CommitLogSegment for examples. The appends would 
do something like:

{code}
OpOrder.Group opGroup = order.start();
try
{
    int size = calcSize();
    QueryQueue q;
    int position;
    while (true)
    {
        q = currentQueue;                       // re-read the active queue each attempt
        position = q.currentPosition.get();
        if (position + size < q.buffer.length
            && q.currentPosition.compareAndSet(position, position + size))
            break;                              // we now own [position, position + size)
    }
    doAppend(q, position);                      // write into the owned range
}
finally
{
    opGroup.close();                            // allows the flush barrier to proceed
}
{code}

and the flush would do something like

{code}
OpOrder.Barrier barrier = order.newBarrier();
barrier.issue();
barrier.await();
doFlush();
{code}


was (Author: benedict):
Points 1 and 2 need to be atomic, so you calculate how much room you need, then 
in a loop you get the current value of the queue pointer, check there is room 
in the queue for the record, and then cas the current value to the new value 
(old + size). If it succeeds you exit the loop and then you can write to the 
buffer. No need to offload to another thread. The flush needs to make sure that 
all prior appends have finished before flushing though, and the best way to do 
this is with an OpOrder. See CommitLogSegment for examples. The appends would 
do something like:

{code}
OpOrder.Group opGroup = order.start();
try
{
 int size = calcSize();
 QueryQueue q; 
 int position;
 while (true)
 {
  position = q.currentPosition.get();
  if (position + size < q.buffer.length &&
      q.currentPosition.compareAndSet(position, position + size))
   break;
 }
 doAppend(q, position);
} 
finally
{
 opGroup.close()
}
{code}

and the flush would do something like

{code}
OpOrder.Barrier barrier = order.newBarrier();
barrier.issue();
barrier.await();
doFlush();
{code}



[jira] [Comment Edited] (CASSANDRA-6572) Workload recording / playback

2014-05-01 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986591#comment-13986591
 ] 

Benedict edited comment on CASSANDRA-6572 at 5/1/14 1:57 PM:
-

Some comments on the in-progress patch:

* Don't create a string with the header and convert it to bytes - convert the 
string to bytes and write a normal byte-encoded header with timestamp + length 
as longs (see the sketch after this list). This will also make encoding the 
prepared statement parameters much easier
* Encapsulate queryQue and logPosition into a single object, and use an 
AtomicInteger for the position - don't synchronise, just bump the position by 
however much you need, then write to the owned range. On flush, swap the object 
(use an AtomicReference to track the current buffer)
* On flush, append directly from the byte buffer, don't copy it. Create a 
FileOutputStream and call its appropriate write method with the range that is 
in use
* On the read path, you're now eagerly reading _all_ files which is likely to 
blow up the heap; at least create an Iterator that only reads a whole file at 
once (preferably read a chunk of a file at a time, with a BufferedInputStream)
* On replay timing we want to target hitting the same delta from epoch for 
running the query, not the delta from the prior query - this should help 
prevent massive timing drifts
* Query frequency can be an int rather than an Integer to avoid unboxing
* I think it would be nice if we checked the actual CFMetaData for the 
keyspaces we're modifying in the CQLStatement, rather than doing a find within 
the whole string, but it's not too big a deal
* atomicCounterLock needs to be removed
* As a general rule, never copy array contents with a loop - always use 
System.arraycopy
* Still need to log the thread + session id as Jonathan mentioned
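
A minimal sketch of the first two points above, under stated assumptions: a record here is just [timestamp:long][length:long][query bytes], and all class and field names are hypothetical, not the ones in the patch:

{code}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// The queue and its write position live in one object; the position is claimed
// with an AtomicInteger, and the whole object is swapped on flush.
final class LogBuffer
{
    final byte[] buffer;
    final AtomicInteger position = new AtomicInteger(0);
    LogBuffer(int capacity) { this.buffer = new byte[capacity]; }
}

final class QueryLog
{
    private static final int HEADER = 8 + 8;   // timestamp + length, both as longs
    private final AtomicReference<LogBuffer> current = new AtomicReference<>(new LogBuffer(1 << 20));

    boolean append(long timestampMillis, String query)
    {
        byte[] body = query.getBytes(StandardCharsets.UTF_8);
        int size = HEADER + body.length;
        LogBuffer log = current.get();
        int start;
        do
        {
            start = log.position.get();
            if (start + size > log.buffer.length)
                return false;                   // full; caller flushes and swaps the buffer
        }
        while (!log.position.compareAndSet(start, start + size));

        // we now own [start, start + size): write the byte-encoded header, then the query
        ByteBuffer out = ByteBuffer.wrap(log.buffer, start, size);
        out.putLong(timestampMillis).putLong(body.length).put(body);
        return true;
    }
}
{code}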




was (Author: benedict):
Some comments on the in progress patch:

* Don't create a string with the header and convert it to bytes - convert the 
string to bytes and write a normal byte-encoded header with timestamp + length 
as a long. This will make encoding the prepared statement parameters much 
easier also
* Encapsulate queryQue and logPosition into a single object, and use an 
AtomicInteger for the position - don't synchronise, just bump the position 
however much you need, then write to the owned range. On flush swap the object 
(use an AtomicReference to track the current buffer)
* On flush, append directly from the byte buffer, don't copy it. Create a 
FileOutputStream and call its appropriate write method with the range that is 
in use
* On the read path, you're now eagerly reading _all_ files which is likely to 
blow up the heap; at least create an Iterator that only reads a whole file at 
once (preferably read a chunk of a file at a time, with a BufferedInputStream)
* On replay timing we want to target hitting the same delta from epoch for 
running the query, not the delta from the prior query - this should help 
prevent massive timing drifts
* Query frequency can be an int rather than an Integer to avoid unboxing
* I think it would be nice if we checked the actual CFMetaData for the 
keyspaces we're modifying in the CQLStatement, rather than doing a find within 
the whole string, but it's not too big a deal
* atomicCounterLock needs to be removed
* As a general rule, never copy array contents with a loop - always use 
System.arraycopy
* Still need to log the thread + query id as Jonathan mentioned





[jira] [Comment Edited] (CASSANDRA-6572) Workload recording / playback

2014-04-30 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985649#comment-13985649
 ] 

Lyuben Todorov edited comment on CASSANDRA-6572 at 4/30/14 3:55 PM:


Since queries are logged in their string form, this doesn't accommodate 
prepared statements. One way we could log prepared statements is to log the 
query string during the prepare phase along with the query's id, e.g. 
{{b7693b50da63a31229b8413754bc72c0 INSERT INTO ks.cf (col1) VALUES (?)}}, and 
then in {{ExecuteMessage#execute}} log the id together with the bound values 
later in the log; during replay we can then match values to the queryString by 
using the id (see the sketch below). A better way to do it would be to get 
access to the statementId in QP#executePrepared, but I'm not sure whether it's 
worth changing the statement to store its id.
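
A minimal sketch of the replay-side matching described above; record handling and names here are hypothetical, and the actual log format is whatever the patch ends up writing:

{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Replay joins the two kinds of records for prepared statements on the statement id.
final class PreparedReplayIndex
{
    // statement id (e.g. the hex MD5 logged at prepare time) -> query string
    private final Map<String, String> queryById = new HashMap<>();

    // from a "prepare" record: {{<id> <query string>}}
    void onPrepareRecord(String id, String queryString)
    {
        queryById.put(id, queryString);
    }

    // from an "execute" record: the id plus the bound values logged in ExecuteMessage#execute
    void onExecuteRecord(String id, List<Object> values)
    {
        String queryString = queryById.get(id);
        if (queryString == null)
            throw new IllegalStateException("execute record seen before prepare record for " + id);
        replay(queryString, values);
    }

    private void replay(String queryString, List<Object> values)
    {
        // stand-in: prepare queryString against the target cluster and bind values
    }
}
{code}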


was (Author: lyubent):
Since queries are logged as their string form, this doesn't accommodate 
prepared statements. One way I think we can log the ps' is to log the string 
query during the prepare phase along with the query's id, e.g. 
{{b7693b50da63a31229b8413754bc72c0 INSERT INTO ks.cf (col1) VALUES (?) }} and 
then in ExecuteMessage#execute the values and the id can be logged again later 
on in the log, then during replay we can match values to the queryString by 
using the id. A better way to do it would be to get access to the statementId 
in QP#executePrepared but I'm not sure whether it's worth changing the 
statement to store its id. 
