Is there a way to turn this on and off through configuration? I am not
necessarily sure I would want this feature. Also it is confusing if these
writes show up in JMX and look like user generated write operations.


On Mon, Dec 17, 2012 at 10:01 AM, Mike <mthero...@yahoo.com> wrote:

>  Thank you Aaron, this was very helpful.
>
> Could it be an issue that this optimization does not really take effect
> until the memtable with the hoisted data is flushed?  In my simple example
> below, the same row is updated and multiple selects of the same row will
> result in multiple writes to the memtable.  It seems it maybe possible
> (although unlikely) where, if you go from a write-mostly to a read-mostly
> scenario, you could get into a state where you are stuck rewriting to the
> same memtable, and the memtable is not flushed because it absorbs the
> over-writes.  I can foresee this especially if you are reading the same
> rows repeatedly.
>
> I also noticed from the codepaths that if Row caching is enabled, this
> optimization will not occur.  We made some changes this weekend to make
> this column family more suitable to row-caching and enabled row-caching
> with a small cache.  Our initial results is that it seems to have corrected
> the write counts, and has increased performance quite a bit.  However, are
> there any hidden gotcha's there because this optimization is not
> occurring?  https://issues.apache.org/jira/browse/CASSANDRA-2503 mentions
> a "compaction is behind" problem.  Any history on that?  I couldn't find
> too much information on it.
>
> Thanks,
> -Mike
>
> On 12/16/2012 8:41 PM, aaron morton wrote:
>
>
>   1) Am I reading things correctly?
>
> Yes.
> If you do a read/slice by name and more than min compaction level nodes
> where read the data is re-written so that the next read uses fewer SSTables.
>
>  2) What is really happening here?  Essentially minor compactions can
> occur between 4 and 32 memtable flushes.  Looking through the code, this
> seems to only effect a couple types of select statements (when selecting a
> specific column on a specific key being one of them). During the time
> between these two values, every "select" statement will perform a write.
>
> Yup, only for readying a row where the column names are specified.
> Remember minor compaction when using SizedTiered Compaction (the default)
> works on buckets of the same size.
>
>  Imagine a row that had been around for a while and had fragments in more
> than Min Compaction Threshold sstables. Say it is 3 SSTables in the 2nd
> tier and 2 sstables in the 1st. So it takes (potentially) 5 SSTable reads.
> If this row is read it will get hoisted back up.
>
>  But the row has is in only 1 SSTable in the 2nd tier and 2 in the 1st
> tier it will not hoisted.
>
>  There are a few short circuits in the SliceByName read path. One of them
> is to end the search when we know that no other SSTables contain columns
> that should be considered. So if the 4 columns you read frequently are
> hoisted into the 1st bucket your reads will get handled by that one bucket.
>
>  It's not every select. Just those that touched more the min compaction
> sstables.
>
>
>  3) Is this desired behavior?  Is there something else I should be
> looking at that could be causing this behavior?
>
> Yes.
> https://issues.apache.org/jira/browse/CASSANDRA-2503
>
>  Cheers
>
>
>    -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
>  @aaronmorton
> http://www.thelastpickle.com
>
>  On 15/12/2012, at 12:58 PM, Michael Theroux <mthero...@yahoo.com> wrote:
>
>  Hello,
>
>  We have an unusual situation that I believe I've reproduced, at least
> temporarily, in a test environment.  I also think I see where this issue is
> occurring in the code.
>
>  We have a specific column family that is under heavy read and write load
> on a nightly basis.   For the purposes of this description, I'll refer to
> this column family as "Bob".  During this nightly processing, sometimes Bob
> is under very write load, other times it is very heavy read load.
>
>  The application is such that when something is written to Bob, a write
> is made to one of two other tables.  We've witnessed a situation where the
> write count on Bob far outstrips the write count on either of the other
> tables, by a factor of 3->10.  This is based on the WriteCount available on
> the column family JMX MBean.  We have not been able to find where in our
> code this is happening, and we have gone as far as tracing our CQL calls to
> determine that the relationship between Bob and the other tables are what
> we expect.
>
>  I brought up a test node to experiment, and see a situation where, when
> a "select" statement is executed, a write will occur.
>
>  In my test, I perform the following (switching between nodetool and
> cqlsh):
>
>  update bob set 'about'='coworker' where key='<hex key>';
> nodetool flush
>  update bob set 'about'='coworker' where key='<hex key>';
> nodetool flush
>  update bob set 'about'='coworker' where key='<hex key>';
> nodetool flush
>  update bob set 'about'='coworker' where key='<hex key>';
> nodetool flush
>  update bob set 'about'='coworker' where key='<hex key>';
> nodetool flush
>
>  Then, for a period of time (before a minor compaction occurs), a select
> statement that selects specific columns will cause writes to occur in the
> write count of the column family:
>
>  select about,changed,data from bob where key='<hex key>';
>
>  This situation will continue until a minor compaction is completed.
>
>  I went into the code and added some traces to CollationController.java:
>
>    private ColumnFamily collectTimeOrderedData()
>     {
>         logger.debug("collectTimeOrderedData");
>
>       ... <snip> ...
>
>
> ---> HERE   logger.debug( "tables iterated: " + sstablesIterated +  " Min 
> compact: " + cfs.getMinimumCompactionThreshold() );
>
>             // "hoist up" the requested data into a more recent sstable
>             if (sstablesIterated > cfs.getMinimumCompactionThreshold()
>                 && !cfs.isCompactionDisabled()
>                 && cfs.getCompactionStrategy() instanceof 
> SizeTieredCompactionStrategy)
>             {
>                 RowMutation rm = new RowMutation(cfs.table.name, new 
> Row(filter.key, returnCF.cloneMe()));
>                 try
>                 {
>
> ---> HERE               logger.debug( "Apply hoisted up row mutation" );      
>
>                     // skipping commitlog and index updates is fine since 
> we're just de-fragmenting existing data
>                     Table.open(rm.getTable()).apply(rm, false, false);
>                 }
>                 catch (IOException e)
>                 {
>                     // log and allow the result to be returned
>                     logger.error("Error re-writing read results", e);
>                 }
>             }
>
>  ... <snip> ...
>
>  Performing the steps above, I see the following traces (in the test
> environment I decreased the minimum compaction threshold to make this
> easier to reproduce). After I do a couple of update/flush, I see this in
> the log:
>
>  DEBUG [FlushWriter:7] 2012-12-14 22:54:40,106 CompactionManager.java
> (line 117) Scheduling a background task check for bob with
> SizeTieredCompactionStrategy
>
>  Then, until compaction occurs, I see (when performing a select):
>
>  DEBUG [ScheduledTasks:1] 2012-12-14 22:55:15,998 LoadBroadcaster.java
> (line 86) Disseminating load info ...
> DEBUG [Thrift:12] 2012-12-14 22:55:16,990 CassandraServer.java (line 1227)
> execute_cql_query
> DEBUG [Thrift:12] 2012-12-14 22:55:16,991 QueryProcessor.java (line 445)
> CQL statement type: SELECT
> DEBUG [Thrift:12] 2012-12-14 22:55:16,991 StorageProxy.java (line 653)
> Command/ConsistencyLevel is SliceByNamesReadCommand(table='open',
> key=804229d1933669d0a25d2a38c8b26ded10069573003e6dbb1ce21b5f402a5342,
> columnParent='QueryPath(columnFamilyName='bob', superColumnName='null',
> columnName='null')', columns=[about,changed,data,])/ONE
> DEBUG [Thrift:12] 2012-12-14 22:55:16,992 ReadCallback.java (line 79)
> Blockfor is 1; setting up requests to /10.0.4.20
> DEBUG [Thrift:12] 2012-12-14 22:55:16,992 StorageProxy.java (line 669)
> reading data locally
> DEBUG [ReadStage:61] 2012-12-14 22:55:16,992 StorageProxy.java (line 813)
> LocalReadRunnable reading SliceByNamesReadCommand(table='open',
> key=804229d1933669d0a25d2a38c8b26ded10069573003e6dbb1ce21b5f402a5342,
> columnParent='QueryPath(columnFamilyName='bob', superColumnName='null',
> columnName='null')', columns=[about,changed,data,])
> DEBUG [ReadStage:61] 2012-12-14 22:55:16,992 CollationController.java
> (line 68) In get top level columns: class
> org.apache.cassandra.db.filter.NamesQueryFilter type: Standard valid: class
> org.apache.cassandra.db.marshal.BytesType
> DEBUG [ReadStage:61] 2012-12-14 22:55:16,992 CollationController.java
> (line 84) collectTimeOrderedData
> ---> DEBUG [ReadStage:61] 2012-12-14 22:55:17,192 CollationController.java
> (line 188) tables iterated: 4 Min compact: 2
>  ----> DEBUG [ReadStage:61] 2012-12-14 22:55:17,192
> CollationController.java (line 198) Apply hoisted up row mutation
> DEBUG [ReadStage:61] 2012-12-14 22:55:17,193 Table.java (line 395)
> applying mutation of row
> 804229d1933669d0a25d2a38c8b26ded10069573003e6dbb1ce21b5f402a5342
>
>  The above traces will occur every time I repeat the above select
> statement.
>
>  Minor compaction doesn't start until a few minutes after the request was
> submitted above (note, this is an unloaded test node):
>
>  DEBUG [CompactionExecutor:11] 2012-12-14 22:57:03,278 IntervalNode.java
> (line 45) Creating IntervalNode from
> [Interval(DecoratedKey(Token(bytes[804229d1933669d0a25d2a38c8b26ded10069573003e6dbb1ce...
>
>  Once minor compaction occurs, the behavior around write count being
> incremented stops, until more than the minimum compaction threshold
> memtables are flush to disk.
>
>  So, my questions are:
>
>  1) Am I reading things correctly?
>
>  2) What is really happening here?  Essentially minor compactions can
> occur between 4 and 32 memtable flushes.  Looking through the code, this
> seems to only effect a couple types of select statements (when selecting a
> specific column on a specific key being one of them). During the time
> between these two values, every "select" statement will perform a write.
>
>  3) Is this desired behavior?  Is there something else I should be
> looking at that could be causing this behavior?
>
>  We are running Cassandra 1.1.2, with SizeTieredCompactionStrategy.
> Any help is appreciated,
> Thanks,
> -Mike
>
>
>
>
>
>
>

Reply via email to