Re: SSTable exclusion from read path based on sstable metadata marked by custom compaction strategies

2019-01-31 Thread Jonathan Haddad
In addition to what Jeff mentioned, there was an optimization in 3.4 that
can significantly reduce the number of sstables accessed when a LIMIT
clause is used. This can be a pretty big win with TWCS.

http://thelastpickle.com/blog/2017/03/07/The-limit-clause-in-cassandra-might-not-work-as-you-think.html
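
For concreteness, here is a minimal sketch of the kind of query that optimization
targets, using the DataStax Python driver. The keyspace, table, and column names
(task_history, user_id, task_id, payload) are hypothetical, not from this thread:

    # Minimal sketch with cassandra-driver; schema names are illustrative only.
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("my_keyspace")

    # With a newest-first clustering order and a LIMIT, 3.4+ can often satisfy
    # the page from the most recent sstables and skip older TWCS windows.
    select = session.prepare(
        "SELECT task_id, payload FROM task_history WHERE user_id = ? LIMIT 50"
    )
    for row in session.execute(select, ["some-user"]):
        print(row.task_id)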


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: SSTable exclusion from read path based on sstable metadata marked by custom compaction strategies

2019-01-31 Thread Jeff Jirsa
In my original TWCS talk a few years back, I suggested that people make the 
partitions match the time window to avoid exactly what you’re describing. I 
added that to the talk because my first team that used TWCS (the team for which 
I built TWCS) had a data model not unlike yours, and the read-every-sstable 
thing turns out not to work that well if you have lots of windows (or very 
large partitions). If you do this, you can fan out a bunch of async reads for 
the first few days and ask for more as you need to fill the page - this means 
the reads are more distributed, too, which is an extra bonus when you have 
noisy partitions.
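
A hedged sketch of that fan-out pattern, assuming the partition key is extended
with a day bucket. The table name (task_history_by_day), columns (user_id,
day_bucket, task_id, payload), and schema are assumptions for illustration, not
the actual schema discussed here:

    # Sketch of bucketed partitions plus async fan-out, using cassandra-driver.
    # Hypothetical schema: PRIMARY KEY ((user_id, day_bucket), task_id)
    from datetime import date, timedelta
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("my_keyspace")
    select = session.prepare(
        "SELECT task_id, payload FROM task_history_by_day "
        "WHERE user_id = ? AND day_bucket = ? LIMIT ?"
    )

    def recent_tasks(user_id, days=2, page_size=100):
        """Fan out one async read per day bucket, newest first, until the page fills."""
        buckets = [date.today() - timedelta(days=i) for i in range(days)]
        futures = [session.execute_async(select, (user_id, b, page_size)) for b in buckets]
        results = []
        for future in futures:
            results.extend(future.result())  # reads ran in parallel; collect in order
            if len(results) >= page_size:
                break
        return results[:page_size]

The point of the design is that each time bucket is its own partition, so each
read only touches the sstables for that window, and older buckets are only
queried if the page still isn't full.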

In 3.0 and newer (I think, don't quote me on the specific version), the sstable
metadata has the min and max clustering, which helps exclude sstables from the
read path quite well if everything in the table uses timestamp clustering
columns. I know there was some issue with this and range tombstones (RTs)
recently, so I'm not sure of the current state, but it's worth considering that
this may be much better on 3.0+
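
To illustrate the idea only (this is not Cassandra's actual internal code), a toy
sketch of how per-sstable min/max clustering bounds let a read skip files whose
range cannot intersect the requested slice:

    # Toy illustration of min/max-clustering exclusion; not real Cassandra code.
    from dataclasses import dataclass

    @dataclass
    class SSTableMeta:
        name: str
        min_clustering: int  # e.g. epoch millis of the oldest row in the file
        max_clustering: int  # e.g. epoch millis of the newest row in the file

    def sstables_for_slice(sstables, slice_start, slice_end):
        """Keep only sstables whose [min, max] clustering range overlaps the slice."""
        return [
            s for s in sstables
            if s.max_clustering >= slice_start and s.min_clustering <= slice_end
        ]

    # Example: a 2-day slice against 12-hour TWCS windows should touch roughly
    # 4 files instead of all 80.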



-- 
Jeff Jirsa





SSTable exclusion from read path based on sstable metadata marked by custom compaction strategies

2019-01-31 Thread Carl Mueller
Situation:

We use TWCS for a task history table (the partition key is the user, the
clustering key is the timeuuid of the task, and TWCS is used because TTL
tombstones rotate the tasks out roughly every month).

However, suppose we want to get a "slice" of tasks, say the tasks from the last
two days, while using TWCS sstable windows of 12 hours.

The problem is that this is a frequent user, so they have tasks in ALL of the
time-bucketed sstables that TWCS produces.

So Cassandra first has to read in, say, 80 sstables to reconstruct the row, and
only THEN can it exclude/slice on the column key.

Question:

Or am I wrong that the read path needs to grab all relevant sstables before
applying column key slicing, and this exclusion is already possible? Admittedly
we are on 2.1 for this table (we are in the process of upgrading now that we
have an automated upgrade program that seems to work pretty well).

If my assumption is correct, then the compaction strategy knows, as it writes
the sstables, which bucket it is placing them in (and could encode that in the
sstable metadata?). And if my assumption about slicing is right, that the whole
row needs reconstruction first, then, given a perfect infinite-monkey coding
team that could build whatever is feasible, could we provide special hooks to
exclude sstables based on metadata, where that metadata indicates which column
keys an sstable can or cannot contain?

Goal:

The overall goal would be to support excluding sstables from the read path, for
cases where compaction strategies are hand-tailored to particular queries.
Essentially we would be doing a first-pass bucket-sort exclusion, with the
sstable metadata marking the buckets. This might also help with super-wide rows
and paging through column keys, if we allowed the table creator to specify
bucketing as flushing occurs. In general, query performance appears to degrade
quickly with the number of sstables required for a lookup.
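
As a thought experiment only (none of these types or hooks exist in Cassandra;
the bucket tag and field names are made up), the proposed first-pass exclusion
might look roughly like a predicate over per-sstable bucket metadata, evaluated
before the merge:

    # Thought-experiment sketch of the proposed metadata-based exclusion hook.
    # Nothing here corresponds to a real Cassandra interface.
    from dataclasses import dataclass, field

    @dataclass
    class SSTable:
        name: str
        metadata: dict = field(default_factory=dict)  # e.g. {"bucket": "2019-01-30T12"}

    def exclude_by_bucket(sstables, query_buckets):
        """First-pass bucket-sort exclusion: skip sstables whose bucket tag
        (written by the compaction strategy at flush/compaction time) cannot
        match the buckets the query could touch."""
        return [
            s for s in sstables
            if s.metadata.get("bucket") is None      # untagged: must still read it
            or s.metadata["bucket"] in query_buckets
        ]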

I still don't know the code nearly well enough to submit patches, but based on
my look at custom compaction strategies and the basic read path, this seems
like it would be a useful extension for advanced users.

The fallback would be a set of tables that serve as buckets, where we span the
buckets with queries when one bucket runs out, and rotate the tables.
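
A rough sketch of that fallback, assuming hypothetical per-bucket tables that
each share the same schema (table names, columns, and the page-spanning logic
are all illustrative):

    # Sketch of the fallback: query rotating per-bucket tables newest-first,
    # moving to the next (older) table only when the current one runs out.
    def fetch_spanning_buckets(session, user_id, bucket_tables, page_size=100):
        results = []
        for table in bucket_tables:  # e.g. ["tasks_2019_01", "tasks_2018_12", ...]
            remaining = page_size - len(results)
            if remaining <= 0:
                break
            rows = session.execute(
                f"SELECT task_id, payload FROM {table} "
                f"WHERE user_id = %s LIMIT {remaining}",
                (user_id,),
            )
            results.extend(rows)
        return results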