On Wed, Mar 7, 2018 at 7:13 AM, Carlos Rolo <r...@pythian.com> wrote:
> Hi Jeff,
> Could you expand: "Tables without clustering keys are often deceptively
> expensive to compact, as a lot of work (relative to the other cell
> boundaries) happens on partition boundaries." This is something I didn't
> know and highly interesting to know more about!
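We do a lot "by partition". We build column indexes by partition. We update
the partition index on each partition. We invalidate key cache by
partition. None of these steps is super expensive on its own, but each takes
time, and that time is paid once per partition no matter how small the
partition is. As a result, tables with tiny partitions can actually be slower
to compact than tables holding the same data in larger partitions.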
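
To make the per-partition overhead concrete, here is a rough sketch of a toy
cost model. It is not Cassandra code: the function name compaction_cost and
both cost constants are made up for illustration, and the only assumption
carried over from the above is that some fixed amount of work is paid once
per partition in addition to the per-cell work.

```python
# Toy cost model, NOT Cassandra code. Both cost constants are hypothetical
# illustration values in arbitrary units, not measurements.

def compaction_cost(total_cells, cells_per_partition,
                    per_cell_cost=1.0, per_partition_cost=20.0):
    """Return (total_cost, share_of_cost_spent_on_partition_boundaries)."""
    # Ceiling division: a partially filled last partition still counts.
    partitions = -(-total_cells // cells_per_partition)
    cell_work = total_cells * per_cell_cost
    # Stand-in for index building, partition index updates and key cache
    # invalidation, all of which are paid once per partition.
    boundary_work = partitions * per_partition_cost
    total = cell_work + boundary_work
    return total, boundary_work / total


if __name__ == "__main__":
    # Same number of cells, packed into partitions of different sizes.
    for cells_per_partition in (1, 10, 1000):
        total, share = compaction_cost(1_000_000, cells_per_partition)
        print(f"{cells_per_partition:>5} cells/partition: "
              f"cost={total:,.0f}, boundary share={share:.1%}")
```

With these invented constants, the one-cell-per-partition layout spends most
of its cost on partition boundaries, while the 1000-cell layout spends almost
none there. The real ratio depends on the Cassandra version, the schema, and
the hardware.
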
There's no magic cutoff where it does/doesn't make sense. My comment is
mostly a warning that the edges of the "normal" use cases tend to be less
optimized than the common case. Take a table with a hundred billion
records, where the key is numeric and the value is a single byte (let's say
you're keeping track of whether or not a specific sensor has ever detected
some magic event, and you have 100B sensors). That table will be close to
the worst-case example of this behavior.