The upper limit of 4 TB is for data on disk (post-encoding, post-compression, and post-replication); it does not include in-memory data in MemRowSets or DeltaMemStores.
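To make the accounting concrete, here's a rough sketch of how you might total a tserver's on-disk footprint from the JSON its /metrics web endpoint serves (a list of entities, each with a "metrics" list of name/value entries). The metric name "on_disk_size" and the exact JSON shape are assumptions; check what your Kudu version actually exposes.

```python
import json

# 4 TB guideline: counts post-encoding, post-compression,
# post-replication on-disk data only.
LIMIT_BYTES = 4 * 1024**4

def on_disk_bytes(metrics_json):
    """Sum the assumed 'on_disk_size' metric across tablet entities."""
    total = 0
    for entity in metrics_json:
        if entity.get("type") != "tablet":
            continue
        for m in entity.get("metrics", []):
            if m.get("name") == "on_disk_size":
                total += m.get("value", 0)
    return total

# Fabricated sample data standing in for a real /metrics response:
sample = [
    {"type": "tablet", "id": "tablet-1",
     "metrics": [{"name": "on_disk_size", "value": 2 * 1024**4}]},
    {"type": "tablet", "id": "tablet-2",
     "metrics": [{"name": "on_disk_size", "value": 1 * 1024**4}]},
]
used = on_disk_bytes(sample)
print(used < LIMIT_BYTES)  # 3 TB total, under the 4 TB guideline
```

Again, this is just an illustration of what counts toward the limit, not an official sizing tool.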
The value of the limit is based on the kinds of workloads tested by the Kudu development community. As a group we feel comfortable supporting users up to 4 TB because we've run such workloads ourselves. Beyond 4 TB, however, we're not exactly sure what becomes slow, what breaks, etc.

Speaking from experience, as the amount of on-disk data grows, tservers will take longer to start up. You might also become vulnerable to KUDU-2050; we're not sure. To reach that amount of data you'll probably also raise the number of tablets hosted by the tserver, which can increase the tserver's thread count and file descriptor count and may cause slowdowns in other areas.

In short, nothing will "happen" the moment you cross 4 TB; it's just that you'll be entering relatively uncharted waters and might encounter unusual or unexpected behavior. If that doesn't deter you, by all means give it a shot (and report back with your findings)!

On Wed, Aug 30, 2017 at 5:53 PM, 李津 <[email protected]> wrote:
> why per tserver have the upper limit of 4T and it include the memrowset
> data? we also not testing more than 4T. what will happen if reach the upper
> limit?
