It depends heavily on the requirements of your application and the amount of data it is serving. One general recommendation that should hold universally is to try to limit each tablet server to hundreds of tablets. Like everything else, though, this is a loose recommendation.
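One way to act on the tablets-per-server guideline, and on the concern in the quoted thread about all mutations going to a single split, is to pre-split the table so writes spread across servers. The sketch below assumes a hypothetical row-key layout `<shard>_<naturalKey>`, where the shard is a zero-padded hash bucket (a "salting" scheme chosen for illustration, not something from this thread). The cluster size and tablets-per-server numbers are also hypothetical. In Accumulo the resulting split points would be passed as `Text` values to `TableOperations.addSplits`; this sketch uses plain strings to stay dependency-free.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class SplitPlanner {
    // Digits needed to zero-pad shard ids 0..numShards-1 so they sort lexicographically.
    public static int width(int numShards) {
        return String.valueOf(numShards - 1).length();
    }

    // Hypothetical row key "<shard>_<naturalKey>": the hash prefix spreads
    // otherwise-adjacent keys across shards (and therefore across tablets).
    public static String rowKey(String naturalKey, int numShards) {
        int shard = Math.floorMod(naturalKey.hashCode(), numShards);
        return String.format("%0" + width(numShards) + "d_%s", shard, naturalKey);
    }

    // One split point at the start of every shard except the first, so
    // numShards - 1 splits yield numShards tablets, one per shard.
    public static SortedSet<String> splitPoints(int numShards) {
        SortedSet<String> splits = new TreeSet<>();
        for (int shard = 1; shard < numShards; shard++) {
            splits.add(String.format("%0" + width(numShards) + "d", shard));
        }
        return splits;
    }

    public static void main(String[] args) {
        int tabletServers = 10;    // hypothetical cluster size
        int tabletsPerServer = 20; // hypothetical target, within "hundreds" per server
        int numShards = tabletServers * tabletsPerServer;

        SortedSet<String> splits = splitPoints(numShards);
        System.out.println(splits.size() + " splits -> " + (splits.size() + 1) + " tablets");
        System.out.println("example row key: " + rowKey("user42", numShards));
    }
}
```

The trade-off of a hash prefix is that range scans over the natural key must fan out across all shards, so whether it helps depends on the query side of your ingest/query balance.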
Likely, this will require experimentation on your end. If you can share more details about the specifics of your data set and requirements, we might be able to give you some more direction.

On Tue, Jul 19, 2016 at 12:35 PM, Jamie Johnson <[email protected]> wrote:
> Thank you, this was helpful. What about the number of splits for a table.
> Is there a general rule of thumb for how many splits and what size they
> should be when trying to balance ingest/query performance?
>
> On Fri, Jul 15, 2016 at 2:38 PM, Emilio Lahr-Vivaz <[email protected]>
> wrote:
>>
>> Another thing to consider is how many tablet servers the mutations are
>> being sent to - if they're all going to a single split, that's going to
>> reduce your throughput a lot.
>>
>>
>> On 07/15/2016 02:33 PM, [email protected] wrote:
>>
>> The batch writer has several knobs (latency time, memory buffer, etc) that
>> you can tune to meet your requirements. The values for those settings will
>> depend on a lot of variables, to include:
>>
>> - number of tablet servers
>> - size of mutations
>> - desired latency
>> - memory buffer
>> - configuration settings on the table(s) and tablet servers.
>>
>> Suggest picking a starting point and see how it works for you, such as
>>
>> threads - equal to the number of tablet servers (unless you have a
>> really large number of tablet servers)
>> buffer - 100MB
>> latency - 10 seconds
>>
>> If you are hitting a wall with those settings, you could increase the
>> buffer and latency and/or change some settings on the server side that have
>> to do with the write ahead logs.
>>
>> ________________________________
>> From: "Jamie Johnson" <[email protected]>
>> To: [email protected]
>> Sent: Friday, July 15, 2016 2:16:40 PM
>> Subject: Configuring batch writers
>>
>> Is there any documentation that outlines reasonable settings for batch
>> writers given a known ingest rate? For instance if I have a source that is
>> producing in the neighborhood of 15MB of mutations per second, what would a
>> reasonable configuration for the batch writer be to handle an ingest at this
>> rate? What are reasonable rules of thumb to follow to ensure that the
>> writers don't block, etc?
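The starting values suggested in the quoted reply (threads equal to the number of tablet servers, a 100MB buffer, 10 seconds of latency) can be sanity-checked against the 15MB/s ingest rate with some quick arithmetic. In Accumulo's client API these knobs live on `BatchWriterConfig` (`setMaxWriteThreads`, `setMaxMemory`, `setMaxLatency`); the sketch below only does the arithmetic so it runs without Accumulo. The thread cap and the 10-server cluster are hypothetical, and MB is treated as MiB.

```java
import java.util.concurrent.TimeUnit;

public class BatchWriterStartingPoint {
    // Hypothetical cap for very large clusters; the thread only says
    // "unless you have a really large number of tablet servers".
    static final int MAX_THREADS = 32;

    static final long BUFFER_BYTES = 100L * 1024 * 1024;           // "buffer - 100MB"
    static final long LATENCY_MS = TimeUnit.SECONDS.toMillis(10);  // "latency - 10 seconds"

    // Threads equal to the number of tablet servers, clamped to [1, MAX_THREADS].
    public static int writeThreads(int tabletServers) {
        return Math.max(1, Math.min(tabletServers, MAX_THREADS));
    }

    // Seconds of ingest at the given rate needed to fill the buffer.
    public static double secondsToFillBuffer(long bytesPerSecond) {
        return (double) BUFFER_BYTES / bytesPerSecond;
    }

    // Bytes that would accumulate during one full latency window.
    public static long bytesPerLatencyWindow(long bytesPerSecond) {
        return bytesPerSecond * LATENCY_MS / 1000;
    }

    public static void main(String[] args) {
        long rate = 15L * 1024 * 1024; // ~15MB of mutations per second
        int servers = 10;              // hypothetical cluster size
        System.out.printf("threads=%d%n", writeThreads(servers));
        System.out.printf("buffer fills in %.1f s%n", secondsToFillBuffer(rate));
        System.out.printf("one latency window holds %d MB%n",
                bytesPerLatencyWindow(rate) / (1024 * 1024));
    }
}
```

At roughly 15MB/s the 100MB buffer fills in under 7 seconds, while a full 10-second latency window would accumulate about 150MB. Under these assumptions the memory buffer, rather than the latency timer, is likely the setting that limits how much data the client holds, which fits the suggestion to raise the buffer if the writers hit a wall.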
