On Tue, Sep 25, 2012 at 3:17 PM, ameet kini <[email protected]> wrote:
> Thanks William.
>
> The issue here is that without knowing how the numQueryThreads translates to
> the number of concurrent scans, I cannot effectively tune that parameter to
> maximize resource usage on the tablet server. What I'm seeing is that even
> though there are four tablets on the tablet server, my number of concurrent
> scans never exceeds 3. This is despite setting numQueryThreads to a very
> high number and having 8 cores on the tablet server. I suspect with 3
> concurrent scans and no garbage collection happening at that moment, most of
> the cores are sitting idle.
>
> Ameet
The amount of parallelism is determined by how your ranges map to tablets.
Below are some examples.

 * For one range that maps to 10 tablets on 10 tablet servers, it will
   execute 10 concurrent scans if numQueryThreads is >= 10.
 * For 1000 ranges that map to 10 tablets on 10 tablet servers, it will
   execute 10 concurrent scans if numQueryThreads is >= 10.
 * For 1000 ranges that map to 10 tablets on 10 tablet servers, it will
   execute 5 concurrent scans if numQueryThreads is 5.
 * For 1000 ranges that map to 1 tablet, it will execute 1 concurrent scan.

If you have more query threads than tablet servers, the client code will try
to execute concurrent scans on a single tablet server.

You can look at TabletServerBatchReaderIterator.doLookups() for the details.
In this method it creates QueryTask objects and places them on a thread
pool. The size of the thread pool is the user-specified numQueryThreads.

>
> On Tue, Sep 25, 2012 at 3:08 PM, William Slacum
> <[email protected]> wrote:
>>
>> It should really be dependent upon the resources available to the client.
>> You can set an arbitrarily high number of threads, but you're still bound by
>> the number of parallel operations the CPU can make. I would assume the sweet
>> spot is somewhere around that number-- try doing a small bench mark with 2,
>> 4, 8, 16, etc threads and see where your performance starts to level off.
>>
>>
>> On Tue, Sep 25, 2012 at 11:45 AM, ameet kini <[email protected]> wrote:
>>>
>>> Probably worth adding that the table mentioned below has a bunch of
>>> tablets on other tablet servers as well, which is why I'm using
>>> BatchScanner. I'm just not sure how the numQueryThreads relates to the
>>> number of a concurrent scans on a given tablet server.
>>>
>>> Thanks
>>>
>>>
>>> On Tue, Sep 25, 2012 at 2:22 PM, ameet kini <[email protected]> wrote:
>>>>
>>>>
>>>> I have a table with 4 tablets on a given tablet server.
>>>> Depending on the
>>>> numQueryThreads parameter below, I see a varying number of maximum
>>>> concurrent scans on that table. This maximum number varies from 1 to 3
>>>> (i.e., some values for numQueryThreads result in maximum concurrent scan of
>>>> 1, some values result in 2 concurrent scans, etc.). Can someone shed light
>>>> on what is the relationship between numQueryThreads and number of
>>>> concurrent scans?
>>>>
>>>> public BatchScanner createBatchScanner(String tableName,
>>>>                                        Authorizations authorizations,
>>>>                                        int numQueryThreads)
>>>>
>>>> A follow-on question would be what is general rule of thumb for setting
>>>> numQueryThreads? Should it be set to the # of hosted tablets expected to
>>>> be consumed by that BatchScanner? Should it be the # of tablet servers
>>>> expected to be hit by that BatchScanner? Something else?
>>>>
>>>> Thanks,
>>>> Ameet
>>>>
>>>>
>>>
>>
>
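
P.S. The four examples in the reply at the top of this message can be condensed into a rough rule of thumb. The sketch below is only a simplified model inferred from those examples, not the actual logic in TabletServerBatchReaderIterator.doLookups(). The class name ScanParallelismSketch and the method estimateConcurrentScans are hypothetical names made up for illustration. It treats the result as an upper bound: the number of tablets the ranges map to, capped by numQueryThreads.

```java
// Simplified model of the examples in the reply; this is an assumption-laden
// sketch and NOT the real client logic in doLookups().
final class ScanParallelismSketch {

    /**
     * Hypothetical helper: upper bound on concurrent scans for a BatchScanner.
     * The number of ranges is deliberately not a parameter, because in the
     * examples only the number of tablets the ranges map to matters.
     */
    static int estimateConcurrentScans(int tabletsTouched, int numQueryThreads) {
        if (tabletsTouched < 0) {
            throw new IllegalArgumentException("tabletsTouched must be >= 0");
        }
        if (numQueryThreads < 1) {
            throw new IllegalArgumentException("numQueryThreads must be >= 1");
        }
        // The thread pool size caps concurrency; so does the amount of
        // tablet-level work available to hand out.
        return Math.min(tabletsTouched, numQueryThreads);
    }

    public static void main(String[] args) {
        System.out.println(estimateConcurrentScans(10, 10)); // 10 tablets, 10 threads
        System.out.println(estimateConcurrentScans(10, 5));  // 10 tablets, 5 threads
        System.out.println(estimateConcurrentScans(1, 50));  // 1 tablet, many threads
    }
}
```

Under this model, raising numQueryThreads past the number of tablets the ranges touch should not buy additional concurrency, so benchmarking with a few thread counts is still the way to find the real plateau.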
