On Mon, Sep 12, 2016 at 10:58 AM, Josh Elser <[email protected]> wrote:
> Good call. I kind of forgot about BatchScanner threads and trying to factor
> those in :). I guess doing one thread in the BatchScanners would be more
> accurate.
>
> Although, I only had one TServer, so I don't *think* there would be any
> difference. I don't believe we have concurrent requests from one
> BatchScanner to one TServer.

There are. If the batch scanner sees that it has extra threads and there are
multiple tablets on the tserver, it will submit concurrent requests to a
single tserver.

>
> Dylan Hutchison wrote:
>>
>> Nice setup Josh. Thank you for putting together the tests. A few
>> questions:
>>
>> The serial scanner implementation uses 6 threads: one for each thread in
>> the thread pool.
>> The batch scanner implementation uses 60 threads: 10 for each thread in
>> the thread pool, since the BatchScanner was configured with 10 threads
>> and there are 10 (9?) tablets.
>>
>> Isn't 60 threads of communication naturally inefficient? I wonder if we
>> would see the same performance if we set each BatchScanner to use 1 or 2
>> threads.
>>
>> Maybe this would motivate a /MultiTableBatchScanner/, which maintains a
>> fixed number of threads across any number of concurrent scans, possibly
>> to the same table.
>>
>>
>> On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser <[email protected]> wrote:
>>
>>     Sven, et al:
>>
>>     So, it would appear that I have been able to reproduce this one
>>     (better late than never, I guess...). tl;dr Serially using Scanners
>>     to do point lookups instead of a BatchScanner is ~20x faster. This
>>     sounds like a pretty serious performance issue to me.
>>
>>     Here's a general outline for what I did.
>>
>>     * Accumulo 1.8.0
>>     * Created a table with 1M rows, each row with 10 columns using YCSB
>>       (workloada)
>>     * Split the table into 9 tablets
>>     * Computed the set of all rows in the table
>>
>>     For a number of iterations:
>>     * Shuffle this set of rows
>>     * Choose the first N rows
>>     * Construct an equivalent set of Ranges from the set of Rows,
>>       choosing a random column (0-9)
>>     * Partition the N rows into X collections
>>     * Submit X tasks to query one partition of the N rows (to a thread
>>       pool with X fixed threads)
>>
>>     I have two implementations of these tasks. One, where all ranges in
>>     a partition are executed via one BatchScanner.
>>     A second where each range is executed in serial using a Scanner.
>>     The numbers speak for themselves.
>>
>>     ** BatchScanners **
>>     2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
>>     2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges calculated: 3000 ranges found
>>     2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries executed in 40178 ms
>>     2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries executed in 42296 ms
>>     2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries executed in 46094 ms
>>     2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries executed in 47704 ms
>>     2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries executed in 49221 ms
>>
>>     ** Scanners **
>>     2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
>>     2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges calculated: 3000 ranges found
>>     2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2833 ms
>>     2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2536 ms
>>     2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2150 ms
>>     2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2061 ms
>>     2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2140 ms
>>
>>     Query code is available at
>>     https://github.com/joshelser/accumulo-range-binning
>>
>>
>>     Sven Hodapp wrote:
>>
>>         Hi Keith,
>>
>>         I've tried it with 1, 2 or 10 threads. Unfortunately there were
>>         no amazing differences.
>>         Maybe it's a problem with the table structure? For example it
>>         may happen that one row id (e.g. a sentence) has several
>>         thousand column families. Can this affect the seek performance?
>>
>>         So for my initial example it has about 3000 row ids to seek,
>>         which will return about 500k entries. If I filter for specific
>>         column families (e.g. a document without annotations) it will
>>         return about 5k entries, but the seek time will only be halved.
>>         Are there too many column families to seek quickly?
>>
>>         Thanks!
>>
>>         Regards,
>>         Sven
>>
>>
>
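
For anyone following the quoted test outline without opening the repository, the shuffle / choose N / partition into X / submit X tasks loop can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from accumulo-range-binning: the class name RangePartitionSketch and the methods sample, partition and runAll are hypothetical, the round-robin split is a guess at how the partitioning might be done, and the per-partition task is a stand-in for "run these ranges through one BatchScanner" or "run these ranges one by one through a Scanner". It uses only the JDK, with no Accumulo calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToIntFunction;

// Hypothetical harness sketch; names are illustrative, not from the linked repository.
class RangePartitionSketch {

  // Shuffle a copy of all rows and keep the first n of them.
  static List<String> sample(List<String> allRows, int n, Random rnd) {
    List<String> copy = new ArrayList<>(allRows);
    Collections.shuffle(copy, rnd);
    return new ArrayList<>(copy.subList(0, Math.min(n, copy.size())));
  }

  // Deal the items round-robin into x partitions (assumed strategy).
  static <T> List<List<T>> partition(List<T> items, int x) {
    List<List<T>> parts = new ArrayList<>();
    for (int i = 0; i < x; i++) {
      parts.add(new ArrayList<>());
    }
    for (int i = 0; i < items.size(); i++) {
      parts.get(i % x).add(items.get(i));
    }
    return parts;
  }

  // Submit one task per partition to a fixed pool with one thread per partition,
  // wait for all of them, and return the sum of the per-partition results.
  // Requires at least one partition.
  static <T> long runAll(List<List<T>> parts, ToIntFunction<List<T>> task) {
    ExecutorService pool = Executors.newFixedThreadPool(parts.size());
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (List<T> part : parts) {
        Callable<Integer> job = () -> task.applyAsInt(part);
        futures.add(pool.submit(job));
      }
      long total = 0;
      for (Future<Integer> f : futures) {
        total += f.get();
      }
      return total;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<String> allRows = new ArrayList<>();
    for (int i = 0; i < 10_000; i++) {
      allRows.add("user" + i); // placeholder row ids
    }
    List<String> chosen = sample(allRows, 3000, new Random());
    List<List<String>> parts = partition(chosen, 6);
    long start = System.currentTimeMillis();
    // Stand-in task: a real test would query Accumulo here and count entries.
    long entries = runAll(parts, part -> part.size());
    long elapsed = System.currentTimeMillis() - start;
    System.out.println("Looked up " + entries + " rows in " + elapsed + " ms");
  }
}
```

With N = 3000 and X = 6 as in the logs, each task gets 500 rows. In the BatchScanner variant each of those 6 tasks would additionally fan out to the scanner's own query threads, which is where the 6 x 10 = 60 threads mentioned in the thread come from.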
