Josh,

Thanks for your response.

My iterators will do the same number of seeks; they differ only in the implementation of the functions used to perform filtering. So I think I'll get a reasonable comparison, but I won't read too much into the results.

On 13 May 2015 at 21:19, Josh Elser <[email protected]> wrote:
> As long as you're managing your expectations (which it sounds like you've
> considered well), there could be some worth.
>
> A concern would be how using a different filesystem implementation
> actually impacts the validity of your benchmark, though.
>
> e.g. w/ a local FS (which is by default what MAC does), a disk seek costs
> 10ms, but using your real HDFS cluster, it's 200ms. If IteratorA does more
> seeks but is less efficient on the retrieved data while IteratorB does
> fewer seeks but is more efficient on the retrieved data, that would lead to
> inaccurate benchmarks on a production system.
>
> I guess another way to put it is that total wall time for a query might be
> deceiving in a test environment.
>
>
> Dave Hardcastle wrote:
>
>> Hi,
>>
>> Is it crazy to use a MiniAccumuloCluster to measure the *relative*
>> performance of two different implementations of iterators?
>>
>> Obviously it would be better to do it on a real Accumulo cluster, but
>> that's not possible for several reasons.
>>
>> The approach would be something like:
>> - Fire up a Mini cluster
>> - Bulk import a file
>> - Start timer
>> - Set up a BatchScanner with one of the iterator stacks and use it to
>>   query for lots of different ranges
>> - Iterate through the results of this
>> - Stop timer
>>
>> Repeat with the other implementation of the iterators.
>>
>> Of course, the difference in performance may not be measurable if the
>> time is dominated by the disk-seek time, but that would still be useful
>> information. And the absolute performance wouldn't be representative of
>> what you'd get on a real cluster, as there's no network latency in these
>> trials, but that's fine as I'm mainly interested in which of the two
>> implementations of the iterators is more performant.
>>
>> Similarly, could the same approach be used to compare the performance on
>> SSD vs hard disk?
>>
>> Thanks,
>>
>> Dave.
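To make the concern about seek cost concrete, here is a toy cost model in Java. It is only a sketch. The seek counts, entry counts and per-entry costs are made-up numbers, not measurements, and the `Profile` record and `estimateMicros`/`faster` helpers are hypothetical names used for illustration only. It assumes one iterator stack does more seeks but less work per entry and the other does the reverse, and it estimates wall time as seeks × seek cost + entries × per-entry cost. With a 10ms seek (local FS), IteratorA comes out ahead. With a 200ms seek (HDFS), IteratorB does. So a ranking measured on a MiniAccumuloCluster need not carry over to the real cluster.

```java
// Toy cost model (hypothetical numbers): estimated time = seek cost + per-entry cost.
// Not Accumulo code; it only illustrates how the seek cost can flip a ranking.
public class SeekCostModel {

    /** Hypothetical workload profile for one iterator stack. */
    record Profile(String name, long seeks, long entries, long microsPerEntry) {}

    /** Estimated wall time in microseconds for a given per-seek cost. */
    static long estimateMicros(Profile p, long microsPerSeek) {
        return p.seeks() * microsPerSeek + p.entries() * p.microsPerEntry();
    }

    /** Name of the profile with the lower estimated time (ties go to the first). */
    static String faster(Profile a, Profile b, long microsPerSeek) {
        return estimateMicros(a, microsPerSeek) <= estimateMicros(b, microsPerSeek)
                ? a.name() : b.name();
    }

    public static void main(String[] args) {
        // More seeks, cheaper per entry.
        Profile a = new Profile("IteratorA", 150, 100_000, 10);
        // Fewer seeks, more expensive per entry.
        Profile b = new Profile("IteratorB", 100, 100_000, 20);

        long localSeekMicros = 10_000;  // 10ms, the local FS figure above
        long hdfsSeekMicros = 200_000;  // 200ms, the HDFS figure above

        System.out.println("local FS: " + faster(a, b, localSeekMicros)); // prints "local FS: IteratorA"
        System.out.println("HDFS:     " + faster(a, b, hdfsSeekMicros));  // prints "HDFS:     IteratorB"
    }
}
```

When both stacks do the same number of seeks and return the same entries, as in Dave's case, the seek term is identical for both. Only the per-entry cost then decides the ranking, whatever the seek cost is. That is why a relative comparison on a Mini cluster is more trustworthy in that situation.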
