So here's a simple attempt at taking many measurements using plain time.time() on lists of ints. The results are great, if they're valid (which I leave to you to judge); even for lists with just one element, it's about 15% faster! The columns are list length, number of trials, and the relative improvement (1 - fastsort time / sort time):
Integer benchmarks
1 10000000 1-fastsort()/sort(): 0.14896401672523163
5 2000000 1-fastsort()/sort(): 0.2915859704616376
10 1000000 1-fastsort()/sort(): 0.3576859149132914
1000 10000 1-fastsort()/sort(): 0.3230363920681264
1000000 10 1-fastsort()/sort(): 0.292595011903772

And here's the benchmark script:

from fastlist import FastList
import random; random.seed(42)
from time import time

def bench_list(size, trials):
    L = [random.randrange(size) for _ in range(size)]
    trials = int(trials); size = int(size)

    fastlist_time = 0
    for _ in range(trials):
        F = FastList(L)
        start = time()
        F.fastsort()
        fastlist_time += time() - start

    defaultlist_time = 0
    for _ in range(trials):
        F = FastList(L)
        start = time()
        F.sort()
        defaultlist_time += time() - start

    print(size, trials, "1-fastsort()/sort():", 1 - fastlist_time/defaultlist_time)

print("Integer benchmarks")
bench_list(1, 1e7)
bench_list(5, 1e7/5)
bench_list(10, 1e7/10)
bench_list(1000, 1e7/1000)
bench_list(1000000, 1e7/1e6)

Is that a valid way to benchmark the implementation?
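If timeit is preferred, maybe something like the following would get around the per-trial setup problem I mention below: call timeit with number=1 inside my own loop, so that building the FastList stays outside the timed region. I haven't actually verified this against the module, and with number=1 the timer-call overhead probably swamps the measurement for the tiny lists, so treat it as a sketch only:

import random; random.seed(42)
from timeit import timeit
from fastlist import FastList

def bench_list_timeit(size, trials):
    # Same source list as in the time.time() version above.
    L = [random.randrange(size) for _ in range(size)]
    trials = int(trials); size = int(size)

    fastlist_time = 0
    defaultlist_time = 0
    for _ in range(trials):
        # Build the FastList outside the timed region; timeit(callable,
        # number=1) then measures a single sort call and nothing else.
        F = FastList(L)
        fastlist_time += timeit(F.fastsort, number=1)
        F = FastList(L)
        defaultlist_time += timeit(F.sort, number=1)

    print(size, trials, "1-fastsort()/sort():", 1 - fastlist_time/defaultlist_time)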
On Mon, Oct 10, 2016 at 8:15 PM Elliot Gorokhovsky
<elliot.gorokhov...@gmail.com> wrote:
> Right - sorry about that last bit!
>
> I couldn't figure out how to use timeit/perf for this because the list
> has to be reinitialized each time it gets sorted. So with time.time() I
> can just wrap the sorting and cut the initialization out (and I could
> always throw it in a for loop to do it as many times as necessary). But
> with timeit/perf, as I understand, you just give it a number of
> iterations and a code snippet and it loops for you. So I don't see how I
> could get it to cut out the initialization. There's an option to provide
> setup code, of course, but I need to set up before each trial, not just
> before the loop.
>
> Regardless, one could always wrap the whole contribution with a length
> test and disable for lists with less than, say, 1000 elements. And
> though for the above reason I couldn't figure out how to benchmark
> properly for small lists, it's clear that for large lists this gives a
> real improvement with very little change in the code: the fact is that
> it really doesn't make sense to do all this typechecking nlogn times
> when we can just get it over with once at the beginning. The cost is
> very, very small, as demonstrated by the last benchmark (which is for a
> 1e7 list, so it is probably more valid with my crude technique), so
> heterogeneous lists don't get slowed down perceptibly.
>
> On Mon, Oct 10, 2016 at 8:06 PM Guido van Rossum <gu...@python.org> wrote:
> > So maybe someone should explain to Elliott *why* his own benchmarks
> > are not trustworthy, rather than just repeat "use perf or timeit".
> > Actually, there are two things: (a) when something new comes along it
> > *always* needs to prove beyond a shadow of a doubt that it is actually
> > an improvement and not a timing artifact or a trick; (b) you can't
> > time sorting 10 values *once* and get a useful result. You have to do
> > it many times. And you have to make sure that creating a list of 10
> > random values isn't taken as part of your test -- that's tricky since
> > random() isn't all that fast; but it has to be done.
> >
> > Although Elliott had it coming when he used needlessly offensive
> > language in his first post.
> >
> > On Mon, Oct 10, 2016 at 5:09 PM, Steven D'Aprano <st...@pearwood.info> wrote:
> > > On Mon, Oct 10, 2016 at 09:16:32PM +0000, Elliot Gorokhovsky wrote:
> > >
> > >> Anyway, benchmarking technique aside, the point is that it works well
> > >> for small lists (i.e. doesn't affect performance).
> > >
> > > You've been shown that there is something suspicious about your
> > > benchmarking technique, something that suggests that the timing
> > > results aren't trustworthy. Until you convince us that your timing
> > > results are reliable and trustworthy, you shouldn't be drawing *any*
> > > conclusions about your fastsort versus the standard sort.
> > >
> > > --
> > > Steve
> >
> > --
> > --Guido van Rossum (python.org/~guido)
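To make the "get it over with once at the beginning" point from my quoted message concrete: the entire up-front cost is a single linear pass over the list. The actual check lives in the C code, which I'm not reproducing here; this pure-Python function is only meant to show the shape of that cost, not the real implementation:

def all_same_type(L):
    # One O(n) scan, paid once per sort call.  If it succeeds, the
    # O(n log n) comparisons that follow can skip per-element type dispatch.
    if not L:
        return False
    first_type = type(L[0])
    return all(type(x) is first_type for x in L)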