[Tim]
>> Something I haven't seen mentioned here, but I may have missed it:
>> when timing algorithms with sets and dicts, people often end up merely
>> measuring the bad effects of hardware data cache misses when the
>> containers get large, and especially in contrived benchmarks.
>>
>> In those the base data tends to get built up in a contiguous range of
>> memory, and storing the objects in a list leaves traversing the list
>> fetching the objects' data in the same contiguous memory order.  It's
>> as cache-friendly as can be.

[Steven D'Aprano <st...@pearwood.info>]
> I'm not sure if this is something specific to timing tests, or if it
> will hold for real-world data in non-contrived code too.

Depends on the data access patterns of the app.  There's really no
simple way to know.

One possible clue:  in recent CPythons,

   sys._debugmallocstats()

can be called to get info about the state of Python's small object allocator.

Compare the "# bytes in allocated blocks" output line to the total
number of bytes across all arenas.  If that ratio is close to 1.0 then
at least we're _using_ most of arena memory.  The closer to 0 it gets,
the more data is splattered around, with large gaps between used
bytes.
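For illustration, here's one way that check could be scripted (a rough
sketch, not battle-tested: _debugmallocstats() writes to the C-level
stderr stream, so this redirects file descriptor 2 rather than swapping
sys.stderr, and the regexes assume the current report format, which
isn't guaranteed across CPython versions):

    import os
    import re
    import sys
    import tempfile

    def obmalloc_usage_ratio():
        # _debugmallocstats() prints to the C stderr, so capture fd 2.
        with tempfile.TemporaryFile(mode="w+b") as tmp:
            saved = os.dup(2)
            try:
                os.dup2(tmp.fileno(), 2)
                sys._debugmallocstats()
            finally:
                os.dup2(saved, 2)    # restore the real stderr
                os.close(saved)
            tmp.seek(0)
            text = tmp.read().decode()
        # Lines of interest look roughly like:
        #   # bytes in allocated blocks       =      120,219,280
        #   546 arenas * 262144 bytes/arena   =      143,130,624
        allocated = int(re.search(
            r"# bytes in allocated blocks\s*=\s*([\d,]+)",
            text).group(1).replace(",", ""))
        arena_bytes = int(re.search(
            r"arenas \* \d+ bytes/arena\s*=\s*([\d,]+)",
            text).group(1).replace(",", ""))
        return allocated, arena_bytes, allocated / arena_bytes

    alloc, total, ratio = obmalloc_usage_ratio()
    print(f"{alloc:,} of {total:,} arena bytes in allocated blocks "
          f"({ratio:.1%})")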

Since CPython _never_ moves an object in memory, fragmentation can
become nearly arbitrarily bad.  As bad as using only 8 bytes out of
each 256 KiB allocated (although I expect it would take effort to
contrive a program in the same universe as being that bad).
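A contrived sketch of how to provoke that (exact numbers depend on the
CPython version and platform): build a few million tiny objects, keep
only every 1000th, and the scattered survivors keep their arenas alive
even though nearly all blocks in them are free:

    import sys

    blobs = [object() for _ in range(2_000_000)]  # fills many arenas
    keep = blobs[::1000]      # survivors scattered across those arenas
    del blobs                 # frees 99.9% of the objects...
    sys._debugmallocstats()   # ...but the arenas they pin stay allocated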


> If it holds for real data, then presumably it counts as an advantage of
> lists over sets. But if it is an artifact of naive/contrived timing
> tests, is there something we can do to make the tests less naive?

What's the first rule of benchmarking?  RUN ON A MACHINE AS QUIET AS POSSIBLE

So it's deliberately and spectacularly naive.  Benchmark results
should really be viewed as lower bounds:  guarantees that no related
real-life code will ever run faster, and almost always slower ;-)

They remain the easiest way to guess whether one approach "is likely
to" run faster than another, though.

An easy example of major cache effects:

    xs = list(range(1000000))
    sum(xs)

Then do the same, but use random.shuffle(xs) before summing.  Just
timing the `sum(xs)` part, a quick timeit run on my non-quiet box just
now showed the latter taking well over 4x longer.  The original
spelling tends to access RAM (for the guts of the int objects) in
sequential order, but after shuffling, sum() leaps all over RAM to
get from one int to the next.
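
Here's one self-contained way to reproduce that comparison (a sketch;
the exact ratio varies a lot by machine and CPython version):

    import random
    import timeit

    xs = list(range(1000000))
    in_order = min(timeit.repeat(lambda: sum(xs), number=20, repeat=5))

    random.shuffle(xs)   # same int objects, same memory, new visit order
    shuffled = min(timeit.repeat(lambda: sum(xs), number=20, repeat=5))

    print(f"in order: {in_order:.3f}s")
    print(f"shuffled: {shuffled:.3f}s ({shuffled / in_order:.1f}x slower)")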

The memory layout is the same either way, so staring at obmalloc stats
won't yield any clue in that case.