[Neil Schemenauer <nas-pyt...@arctrix.com>] > ... > BTW, the current radix tree doesn't even require that pools are > aligned to POOL_SIZE. We probably want to keep pools aligned > because other parts of obmalloc rely on that.
obmalloc relies on it heavily. Another radix tree could map block addresses to all the necessary info in a current pool header, but that could become gigantic. For example, even a current "teensy" 256 KiB arena can be carved into the order of 16K 16-byte blocks (the smallest size class). And finding a pool header now is unbeatably cheap: just clear the last 12 address bits, and you're done. > Here is the matchup of the radix tree vs the current > address_in_range() approach. > > - nearly the same in terms of performance. It might depend on OS > and workload but based on my testing on Linux, they are very > close. Would be good to do more testing but I think the radix > tree is not going to be faster, only slower. People should understand that the only point to these things is to determine whether a pointer passed to a free() or realloc() spelling was obtained from an obmalloc pool or from the system malloc family. So they're invoked once near the very starts of those two functions, and that's all. Both ways are much more expensive than finding a pool header (which is just clearing some trailing address bits). The current way reads an "arena index" out of the pool header, uses that to index the file static `arenas` vector to get a pointer to an arena descriptor, then reads the arena base address out of the descriptor. That;s used to determine whether the original address is contained in the arena. The primary change my PR makes is to read the arena index from the start of the _page_ the address belongs instead (the pool header address is irrelevant to this, apart from that a pool header is aligned to the first page in a pool). The radix tree picks bits out of the address three times to index into a 3-level (but potentially broad) tree, ending with a node containing info about the only two possible arenas the original address may belong to. Then that info is used to check. The number of loads is essentially the same, but the multiple levels of indexing in the tree is a teensy bit more expensive because it requires more bit-fiddling. I spent hours, in all, dreaming up a way to make the _seemingly_ more complex final "so is the address in one of those two arenas or not?" check about as cheap as the current way. But Neil didn't see any significant timing difference after making that change, which was mildly disappointing but not really surprising: arithmetic is just plain cheap compared to reading up memory. > - radix tree uses a bit more memory overhead. Maybe 1 or 2 MiB on a > 64-bit OS. The radix tree uses more as memory use goes up but it > is a small fraction of total used memory. The extra memory use is > the main downside at this point, I think. I'd call it the only downside. But nobody yet has quantified how bad it can get. > - the radix tree doesn't read uninitialized memory. The current > address_in_range() approach has worked very well but is relying on > some assumptions about the OS (how it maps pages into the program > address space). This is the only aspect where the radix tree is > clearly better. I'm not sure this matters enough to offset the > extra memory use. I'm not worried about that. The only real assumption here is that if an OS supports some notion of "pages" at all, then for any address for which the program has read/write access (which are the only kinds of addresses that can be sanely passed to free/realloc), the OS allows the same access to the entire page containing that address. In two decades we haven't seen an exception to that yet, right? It's hard to imagine a HW designer thinking "I know! Let's piss away more transistors on finer-grained control nobody has asked for, and slow down memory operations even more checking for that." ;-) > - IMHO, the radix tree code is a bit simpler than Tim's > obmalloc-big-pool code. Absolutely so. There's another way to look at this: if Vladimir Marangozov (obmalloc's original author) had used an arena radix tree from the start, would someone now get anywhere proposing a patch to change it to the current scheme? I'd expect a blizzard of -1 votes, starting with mine ;-) > ... > My feeling right now is that Tim's obmalloc-big-pool is the best > design at this point. Using 8 KB or 16 KB pools seems to be better > than 4 KB. The extra complexity added by Tim's change is not so > nice. obmalloc is already extremely subtle and obmalloc-big-pool > makes it more so. Moving to bigger pools and bigger arenas are pretty much no-brainers for us, but unless pool size is increased there's no particular reason to pursue either approach - "ain't broke, don't fix". Larry Hastings started a "The untuned tunable parameter ARENA_SIZE" thread here about two years ago, where he got a blizzard of flak for merely suggesting that perhaps, after 20 years, it may be time to increase the 256 KiB arena size ;-) Ah, BTW, there's another real advantage to the radix tree, although it's minor: less "quantization" loss in bigger pools. For example, we can only get 7 objects of the largest (512 bytes) size class now. Move my PR to 16KiB pools and the pool gets 4 times bigger, but we can still only get 7*4 = 28 objects out of that pool. It's still "stealing" bytes out of every page. But the radix tree approach doesn't: only the pool header takes away from block space. So move yours to 16 KiB pools, and we could get 31 512-byte blocks out of the pool (3 more). _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LQK4TMRZW4SOFJAPUXIB2AIQSOV5ZKYC/