On Wed, Mar 12, 2014 at 02:24:54PM +0100, Gerd Hoffmann wrote:
> On Mi, 2014-03-12 at 09:05 -0400, Gabriel L. Somlo wrote:
> > On Wed, Mar 12, 2014 at 09:27:18AM +0100, Gerd Hoffmann wrote:
> > > I think we should just use e820_table (see pc.c) here. Loop over it and
> > > add a type 19 table for each ram region in there.
> >
> > I'm assuming this should be another post-Seabios-compatibility patch,
> > at the end of the series, and I should still do the (start,size)
> > arithmetic cut'n'pasted from SeaBIOS first, right ?
>
> You should get identical results with both methods. It's just that the
> e820 method is more future proof, i.e. if the numa people add support
> for non-contignous memory some day we don't have to adapt the smbios
> code to handle it.

So I spent some time reverse-engineering the way Type 16..20 (memory)
smbios tables are built in SeaBIOS, and therefore in the QEMU smbios
patch set currently under revision... And I came up with the following
picture (caution: ascii art, fixed-width font strongly recommended):

----------------------------------------------------------------------------
|                              Type16 0x1000                               |
----------------------------------------------------------------------------
^           ^              ^              ^                    ^           ^
|           |              |              |                    |           |
|       ----+---       ----+----      ----+----       ---------+--------   |
|      | Type17 |     | Type17  |    | Type17  |     |      Type17      |  |
|      | 0..16G |     | 16..32G |    | 32..48G | ... | N*16G..(N+1)*16G |  |
|      | 0x1100 |     | 0x1101  |    | 0x1102  |     |     0x110<N>     |  |
|       --------       ---------      ---------       ------------------   |
|       ^      ^           ^              ^                    ^           |
|       |      |           |              |                    |           |
|       |      +--+        +--+           |                    |           |
|       |         |           |           |                    |           |
|   ----+---   ---+----   ----+----   ----+----       ---------+--------   |
|  | Type20 | | Type20 | | Type20  | | Type20  |     |      Type20      |  |
|  | 0..4G  | | 4..16G | | 16..32G | | 32..48G | ... | N*16G..(N+1)*16G |  |
|  | 0x1400 | | 0x1401 | | 0x1402  | | 0x1403  |     |    0x140<N+1>    |  |
|   ----+---   ---+----   ----+----   ----+----       ---------+--------   |
|       |         |           |           |                    |           |
|       |         |           +-------+   |   +----------------+           |
|       |         +----------------+  |   |   |                            |
|       |                          |  |   |   |                            |
|       |                          |  |   |   |                            |
|       v                          v  v   v   v                            |
|   --------                      --------------                           |
|  | Type19 |                    |    Type19    |                          |
|  | 0..4G  |                    | 4G..ram_size |                          |
|  | 0x1300 |                    |    0x1301    |                          |
|   ----+---                      ------+-------                           |
|       |                               |                                  |
+-------+                               +----------------------------------+

Here are some of the limit values, and some questions and thoughts:

  - Type16 max == 2T - 1K; Should we just
    assert((ram_size >> 10) < 0x80000000), and officially limit
    guests to < 2T ?

  - Type17 max == 32G - 1M; This explains why we create Type17 device
    tables in increments of 16G, since that's the largest possible
    value that's a nice, round power of two :)

  - Type19 & Type20 max == 4T - 1K; If we limit ourselves to what
    Type16 can currently represent (2T), this should be plenty enough
    to work with...

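The limits listed above, and the 16G chunking of Type17 devices, can be
sketched in C roughly as follows. This is only an illustration, not the
actual SeaBIOS or QEMU code: the macro and function names
(TYPE16_MAX_BYTES, fits_type16, type17_count, type17_size) are
hypothetical, and the limit values are simply the ones quoted in the
list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KiB (1ULL << 10)
#define MiB (1ULL << 20)
#define GiB (1ULL << 30)
#define TiB (1ULL << 40)

/* Limits as quoted in the list above (hypothetical macro names). */
#define TYPE16_MAX_BYTES   (2 * TiB - KiB)
#define TYPE17_MAX_BYTES   (32 * GiB - MiB)
#define TYPE19_MAX_BYTES   (4 * TiB - KiB)   /* also applies to Type20 */

/* Largest power of two that still fits below the Type17 limit. */
#define TYPE17_CHUNK_BYTES (16 * GiB)

/* The check proposed above: ram_size, counted in KB, must stay below
 * 0x80000000 for Type16 to be able to represent it. */
static inline bool fits_type16(uint64_t ram_size)
{
    return (ram_size >> 10) < 0x80000000ULL;
}

/* Number of Type17 devices needed when ram is cut into 16G chunks. */
static inline unsigned type17_count(uint64_t ram_size)
{
    return (unsigned)((ram_size + TYPE17_CHUNK_BYTES - 1) / TYPE17_CHUNK_BYTES);
}

/* Size in bytes of the i-th Type17 device (0-based); only the last
 * device may be smaller than 16G. */
static inline uint64_t type17_size(uint64_t ram_size, unsigned i)
{
    uint64_t start = (uint64_t)i * TYPE17_CHUNK_BYTES;
    uint64_t left;

    assert(start < ram_size);   /* i must name an existing device */
    left = ram_size - start;
    return left < TYPE17_CHUNK_BYTES ? left : TYPE17_CHUNK_BYTES;
}
```

Under these assumptions, a 40G guest would need three Type17 devices
(16G, 16G and 8G), and a guest with exactly 2T of ram would fail the
Type16 check.
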
So, currently, we split available ram into blobs of up to 16G each,
and assign each blob a Type17 node. We then split available ram into
<4G and 4G+, and create up to two Type19 nodes for these two areas.

Now, re. e820: currently, the expectation is that the (up to) two
Type19 nodes in the above figure correspond to (up to) two entries of
type E820_RAM in the e820 table.

Then, a Type20 node is assigned to the sub-4G portion of the first
Type17 "device", and another Type20 node is assigned to the over-4G
portion of the same. From then on, Type20 nodes correspond to the rest
of the 16G-or-less Type17 devices pretty much on a 1:1 basis.

If the e820 table contains more than just two E820_RAM entries, and
we therefore have more than the two Type19 nodes on the bottom row,
what are the rules for extending the rest of the figure accordingly
(i.e. how do we hook together more Type17 and Type20 nodes to go along
with the extra Type19 nodes)?

Thanks much,
--Gabriel