On 14.05.19 10:56, Christian Borntraeger wrote:
>
>
> On 14.05.19 10:50, David Hildenbrand wrote:
>> On 14.05.19 10:37, Christian Borntraeger wrote:
>>>
>>>
>>> On 14.05.19 09:28, David Hildenbrand wrote:
>>>>>>> But that can be tested using the runnability information if I am not
>>>>>>> wrong.
>>>>>>
>>>>>> You mean the cpu level information, right?
>>>>
>>>> Yes, query-cpu-definitions includes runnability information for each model
>>>> via "unavailable-features" (valid under the started QEMU machine).
>>>>
>>>>>>
>>>>>>>
>>>>>>>> and others that we have today.
>>>>>>>>
>>>>>>>> So yes, I think this would be acceptable.
>>>>>>>
>>>>>>> I guess it is acceptable, yes. I doubt anybody uses that many CPUs in
>>>>>>> production either way. But you never know.
>>>>>>
>>>>>> I think that using that many cpus is a more uncommon setup, but I still
>>>>>> think that having to wait for actual failure
>>>>>
>>>>> That can happen all the time today. You can easily say z14 in the xml when
>>>>> on a zEC12. You only get the error at startup. The question is really:
>>>>
>>>> "-smp 248 -cpu host" will no longer work, while e.g. "-smp 248 -cpu z12"
>>>> will work. Actually, even "-smp 248" will no longer work on affected
>>>> machines.
>>>>
>>>> That is why I wonder if it is better to disable the feature and print a
>>>> warning. Similar to CMMA, where we want to tolerate when CMMA is not
>>>> possible in the current environment (huge pages).
>>>>
>>>> "Diag318 will not be enabled because it is not compatible with more than
>>>> 240 CPUs".
>>>>
>>>> However, I still think that implementing support for more than one SCLP
>>>> response page is the best solution. Guests will need adaptations for > 240
>>>> CPUs with Diag318, but who cares? Existing setups will continue to work.
>>>>
>>>> Implementing that SCLP thingy will avoid any warnings and any errors. It
>>>> just works from the QEMU perspective.
>>>>
>>>> Is implementing this realistic?
>>>
>>> Yes, it is, but it will take time. I will try to get this rolling. To make
>>> progress on the diag318 thing, can we error on startup now and simply
>>> remove that check when we have implemented a larger SCCB? If we now play
>>> all kinds of "change the max number" games, that would be harder to "fix".
>>
>>
>> Another idea for temporary handling: Simply only indicate 240 CPUs to
>> the guest if the response does not fit into a page. Once we have that
>> SCLP thingy, this will be fixed. Guest migration back and forth should
>> work, as the VCPUs are fully functional (and initially always stopped);
>> the guest will simply not be able to detect them via SCLP when booting
>> up, and therefore not use them.
>
> Yes, that looks like a good temporary solution. In fact, if the guest relies
> on simply probing, it could even make use of the additional CPUs. It's just
> the SCLP response that is limited to 240 (or make it 247?)
I think the limit was reduced by more than a single CPU, but I don't recall
exactly. We can do the math again and come up with the right number.

-- 

Thanks,

David / dhildenb