Re: [Freedos-devel] Slowdown-Units ratings and a CPU-bound depacker benchmark

> However, the topic of port I/O and MMIO reminds me of some activity
> by RayeR and others regarding low-level configuration to speed up
> access to PCI / PCIe graphics in DOS. I think the issue was that no
> fast defaults were applied by the BIOS, slowing down VESA LFB a lot.

Video is kind of a different animal than most other things. There is
I/O that controls the hardware (things like video modes, display
resolutions, etc.), and that can be either MMIO or PMIO. But the
memory-mapping of what actually gets displayed on screen (like segment
B800h for color text modes) really isn't MMIO, at least not in the
traditional sense. It is actually called dual-ported RAM or Video RAM,
and allows both reads and writes from the CPU side while also allowing
the graphics hardware to read the same RAM at the same time, so the
graphics hardware knows what to display on the screen. Regular RAM
can't do that. In addition, like MMIO, Video RAM can't be cached, since
that would effectively delay what the graphics hardware needs to see
to know what to display on the screen.

I do remember seeing some stuff from RayeR that I think was related to
video and modifying something in the Model Specific Registers (MSRs)
of the CPUs, but I don't remember all the details.

___
Freedos-devel mailing list
Freedos-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freedos-devel
Re: [Freedos-devel] Slowdown-Units ratings and a CPU-bound depacker benchmark

>> I tried posting a much longer response to this, but it was
>> apparently rejected by the moderators. Here's a shorter one.
>
> As I already mentioned in the other reply to this thread, feel free
> to send me more specific replies to that article.

I'm not sure why it didn't get posted -- it got "lost in the Ether"
somehow, and that may have had nothing to do with the moderation. It
was in part a response to your specific article, but also a much longer
response to the overall thread, which has taken a lot of circuitous
tangents. I know it will piss off some people and will spur an even
longer thread, so I am unsure whether I should try to post it again.
But I also think it says some things that need to be said. If you're
willing, I'll send it to you privately and see if you think I should
try posting it again.

>>>> I/O has also vastly sped up (we have SSD speeds of up to 6
>>>> GB/sec). Just not by doing IN/OUT, but by using memory mapped
>>>> PCI devices.
>>>>
>>>> I think you're confusing two different things -- MMIO and
>>>> DMA/Bus-Mastering.
>>>
>>> He is NOT.
>>
>> I wasn't talking about ecm being confused, I was talking about you.
>> AFAIK, ecm never tested either MMIO or bus-mastering so never said
>> anything about them.
>
> Yes, the only tests I did involved running Slowdown with and without
> the one port I/O instruction patched out in the waste loop. However,
> Tom is correct that this specific *port* I/O access to that
> particular port is not representative of all possible ways of doing
> I/O.

That is correct, but when I pointed out there really is not any
difference in speed between MMIO and PMIO (the two general categories
of doing I/O), Tom accused me of spewing BS. I gave him the opportunity
to defend himself with some data, and he so far hasn't done that. Let
me explain why I said what I did.

In the Intel architecture, there is only one address bus and one data
bus (in some of the older CPUs the address and data bus shared the same
physical pins on the CPU, but that is a peripheral discussion). The
exact same address and data buses are used for access to both ports
(PMIO) and memory (RAM or MMIO). There is a pin (I've seen it referred
to both as IO/M and M/IO) on the CPU that is also part of the bus and
tells the external devices whether the address on the bus is a port
address or a memory address. The device (or RAM) with both the correct
address _and_ address type is the only one allowed to respond. For
PMIO, of course, only the first 16 address lines are valid and the rest
are ignored (there are only 64k possible PMIO addresses). Also, instead
of a single pin, some CPUs have a set of pins where some combination of
highs and lows designates port vs. memory addresses, but the concept is
the same.

The point is that there is only one bus. When the CPU wants to send or
receive data from a device (whether it is RAM or I/O) it sets the pins
on the address bus appropriately (the address, address type, direction
of transfer, and some others as well) and waits for the device to
respond. Whether the IO/M pin is set or not has nothing to do with the
transfer speed -- it all uses the same bus. The CPU must simply wait
however long it takes for the device to respond. The simple fact that a
device responds to port address(es) versus memory address(es) has
nothing to do with the speed of transfer.

However, there is a little more "flexibility" in the CPU instructions
that can be used with MMIO vs. PMIO. With PMIO the only way you can
transfer data is with IN, INS, OUT, or OUTS. With MMIO you can use (at
least theoretically) all the CPU instructions that have memory pointers
(which a lot of them do). But some devices have limitations on that,
too (e.g., with some devices you can only modify an entire Word or
DWord at once and not individual Bits or Bytes, and on some devices
merely reading from a port can change some configuration parameter, so
you don't even need to write to it). In theory, MMIO can make the code
a little faster (or at least more efficient) if it is done properly,
but if done improperly it can actually make things slower.
Re: [Freedos-devel] Slowdown-Units ratings and a CPU-bound depacker benchmark

On at 2023-04-05 03:32 +, Bret Johnson wrote:
>>>> The article is found at
>>>> https://pushbx.org/ecm/dokuwiki/doku.php?id=blog:pushbx:2023:0321_cpu_performance_comparison
>>>
>>> I mostly agree with you and your article, but:
>>
>> fine that you agree, but at most 50% of the article is even close to
>> 'right'.
>
> You're the one who said, "I mostly agree with you and your article,
> but:", not me.
>
>>>> Conclusion
>>>>
>>>> CPU-bound benchmarks are much faster on a modern machine than they
>>>> are on older ones.
>>>> The frequency increase does not actually suffice to explain the
>>>> speedup.
>>>> Some things, like doing I/O, were not sped up nearly as much
>>>> however.
>>
>> I tried posting a much longer response to this, but it was
>> apparently rejected by the moderators. Here's a shorter one.

As I already mentioned in the other reply to this thread, feel free to
send me more specific replies to that article.

>>> I/O has also vastly sped up (we have SSD speeds of up to 6 GB/sec).
>>> Just not by doing IN/OUT, but by using memory mapped PCI devices.
>>>
>>> I think you're confusing two different things -- MMIO and
>>> DMA/Bus-Mastering.
>>
>> He is NOT.
>
> I wasn't talking about ecm being confused, I was talking about you.
> AFAIK, ecm never tested either MMIO or bus-mastering so never said
> anything about them.

Yes, the only tests I did involved running Slowdown with and without
the one port I/O instruction patched out in the waste loop. However,
Tom is correct that this specific *port* I/O access to that particular
port is not representative of all possible ways of doing I/O.

Regards,
ecm
Re: [Freedos-devel] Slowdown-Units ratings and a CPU-bound depacker benchmark

On at 2023-04-04 21:09 +0200, tom ehlert wrote:
> Dear Bret,
>
>> The article is found at
>>> https://pushbx.org/ecm/dokuwiki/doku.php?id=blog:pushbx:2023:0321_cpu_performance_comparison
>>
>> I mostly agree with you and your article, but:
>
> fine that you agree, but at most 50% of the article is even close to
> 'right'.
>
>>>> Conclusion
>>>>
>>>> CPU-bound benchmarks are much faster on a modern machine than they
>>>> are on older ones.
>>>> The frequency increase does not actually suffice to explain the
>>>> speedup.
>>>> Some things, like doing I/O, were not sped up nearly as much
>>>> however.
>>
>> I tried posting a much longer response to this, but it was
>> apparently rejected by the moderators. Here's a shorter one.

I don't think that the mailing lists are moderated that way? Anyway,
Bret, you can send any additional comments to me per email or as a
comment on the blog.

>>> I/O has also vastly sped up (we have SSD speeds of up to 6 GB/sec).
>>> Just not by doing IN/OUT, but by using memory mapped PCI devices.
>>
>> I think you're confusing two different things -- MMIO and
>> DMA/Bus-Mastering.
>
> He is NOT.

Who's "he"? In case this was meant to refer to me, it is wrong because
I'm not a "he".

>> Whether I/O is PMIO or MMIO is pretty much irrelevant to the speed.
>> For example, I/O port 201h (the analog joystick) and I/O port 92h
>> (which controls A20 on some computers) are both VERY slow and would
>> not be any faster if they were MMIO instead of PMIO.
>
> this is plain bullshit.
>
>> The speed depends on the I/O device, not the type of I/O mapping.
>
> which is nonsense.
>
>> Plus, I/O _can't_ be cached, whether PMIO or MMIO, so the cache(s)
>> are irrelevant to I/O.
>
> yes. I/O device data can't be cached. you are such a clever person to
> discover this fact. WOW.
>
>> SSD speeds are fast because they use bus-mastering, not because
>> they use MMIO. The I/O ports are used to "control" the device, but
>> the data from the SSD is transferred in and out of RAM using
>> bus-mastering (which is fast because it doesn't use the CPU at all).
>
> I understand that you don't have the faintest clue how modern PCI
> devices work. Just go ahead with entertaining us ...
>
> Tom

Regards,
ecm
Re: [Freedos-devel] Slowdown-Units ratings and a CPU-bound depacker benchmark

>>> The article is found at
>>> https://pushbx.org/ecm/dokuwiki/doku.php?id=blog:pushbx:2023:0321_cpu_performance_comparison
>>
>> I mostly agree with you and your article, but:
>
> fine that you agree, but at most 50% of the article is even close to
> 'right'.

You're the one who said, "I mostly agree with you and your article,
but:", not me.

>>> Conclusion
>>>
>>> CPU-bound benchmarks are much faster on a modern machine than they
>>> are on older ones.
>>> The frequency increase does not actually suffice to explain the
>>> speedup.
>>> Some things, like doing I/O, were not sped up nearly as much
>>> however.

> I tried posting a much longer response to this, but it was
> apparently rejected by the moderators. Here's a shorter one.

>> I/O has also vastly sped up (we have SSD speeds of up to 6 GB/sec).
>> Just not by doing IN/OUT, but by using memory mapped PCI devices.

>> I think you're confusing two different things -- MMIO and
>> DMA/Bus-Mastering.

> He is NOT.

I wasn't talking about ecm being confused, I was talking about you.
AFAIK, ecm never tested either MMIO or bus-mastering so never said
anything about them.

>> Whether I/O is PMIO or MMIO is pretty much irrelevant to the speed.
>> For example, I/O port 201h (the analog joystick) and I/O port 92h
>> (which controls A20 on some computers) are both VERY slow and would
>> not be any faster if they were MMIO instead of PMIO.

> this is plain bullshit.

Care to explain in more detail?

>> The speed depends on the I/O device, not the type of I/O mapping.

> which is nonsense.

Care to explain in more detail?

>> Plus, I/O _can't_ be cached, whether PMIO or MMIO, so the cache(s)
>> are irrelevant to I/O.

> yes. I/O device data can't be cached. you are such a clever person to
> discover this fact. WOW.

Glad you agree that I/O can't be cached.

>> SSD speeds are fast because they use bus-mastering, not because
>> they use MMIO. The I/O ports are used to "control" the device, but
>> the data from the SSD is transferred in and out of RAM using
>> bus-mastering (which is fast because it doesn't use the CPU at all).

> I understand that you don't have the faintest clue how modern PCI
> devices work. Just go ahead with entertaining us ...

Please enlighten me.
Re: [Freedos-devel] Slowdown-Units ratings and a CPU-bound depacker benchmark

Dear Bret,

> The article is found at
>> https://pushbx.org/ecm/dokuwiki/doku.php?id=blog:pushbx:2023:0321_cpu_performance_comparison

> I mostly agree with you and your article, but:

fine that you agree, but at most 50% of the article is even close to
'right'.

>>> Conclusion
>>>
>>> CPU-bound benchmarks are much faster on a modern machine than they
>>> are on older ones.
>>> The frequency increase does not actually suffice to explain the
>>> speedup.
>>> Some things, like doing I/O, were not sped up nearly as much
>>> however.

> I tried posting a much longer response to this, but it was
> apparently rejected by the moderators. Here's a shorter one.

>> I/O has also vastly sped up (we have SSD speeds of up to 6 GB/sec).
>> Just not by doing IN/OUT, but by using memory mapped PCI devices.

> I think you're confusing two different things -- MMIO and
> DMA/Bus-Mastering.

He is NOT.

> Whether I/O is PMIO or MMIO is pretty much irrelevant to the speed.
> For example, I/O port 201h (the analog joystick) and I/O port 92h
> (which controls A20 on some computers) are both VERY slow and would
> not be any faster if they were MMIO instead of PMIO.

this is plain bullshit.

> The speed depends on the I/O device, not the type of I/O mapping.

which is nonsense.

> Plus, I/O _can't_ be cached, whether PMIO or MMIO, so the cache(s)
> are irrelevant to I/O.

yes. I/O device data can't be cached. you are such a clever person to
discover this fact. WOW.

> SSD speeds are fast because they use bus-mastering, not because
> they use MMIO. The I/O ports are used to "control" the device, but
> the data from the SSD is transferred in and out of RAM using
> bus-mastering (which is fast because it doesn't use the CPU at all).

I understand that you don't have the faintest clue how modern PCI
devices work. Just go ahead with entertaining us ...

Tom
Re: [Freedos-devel] Slowdown-Units ratings and a CPU-bound depacker benchmark

Hi, on Tuesday, 21 March 2023 at 23:26 you wrote:

> Hello! The article is found at
> https://pushbx.org/ecm/dokuwiki/doku.php?id=blog:pushbx:2023:0321_cpu_performance_comparison

I mostly agree with you and your article, but:

>> Conclusion
>>
>> CPU-bound benchmarks are much faster on a modern machine than they
>> are on older ones.
>> The frequency increase does not actually suffice to explain the
>> speedup.
>> Some things, like doing I/O, were not sped up nearly as much
>> however.

I tried posting a much longer response to this, but it was apparently
rejected by the moderators. Here's a shorter one.

> I/O has also vastly sped up (we have SSD speeds of up to 6 GB/sec).
> Just not by doing IN/OUT, but by using memory mapped PCI devices.

I think you're confusing two different things -- MMIO and
DMA/Bus-Mastering.

Whether I/O is PMIO or MMIO is pretty much irrelevant to the speed. For
example, I/O port 201h (the analog joystick) and I/O port 92h (which
controls A20 on some computers) are both VERY slow and would not be any
faster if they were MMIO instead of PMIO. The speed depends on the I/O
device, not the type of I/O mapping. Plus, I/O _can't_ be cached,
whether PMIO or MMIO, so the cache(s) are irrelevant to I/O.

SSD speeds are fast because they use bus-mastering, not because they
use MMIO. The I/O ports are used to "control" the device, but the data
from the SSD is transferred in and out of RAM using bus-mastering
(which is fast because it doesn't use the CPU at all).