Carl, On Mon, Apr 30, 2007 at 04:31:36PM -0700, Carl Love wrote: > > I have a fundamental design issue with requiring a write to a read only > register to allow you to then read the register. The approach does work > on Power so that is not a problem. First of all it is very counter > intuitive that you should have to write to a read only register so you > can then read from it. Secondly, the fact of the matter is you actually > don't care about the write operation at all. You really just want is > the side effect, i.e. registering the PMD. It seems like there should > be a more natural way to do that. The write function ends up adding the > read only PMD to the PMDs to be reset and accumulated. Note sure of the > value of adding the read only PMD to the list of PMDs to be reset. > I tend to agree with about the current design. I think it is clearly counter-intuitive and I'd like to correct it. The only reason we allow pfm_write_pmds() on read-only PMD is to track their usage, i.e., update the used_pmds bitmasks. This is needed because we only allow pfm_read_pmds() on registers which have been declared used used either via pfm_write_pmds() or bcause they have been declared as being sampled (smpl_pmds) or reset (reset_pmds).
The used_pmds bitmasks is also used on context switch to avoid saving unused registers. Similarly, it is used on context switch in to only restore the pmds we use. Finally, it is also used when detaching the context to save the last HW values into the perfmon context such that any subsequent pfm_read_pmds() will return the last known value. This latter property applies also to read-only registers. On context switch out, we do need to save read-only PMD because any pfm_read_pmds() by another (controlling) process requires the monitored thread to be stopped (i.e., switched out). On context switch in, we do not need to restore read-only registers. if we use your patch and only start tracking read-only PMD registers when they are actually read (assuming they are not passed as smpl_pmds nor reset_pmds), then we have a problem with context switches. Let me give a short example. Assume process A is monitored the single-threaded process B and that register TBR is a read-only PMD. A programs the PMU, attches the context and then starts active monitoring. TBR is not yet read or written. Then at some point, A wants to read TBR from B. It first stops process B, which causes it to be switched out. Only the PMD registers used by B are saved, that does not include TBR. Process B eventually is off the CPU and the pmf_read_pmds() can proceed. It will read from the perfmon context and not touch HW. The returned value will be zero simply because the kernel did not know we were interested in TBR and it did not track it. One workaround for this problem would be to systematically include all read-only registers as part of used_pmds but that would penalize the context switch out code. Another way would be to force the user to issue a pfm_read_pmds() prior to starting monitoring to add the read-only register to used_pmds. > The first thought is to have an explicit call to register the read only > pmd. It would not be necessary to register it as a pmd to be restored. We already have a lot of sytem calls. It would be hard to justify adding one more to register read-only PMDs. > You may want it as an accumulated value on. For Power, it is a 64 bit > reg already so it will never overflow and need accumulating. But that > may not be the case for all architectures. Obviously, that means yet > another system call. Note that the kernel does not assume anything about read-only registers, they could be counters or mostly likely something else. There cannot be 64-bit emulation on read-only register simply because they cannot be modified. > The second option is to register reads to read only pmds when the PMD is > read. Consider the time base register (TBR). Currently, you must write > the TBR, the write is ignored so nothing happens to the TBR. Then read > the TBR to get the initial value. Run the app and then read the TBR > again. If we were to register the TBR on the initial read, we could > eliminate the write which doesn't make sense anyway. In the end, the > user would just do the initial read and the final read and subtract them > to get the elapsed time. To me this make more sense. Additionally, > only the bit masks that are really needed would have their bits set, > i.e. used_pmds, accumulated pmds. You would not include it in the > restore pmd mask. > The double-read is was I suggest above but there is something I don't like about this approach. The optimization done by the implementation is exposed through the interface and that is bad. The interface definition must remain disconnected from the implementation and especially from fancy optimization schemes, e.g., lazy save. What do you think? -- -Stephane _______________________________________________ perfmon mailing list [email protected] http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/
