Hi swift-dev,

Joe Shajrawi is adding a categorization framework to our in-tree benchmark suite (swift/benchmark). We're going to have an initial set of tags as defined below (comments welcome). These are free-form tags, but they will fall into a natural hierarchy.
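To make that concrete before getting into the definitions, here is a rough sketch of how a tagged benchmark might be declared. The names here (BenchmarkTag, BenchmarkInfo, blackHole) and the declaration style are placeholders of my own, not necessarily what Joe's framework will actually expose:

// Hypothetical sketch only; the types below are illustrative placeholders.

// Prevent the optimizer from discarding the work being measured.
@inline(never) func blackHole<T>(_ x: T) {}

enum BenchmarkTag {
  // Top-level categories.
  case validation, algorithm, miniapplication, regression
  // #validation sub-tags.
  case api, Array, String, Dictionary, Codable
  case sdk, runtime, refcount, metadata
  case abstraction, safetychecks, exceptions, bridging, concurrency
  case stable
}

struct BenchmarkInfo {
  let name: String
  let tags: [BenchmarkTag]
  let runFunction: (Int) -> Void
}

// An Array-append micro-benchmark would register itself as a #validation
// benchmark covering the #api / #Array area.
let ArrayAppend = BenchmarkInfo(
  name: "ArrayAppend",
  tags: [.validation, .api, .Array],
  runFunction: { n in
    var a: [Int] = []
    for i in 0..<(10_000 * n) {
      a.append(i)
    }
    blackHole(a.count)
  })

The payoff is that the benchmark driver can then filter on tags, so someone touching Array's implementation could run just the #validation #Array benchmarks (the exact filtering mechanism is up to the framework).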
The purpose of tagging a benchmark is:

- Help document a benchmark's intent and search for relevant benchmarks when changing the stdlib/runtime/optimizer.

- Quickly run a subset of benchmarks most relevant to a particular stdlib/runtime/compiler change.

- Document performance coverage. Any API, runtime call, or pattern considered important for general Swift performance should be explicitly represented in the "validation" suite.

- Track the performance of different kinds of benchmarks independently. For example, "regression" benchmarks are only useful for identifying performance regressions. They may not be highly applicable to general Swift performance. "Validation" benchmarks are areas that we want to continually improve. A regression on a validation benchmark is potentially more serious than one on a regression benchmark.

Note that we don't have "unit test" benchmarks. Specific compiler transformations should be verified with lit tests. "Regression" benchmarks are usually just a bit too complicated to rely solely on a lit test.

--- Tags ---

#validation : These are "micro" benchmarks that test a specific operation or critical path that we know is important to measure. (I considered calling these #coverage, but don't want to confuse them with code coverage efforts.) Within #validation we have:

  #api -> #Array, #String, #Dictionary, #Codable, etc.
  #sdk
  #runtime -> #refcount, #metadata, etc.
  #abstraction
  #safetychecks
  #exceptions
  #bridging
  #concurrency
  #stable : additionally tag any validation tests that already have a stable, reasonably optimized implementation.

#algorithm : These are "micro" benchmarks that test some well-known algorithm in isolation: sorting, searching, hashing, fibonacci, crypto, etc.

#miniapplication : These benchmarks are contrived to mimic some subset of application behavior in a way that can be easily measured. They are larger than micro-benchmarks, combining multiple APIs, data structures, or algorithms. This includes small standardized benchmarks, pieces of real applications that have been extracted into a benchmark, important functionality like JSON parsing, etc.

#regression : Pretty much everything else. This could be a random piece of code that was attached to a bug report. We want to make sure the optimizer as a whole continues to handle this case, but we don't know how applicable it is to general Swift performance relative to the other micro-benchmarks. In particular, these aren't weighted as highly as "validation" benchmarks and likely won't be the subject of future investigation unless they significantly regress.

-Andy

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev