On Sat, 12 Mar 2016 at 10:16 Serhiy Storchaka <[email protected]> wrote:
> On 07.03.16 19:19, Brett Cannon wrote:
> > Are you thinking about turning all of this into a benchmark for the
> > benchmark suite?
>
> This was my purpose. I first had written a benchmark for the benchmark
> suite, then I became interested in more detailed results and a
> comparison with alternative engines.
>
> There are several questions about a benchmark for the benchmark suite.
>
> 1. The input data is a public 20MB text (8MB in a ZIP file). Should we
> download it every time (maybe with caching) or add it to the
> repository?

Add it to the repository, probably (`du -h` on my checkout says the
total disk space used is 280 MB already). I would like to look into
what it would take to use pip to install dependencies so that we don't
have such a large checkout, at which point we could talk about
downloading it. But as of right now we keep it all self-contained to
control for the inputs to the benchmarks.

> 2. One iteration of all searches on the full text takes 29 seconds on
> my computer. Isn't this too long? In any case I first want to optimize
> some bottlenecks in the re module.

I don't think we have established a "too long" time. We do have some
benchmarks like spectral_norm that don't run unless you use rigorous
mode, and this could be one of them.

> 3. Do we need one benchmark that gives an accumulated time of all
> searches, or separate microbenchmarks for every pattern?

I don't care either way. Obviously it depends on whether you want to
measure overall re performance and have people aim to improve that, or
let people target specific workload types.

> 4. It would be nice to use the same benchmark for comparing different
> regular expression engines. This requires changing perf.py. Maybe we
> could use the same interface to compare ElementTree with lxml and
> json with simplejson.

There's already a way to do this: execute the benchmark scripts
directly with command-line flags. You do lose perf.py's calculation
benefits, though. I personally have no issue with you or anyone else
coming up with a way to pass in benchmark-specific flags (i.e., our
own version of -X).

> 5. The patterns are ASCII-only and the text is mostly ASCII. It would
> be nice to add non-ASCII patterns and non-ASCII text. But this will
> increase the run time.

I think that's fine. Better that the benchmark measure something
useful than worry about whether anyone will want to run it in fast
mode.
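As a rough sketch of the download-with-caching option from point 1
(the URL and cache path here are placeholders, not the actual corpus
location):

    # Hypothetical download-with-caching helper; CORPUS_URL is a
    # placeholder, not the real location of the 20MB text.
    import os
    import urllib.request

    CORPUS_URL = "https://example.com/corpus.txt"   # placeholder
    CACHE_PATH = os.path.expanduser("~/.cache/re_bench_corpus.txt")

    def get_corpus():
        """Return the benchmark text, downloading and caching it once."""
        if not os.path.exists(CACHE_PATH):
            os.makedirs(os.path.dirname(CACHE_PATH), exist_ok=True)
            urllib.request.urlretrieve(CORPUS_URL, CACHE_PATH)
        with open(CACHE_PATH, encoding="utf-8") as f:
            return f.read()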
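For points 3 and 4, a minimal sketch of a benchmark script that
reports both per-pattern and accumulated times and selects the engine
via a command-line flag (the --engine flag, the patterns, and the file
name are assumptions for illustration, not the suite's actual
interface):

    # Hypothetical engine-comparison sketch: "--engine re" uses the
    # stdlib; "--engine regex" would use the third-party regex module.
    import argparse
    import importlib
    import time

    PATTERNS = [r"Python", r"\bspam\b", r"[a-z]+ing"]   # made-up patterns

    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("--engine", default="re",
                            help="regex module to import (e.g. re, regex)")
        parser.add_argument("--data", default="corpus.txt")
        args = parser.parse_args()

        engine = importlib.import_module(args.engine)
        with open(args.data, encoding="utf-8") as f:
            text = f.read()

        total = 0.0
        for pattern in PATTERNS:
            compiled = engine.compile(pattern)
            start = time.perf_counter()
            compiled.findall(text)
            elapsed = time.perf_counter() - start
            total += elapsed
            print("%-12s %.3fs" % (pattern, elapsed))   # per-pattern time
        print("%-12s %.3fs" % ("total", total))         # accumulated time

    if __name__ == "__main__":
        main()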
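And for point 5, a non-ASCII variant only needs Unicode patterns and
text; a made-up example, not Serhiy's actual data:

    import re

    # Made-up non-ASCII case: a Cyrillic word pattern over Cyrillic text.
    text = "Чудесный день для тестирования регулярных выражений. " * 1000
    pattern = re.compile(r"\b[а-я]+\b", re.IGNORECASE)
    print(len(pattern.findall(text)))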
_______________________________________________
Speed mailing list
[email protected]
https://mail.python.org/mailman/listinfo/speed
