[ https://issues.apache.org/jira/browse/LUCENE-9986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355684#comment-17355684 ]
Michael Sokolov commented on LUCENE-9986:
-----------------------------------------

[This SO post|https://stackoverflow.com/questions/15819919/where-can-i-find-unit-tests-for-regular-expressions-in-multiple-languages] links to many test suites in various open source projects. Not sure which would be best/best licensed for copying here?

> Create a simple "real world" regexp benchmark
> ---------------------------------------------
>
> Key: LUCENE-9986
> URL: https://issues.apache.org/jira/browse/LUCENE-9986
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Priority: Major
>
> For issues like LUCENE-9983, where we are struggling to decide which
> low-level optimizations to make for our (complicated!) {{determinize}}
> method, it would really help to have a large, real-world corpus of regexps to
> evaluate performance metrics of our automata operations, like CPU and HEAP
> required to parse the regexp and determinize.
> Does anyone know of such an existing, hopefully compatibly licensed, corpus?
> Probably we would add these benchmarks to {{luceneutil}}.
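To make the requested measurements a little more concrete, here is a minimal, self-contained sketch of the kind of harness such a corpus could feed. It deliberately does not use Lucene: a textbook Thompson construction (regexp to NFA) and subset construction (NFA to DFA) stand in for Lucene's own regexp parsing and {{determinize}}, so none of this is Lucene's code or API. The class name {{RegexpBench}}, the tiny supported syntax (literals, {{|}}, {{*}}, parentheses) and the sample regexps are all hypothetical. It reports wall-clock nanoseconds for each phase plus NFA and DFA state counts as a rough proxy for the heap a result needs; actual heap usage is not measured.

```java
import java.util.*;

// Hypothetical harness: textbook Thompson construction + subset construction standing in
// for Lucene's regexp parsing and determinize. Supported syntax: literals, '|', '*', '(' ')'.
public class RegexpBench {
  // NFA edges per state as {label, target}; label -1 marks an epsilon edge.
  static final List<List<int[]>> edges = new ArrayList<>();
  static String src;
  static int pos;
  static int newState() { edges.add(new ArrayList<>()); return edges.size() - 1; }
  static void edge(int from, int label, int to) { edges.get(from).add(new int[] {label, to}); }
  static boolean peek(char c) { return pos < src.length() && src.charAt(pos) == c; }
  // alt := concat ('|' concat)*. Each parse method returns an NFA fragment {start, accept}.
  static int[] alt() {
    int[] f = concat();
    while (peek('|')) {
      pos++;
      int[] g = concat();
      int s = newState(), a = newState();
      edge(s, -1, f[0]); edge(s, -1, g[0]); edge(f[1], -1, a); edge(g[1], -1, a);
      f = new int[] {s, a};
    }
    return f;
  }
  // concat := star*, starting from an empty fragment (a single state, start == accept).
  static int[] concat() {
    int s = newState();
    int[] f = {s, s};
    while (pos < src.length() && !peek('|') && !peek(')')) {
      int[] g = star();
      edge(f[1], -1, g[0]);
      f = new int[] {f[0], g[1]};
    }
    return f;
  }
  // star := atom '*'*
  static int[] star() {
    int[] f = atom();
    while (peek('*')) {
      pos++;
      int s = newState(), a = newState();
      edge(s, -1, f[0]); edge(s, -1, a); edge(f[1], -1, f[0]); edge(f[1], -1, a);
      f = new int[] {s, a};
    }
    return f;
  }
  // atom := '(' alt ')' | literal character
  static int[] atom() {
    char c = src.charAt(pos++);
    if (c == '(') { int[] f = alt(); pos++; return f; } // pos++ skips the closing ')'
    int s = newState(), a = newState();
    edge(s, c, a);
    return new int[] {s, a};
  }
  // Adds the epsilon closure of NFA state s to out.
  static void closure(int s, BitSet out) {
    if (out.get(s)) return;
    out.set(s);
    for (int[] e : edges.get(s)) if (e[0] == -1) closure(e[1], out);
  }
  // Subset construction: returns the number of reachable, non-empty DFA states.
  static int determinize(int start) {
    BitSet init = new BitSet();
    closure(start, init);
    Set<BitSet> seen = new HashSet<>(List.of(init));
    Deque<BitSet> work = new ArrayDeque<>(List.of(init));
    while (!work.isEmpty()) {
      BitSet cur = work.pop();
      Map<Integer, BitSet> moves = new HashMap<>(); // label -> closure of its targets
      for (int s = cur.nextSetBit(0); s >= 0; s = cur.nextSetBit(s + 1)) {
        for (int[] e : edges.get(s)) {
          if (e[0] != -1) closure(e[1], moves.computeIfAbsent(e[0], k -> new BitSet()));
        }
      }
      for (BitSet next : moves.values()) if (seen.add(next)) work.push(next);
    }
    return seen.size();
  }
  // Returns {parseNanos, determinizeNanos, nfaStates, dfaStates}; not thread-safe.
  static long[] measure(String regexp) {
    edges.clear();
    src = regexp;
    pos = 0;
    long t0 = System.nanoTime();
    int[] nfa = alt();
    long t1 = System.nanoTime();
    int dfaStates = determinize(nfa[0]);
    long t2 = System.nanoTime();
    return new long[] {t1 - t0, t2 - t1, edges.size(), dfaStates};
  }
  public static void main(String[] args) {
    for (String r : List.of("abc", "a|b", "(a|b)*abb", "(a|b)*a(a|b)(a|b)(a|b)")) {
      long[] m = measure(r);
      System.out.printf("%-24s parse=%dns determinize=%dns nfaStates=%d dfaStates=%d%n",
          r, m[0], m[1], m[2], m[3]);
    }
  }
}
```

Running {{main}} prints one line per regexp. The last sample, {{(a|b)*a(a|b)(a|b)(a|b)}}, is the classic case where subset construction blows up (its minimal DFA already needs 2^4 = 16 states), which is the kind of input a real-world corpus would help surface. Single cold timings like these are noisy; a real benchmark would add warmup and repeated iterations, and would call Lucene's own regexp parsing and {{determinize}} instead of this stand-in.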