Hi, thanks for your email and taking interest in parsing performance! We're generally interested in accepting contributions in this area, as long as they're not too tedious to review. Feel free to send them to [email protected].
The categorical parsing benchmark is CodeLoad, part of Octane ( https://github.com/chromium/octane ). But in this case, using microbenchmarks for proving the performance improvement is also legit. Maybe take a realistic JS file as a benchmark, so that you're not artificially skewing the keyword frequencies or so. You can add the microbenchmark & results as a comment in the code review. Or dump them in a Google Doc. Thanks again and looking forward to hearing whether your ideas worked out! On Sat, Mar 11, 2023 at 5:04 AM Yoni Feigelson <[email protected]> wrote: > I came across Blazingly fast parsing, part 1: optimizing the scanner · V8 > <https://v8.dev/blog/scanner#keywords> > > Looking at the code > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h>, > 2 things came to mind: > 1. Doubling the size of the hash table produced by gperf would increase > the binary size negligibly, but increase performance non-negligibly? > (higher probability for non-keyword to hit an empty entry) > > 2.The code for GetToken: > > inline Token > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=207> > ::Value > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=211> > PerfectKeywordHash > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=61> > ::GetToken > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=156?gsn=GetToken&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%23RdAVTFSv8ZOUxaLqYUtGtLolSEaK9fIzoghC_qNc_9U> > (const char* str > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=156?gsn=str&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%235aWwJNw-VMwN4g26PrvukddNMNMhe1tkw08mwi2icG0>, > int len > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=156?gsn=len&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%23TXtQ_Rf7pbtkIpnmu4qVJ46bfd9P1HWjChcOZdztU0o>) > { > if (base::IsInRange > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/base/bounds.h;drc=8987a17e4855762850bb276ec150198aab415775;l=17> > (len > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156>, > MIN_WORD_LENGTH > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=53>, > MAX_WORD_LENGTH > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=54>)) > { > unsigned int key > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=158?gsn=key&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%23bE4pn7GRnvYNkdNJ7VdVUX1cud2fyxdH9mdhWaRqocs> > = Hash > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=69> > (str > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156>, > len > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156>) > & 0x3f; > DCHECK_LT > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=160?gsn=DCHECK_LT&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fbase%2Flogging.h%23DCHECK_LT%2523m%254019095&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fbase%2Flogging.h%23DCHECK_LT%2523m%254019619> > (key > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158>, > arraysize > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/base/macros.h;drc=8987a17e4855762850bb276ec150198aab415775;l=36> > (kPerfectKeywordLengthTable > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=85>)); > > DCHECK_LT > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=161?gsn=DCHECK_LT&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fbase%2Flogging.h%23DCHECK_LT%2523m%254019095&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fbase%2Flogging.h%23DCHECK_LT%2523m%254019619> > (key > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158>, > arraysize > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/base/macros.h;drc=8987a17e4855762850bb276ec150198aab415775;l=36> > (kPerfectKeywordHashTable > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=90>)); > > if (len > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156> > == kPerfectKeywordLengthTable > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=85> > [key > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158>]) > { > const char* s > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=163?gsn=s&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%23tqfFAXiE8oj3rv7FCx8SS2BtlBivb6sfb0-HXdNqvL0> > = kPerfectKeywordHashTable > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=90> > [key > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158> > ].name > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=48>; > > while (*s > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=163> > != 0) { > if (*s > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=163>++ > != *str > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156>++) > return Token > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=207> > ::IDENTIFIER > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=211>; > > } > return kPerfectKeywordHashTable > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=90> > [key > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158> > ].value > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=49>; > > } > } > return Token > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=207> > ::IDENTIFIER > <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=211>; > > } > > I wonder: > a. Is the manual char-by-char comparison possibly suboptimal to what > `memcmp` would do? (Might be target machine dependent, but support for some > level of SSE/SSE4.2 SIMDification?) > > b. The length equality branch prior to string comparison might be a net > negative? I guess it heavily depends on the ratio of keywords vs > non-keywords in the scanned code, but it might be better to just do the > string equality check first. > > For a crude example of the version I have in mind: > Token GetToken2(const char* str, int len) { > if (len - MIN_WORD_LENGTH <= MAX_WORD_LENGTH-MIN_WORD_LENGTH) { > unsigned int key = Hash(str, len) & 0x3f; > > unsigned int key_len = kPerfectKeywordLengthTable[key]; > // crude branchless min, real would need casts > unsigned int min_len = key_len + ((len-key_len) & (len-key_len)>>31); > const char* s = kPerfectKeywordHashTable[key].name; > if (memcmp (str, s, min_len) == 0) { > if (len == key_len) { > return kPerfectKeywordHashTable[key].value; > } > } > } > return Token::IDENTIFIER; > } > > > I'd like to ask: > 1. Are there existing benchmarks that can be run? > 2. I assume a perf improvement (if any) won't be drastic. Is it worth even > trying out? > Put another way - if the code stays simple, with a slight perf gain - > would that CL be accepted? > > Cheers, > Yoni > > -- > -- > v8-dev mailing list > [email protected] > http://groups.google.com/group/v8-dev > --- > You received this message because you are subscribed to the Google Groups > "v8-dev" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To view this discussion on the web visit > https://groups.google.com/d/msgid/v8-dev/cee7bc26-50cd-49bf-8e2b-f29b01ac5adfn%40googlegroups.com > <https://groups.google.com/d/msgid/v8-dev/cee7bc26-50cd-49bf-8e2b-f29b01ac5adfn%40googlegroups.com?utm_medium=email&utm_source=footer> > . > -- Google Germany GmbH Erika-Mann-Straße 33 80636 München Geschäftsführer: Paul Manicle, Liana Sebastian. Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Diese E-Mail ist vertraulich. Falls sie diese fälschlicherweise erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, dass die E-Mail an die falsche Person gesendet wurde. This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person. -- -- v8-dev mailing list [email protected] http://groups.google.com/group/v8-dev --- You received this message because you are subscribed to the Google Groups "v8-dev" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/v8-dev/CAED6dUDMePpv42oEFbV3X%2BR7JyyUzbY3wEb6ns7Xq%2BLyTRuJng%40mail.gmail.com.
