Thanks for the enthusiastic response Marja! I have a bunch of things I want to try out. If there's a definite improvement to be made, I'll submit a CR. I'll share my learnings (if any) regardless.
I'm aware this work by the scanner/lexer isn't a significant perf portion for parsing a script, but I've fallen down a rabbit hole of performing static set membership checks as fast as possible. On Sat, Mar 11, 2023 at 5:42 PM Marja Hölttä <[email protected]> wrote: > Hi, thanks for your email and taking interest in parsing performance! > > We're generally interested in accepting contributions in this area, as > long as they're not too tedious to review. Feel free to send them to > [email protected]. > > The categorical parsing benchmark is CodeLoad, part of Octane ( > https://github.com/chromium/octane ). But in this case, using > microbenchmarks for proving the performance improvement is also legit. > Maybe take a realistic JS file as a benchmark, so that you're not > artificially skewing the keyword frequencies or so. You can add the > microbenchmark & results as a comment in the code review. Or dump them in a > Google Doc. > > Thanks again and looking forward to hearing whether your ideas worked out! > > > On Sat, Mar 11, 2023 at 5:04 AM Yoni Feigelson <[email protected]> > wrote: > >> I came across Blazingly fast parsing, part 1: optimizing the scanner · V8 >> <https://v8.dev/blog/scanner#keywords> >> >> Looking at the code >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h>, >> 2 things came to mind: >> 1. Doubling the size of the hash table produced by gperf would increase >> the binary size negligibly, but increase performance non-negligibly? >> (higher probability for non-keyword to hit an empty entry) >> >> 2.The code for GetToken: >> >> inline Token >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=207> >> ::Value >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=211> >> PerfectKeywordHash >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=61> >> ::GetToken >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=156?gsn=GetToken&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%23RdAVTFSv8ZOUxaLqYUtGtLolSEaK9fIzoghC_qNc_9U> >> (const char* str >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=156?gsn=str&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%235aWwJNw-VMwN4g26PrvukddNMNMhe1tkw08mwi2icG0>, >> int len >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=156?gsn=len&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%23TXtQ_Rf7pbtkIpnmu4qVJ46bfd9P1HWjChcOZdztU0o>) >> { >> if (base::IsInRange >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/base/bounds.h;drc=8987a17e4855762850bb276ec150198aab415775;l=17> >> (len >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156>, >> MIN_WORD_LENGTH >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=53>, >> MAX_WORD_LENGTH >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=54>)) >> { >> unsigned int key >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=158?gsn=key&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%23bE4pn7GRnvYNkdNJ7VdVUX1cud2fyxdH9mdhWaRqocs> >> = Hash >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=69> >> (str >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156>, >> len >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156>) >> & 0x3f; >> DCHECK_LT >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=160?gsn=DCHECK_LT&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fbase%2Flogging.h%23DCHECK_LT%2523m%254019095&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fbase%2Flogging.h%23DCHECK_LT%2523m%254019619> >> (key >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158>, >> arraysize >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/base/macros.h;drc=8987a17e4855762850bb276ec150198aab415775;l=36> >> (kPerfectKeywordLengthTable >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=85>)); >> >> DCHECK_LT >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=161?gsn=DCHECK_LT&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fbase%2Flogging.h%23DCHECK_LT%2523m%254019095&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fbase%2Flogging.h%23DCHECK_LT%2523m%254019619> >> (key >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158>, >> arraysize >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/base/macros.h;drc=8987a17e4855762850bb276ec150198aab415775;l=36> >> (kPerfectKeywordHashTable >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=90>)); >> >> if (len >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156> >> == kPerfectKeywordLengthTable >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=85> >> [key >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158>]) >> { >> const char* s >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=163?gsn=s&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%23tqfFAXiE8oj3rv7FCx8SS2BtlBivb6sfb0-HXdNqvL0> >> = kPerfectKeywordHashTable >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=90> >> [key >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158> >> ].name >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=48>; >> >> while (*s >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=163> >> != 0) { >> if (*s >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=163>++ >> != *str >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156>++) >> return Token >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=207> >> ::IDENTIFIER >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=211>; >> >> } >> return kPerfectKeywordHashTable >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=90> >> [key >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158> >> ].value >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=49>; >> >> } >> } >> return Token >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=207> >> ::IDENTIFIER >> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=211>; >> >> } >> >> I wonder: >> a. Is the manual char-by-char comparison possibly suboptimal to what >> `memcmp` would do? (Might be target machine dependent, but support for some >> level of SSE/SSE4.2 SIMDification?) >> >> b. The length equality branch prior to string comparison might be a net >> negative? I guess it heavily depends on the ratio of keywords vs >> non-keywords in the scanned code, but it might be better to just do the >> string equality check first. >> >> For a crude example of the version I have in mind: >> Token GetToken2(const char* str, int len) { >> if (len - MIN_WORD_LENGTH <= MAX_WORD_LENGTH-MIN_WORD_LENGTH) { >> unsigned int key = Hash(str, len) & 0x3f; >> >> unsigned int key_len = kPerfectKeywordLengthTable[key]; >> // crude branchless min, real would need casts >> unsigned int min_len = key_len + ((len-key_len) & (len-key_len)>>31); >> const char* s = kPerfectKeywordHashTable[key].name; >> if (memcmp (str, s, min_len) == 0) { >> if (len == key_len) { >> return kPerfectKeywordHashTable[key].value; >> } >> } >> } >> return Token::IDENTIFIER; >> } >> >> >> I'd like to ask: >> 1. Are there existing benchmarks that can be run? >> 2. I assume a perf improvement (if any) won't be drastic. Is it worth >> even trying out? >> Put another way - if the code stays simple, with a slight perf gain - >> would that CL be accepted? >> >> Cheers, >> Yoni >> >> -- >> -- >> v8-dev mailing list >> [email protected] >> http://groups.google.com/group/v8-dev >> --- >> You received this message because you are subscribed to the Google Groups >> "v8-dev" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> To view this discussion on the web visit >> https://groups.google.com/d/msgid/v8-dev/cee7bc26-50cd-49bf-8e2b-f29b01ac5adfn%40googlegroups.com >> <https://groups.google.com/d/msgid/v8-dev/cee7bc26-50cd-49bf-8e2b-f29b01ac5adfn%40googlegroups.com?utm_medium=email&utm_source=footer> >> . >> > > > -- > > > Google Germany GmbH > > Erika-Mann-Straße 33 > > 80636 München > > Geschäftsführer: Paul Manicle, Liana Sebastian. > > Registergericht und -nummer: Hamburg, HRB 86891 > > Sitz der Gesellschaft: Hamburg > > Diese E-Mail ist vertraulich. Falls sie diese fälschlicherweise erhalten > haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, > löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, > dass die E-Mail an die falsche Person gesendet wurde. > > > > This e-mail is confidential. If you received this communication by > mistake, please don't forward it to anyone else, please erase all copies > and attachments, and please let me know that it has gone to the wrong > person. > > -- > -- > v8-dev mailing list > [email protected] > http://groups.google.com/group/v8-dev > --- > You received this message because you are subscribed to a topic in the > Google Groups "v8-dev" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/v8-dev/kCEgKwB6gNs/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > [email protected]. > To view this discussion on the web visit > https://groups.google.com/d/msgid/v8-dev/CAED6dUDMePpv42oEFbV3X%2BR7JyyUzbY3wEb6ns7Xq%2BLyTRuJng%40mail.gmail.com > <https://groups.google.com/d/msgid/v8-dev/CAED6dUDMePpv42oEFbV3X%2BR7JyyUzbY3wEb6ns7Xq%2BLyTRuJng%40mail.gmail.com?utm_medium=email&utm_source=footer> > . > -- -- v8-dev mailing list [email protected] http://groups.google.com/group/v8-dev --- You received this message because you are subscribed to the Google Groups "v8-dev" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/v8-dev/CABLEdPoqF0s2MW_OGdEDv0FbubpyFoszyWjibReBnNwaAxjfWA%40mail.gmail.com.
