Thanks for the enthusiastic response Marja!

I have a bunch of things I want to try out.
If there's a definite improvement to be made, I'll submit a CR.
I'll share my learnings (if any) regardless.

I'm aware this work by the scanner/lexer isn't a significant perf portion
for parsing a script, but I've fallen down a rabbit hole of performing
static set membership checks as fast as possible.

On Sat, Mar 11, 2023 at 5:42 PM Marja Hölttä <[email protected]> wrote:

> Hi, thanks for your email and taking interest in parsing performance!
>
> We're generally interested in accepting contributions in this area, as
> long as they're not too tedious to review. Feel free to send them to
> [email protected].
>
> The categorical parsing benchmark is CodeLoad, part of Octane (
> https://github.com/chromium/octane ). But in this case, using
> microbenchmarks for proving the performance improvement is also legit.
> Maybe take a realistic JS file as a benchmark, so that you're not
> artificially skewing the keyword frequencies or so. You can add the
> microbenchmark & results as a comment in the code review. Or dump them in a
> Google Doc.
>
> Thanks again and looking forward to hearing whether your ideas worked out!
>
>
> On Sat, Mar 11, 2023 at 5:04 AM Yoni Feigelson <[email protected]>
> wrote:
>
>> I came across Blazingly fast parsing, part 1: optimizing the scanner · V8
>> <https://v8.dev/blog/scanner#keywords>
>>
>> Looking at the code
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h>,
>> 2 things came to mind:
>> 1. Doubling the size of the hash table produced by gperf would increase
>> the binary size negligibly, but increase performance non-negligibly?
>> (higher probability for non-keyword to hit an empty entry)
>>
>> 2.The code for GetToken:
>>
>> inline Token
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=207>
>> ::Value
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=211>
>> PerfectKeywordHash
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=61>
>> ::GetToken
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=156?gsn=GetToken&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%23RdAVTFSv8ZOUxaLqYUtGtLolSEaK9fIzoghC_qNc_9U>
>> (const char* str
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=156?gsn=str&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%235aWwJNw-VMwN4g26PrvukddNMNMhe1tkw08mwi2icG0>,
>> int len
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=156?gsn=len&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%23TXtQ_Rf7pbtkIpnmu4qVJ46bfd9P1HWjChcOZdztU0o>)
>> {
>> if (base::IsInRange
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/base/bounds.h;drc=8987a17e4855762850bb276ec150198aab415775;l=17>
>> (len
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156>,
>> MIN_WORD_LENGTH
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=53>,
>> MAX_WORD_LENGTH
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=54>))
>> {
>> unsigned int key
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=158?gsn=key&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%23bE4pn7GRnvYNkdNJ7VdVUX1cud2fyxdH9mdhWaRqocs>
>> = Hash
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=69>
>> (str
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156>,
>> len
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156>)
>> & 0x3f;
>> DCHECK_LT
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=160?gsn=DCHECK_LT&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fbase%2Flogging.h%23DCHECK_LT%2523m%254019095&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fbase%2Flogging.h%23DCHECK_LT%2523m%254019619>
>> (key
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158>,
>> arraysize
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/base/macros.h;drc=8987a17e4855762850bb276ec150198aab415775;l=36>
>> (kPerfectKeywordLengthTable
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=85>));
>>
>> DCHECK_LT
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=161?gsn=DCHECK_LT&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fbase%2Flogging.h%23DCHECK_LT%2523m%254019095&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fbase%2Flogging.h%23DCHECK_LT%2523m%254019619>
>> (key
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158>,
>> arraysize
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/base/macros.h;drc=8987a17e4855762850bb276ec150198aab415775;l=36>
>> (kPerfectKeywordHashTable
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=90>));
>>
>> if (len
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156>
>> == kPerfectKeywordLengthTable
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=85>
>> [key
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158>])
>> {
>> const char* s
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=163?gsn=s&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%23tqfFAXiE8oj3rv7FCx8SS2BtlBivb6sfb0-HXdNqvL0>
>> = kPerfectKeywordHashTable
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=90>
>> [key
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158>
>> ].name
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=48>;
>>
>> while (*s
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=163>
>> != 0) {
>> if (*s
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=163>++
>> != *str
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156>++)
>> return Token
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=207>
>> ::IDENTIFIER
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=211>;
>>
>> }
>> return kPerfectKeywordHashTable
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=90>
>> [key
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158>
>> ].value
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=49>;
>>
>> }
>> }
>> return Token
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=207>
>> ::IDENTIFIER
>> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=211>;
>>
>> }
>>
>> I wonder:
>> a. Is the manual char-by-char comparison possibly suboptimal to what
>> `memcmp` would do? (Might be target machine dependent, but support for some
>> level of SSE/SSE4.2 SIMDification?)
>>
>> b. The length equality branch prior to string comparison might be a net
>> negative? I guess it heavily depends on the ratio of keywords vs
>> non-keywords in the scanned code, but it might be better to just do the
>> string equality check first.
>>
>> For a crude example of the version I have in mind:
>> Token GetToken2(const char* str, int len) {
>>   if (len - MIN_WORD_LENGTH <= MAX_WORD_LENGTH-MIN_WORD_LENGTH) {
>>     unsigned int key = Hash(str, len) & 0x3f;
>>
>>     unsigned int key_len = kPerfectKeywordLengthTable[key];
>>     // crude branchless min, real would need casts
>>     unsigned int min_len = key_len + ((len-key_len) & (len-key_len)>>31);
>>     const char* s = kPerfectKeywordHashTable[key].name;
>>     if (memcmp (str, s, min_len) == 0) {
>>         if (len == key_len) {
>>             return kPerfectKeywordHashTable[key].value;
>>         }
>>     }
>>   }
>>   return Token::IDENTIFIER;
>> }
>>
>>
>> I'd like to ask:
>> 1. Are there existing benchmarks that can be run?
>> 2. I assume a perf improvement (if any) won't be drastic. Is it worth
>> even trying out?
>> Put another way - if the code stays simple, with a slight perf gain -
>> would that CL be accepted?
>>
>> Cheers,
>> Yoni
>>
>> --
>> --
>> v8-dev mailing list
>> [email protected]
>> http://groups.google.com/group/v8-dev
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "v8-dev" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/v8-dev/cee7bc26-50cd-49bf-8e2b-f29b01ac5adfn%40googlegroups.com
>> <https://groups.google.com/d/msgid/v8-dev/cee7bc26-50cd-49bf-8e2b-f29b01ac5adfn%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>
>
> --
>
>
> Google Germany GmbH
>
> Erika-Mann-Straße 33
>
> 80636 München
>
> Geschäftsführer: Paul Manicle, Liana Sebastian.
>
> Registergericht und -nummer: Hamburg, HRB 86891
>
> Sitz der Gesellschaft: Hamburg
>
> Diese E-Mail ist vertraulich. Falls sie diese fälschlicherweise erhalten
> haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter,
> löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen,
> dass die E-Mail an die falsche Person gesendet wurde.
>
>
>
> This e-mail is confidential. If you received this communication by
> mistake, please don't forward it to anyone else, please erase all copies
> and attachments, and please let me know that it has gone to the wrong
> person.
>
> --
> --
> v8-dev mailing list
> [email protected]
> http://groups.google.com/group/v8-dev
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "v8-dev" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/v8-dev/kCEgKwB6gNs/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/v8-dev/CAED6dUDMePpv42oEFbV3X%2BR7JyyUzbY3wEb6ns7Xq%2BLyTRuJng%40mail.gmail.com
> <https://groups.google.com/d/msgid/v8-dev/CAED6dUDMePpv42oEFbV3X%2BR7JyyUzbY3wEb6ns7Xq%2BLyTRuJng%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>

-- 
-- 
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
--- 
You received this message because you are subscribed to the Google Groups 
"v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/v8-dev/CABLEdPoqF0s2MW_OGdEDv0FbubpyFoszyWjibReBnNwaAxjfWA%40mail.gmail.com.

Reply via email to