Hi, thanks for your email and taking interest in parsing performance!

We're generally interested in accepting contributions in this area, as long
as they're not too tedious to review. Feel free to send them to
[email protected].

The categorical parsing benchmark is CodeLoad, part of Octane (
https://github.com/chromium/octane ). But in this case, using
microbenchmarks for proving the performance improvement is also legit.
Maybe take a realistic JS file as a benchmark, so that you're not
artificially skewing the keyword frequencies or so. You can add the
microbenchmark & results as a comment in the code review. Or dump them in a
Google Doc.

Thanks again and looking forward to hearing whether your ideas worked out!


On Sat, Mar 11, 2023 at 5:04 AM Yoni Feigelson <[email protected]> wrote:

> I came across Blazingly fast parsing, part 1: optimizing the scanner · V8
> <https://v8.dev/blog/scanner#keywords>
>
> Looking at the code
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h>,
> 2 things came to mind:
> 1. Doubling the size of the hash table produced by gperf would increase
> the binary size negligibly, but increase performance non-negligibly?
> (higher probability for non-keyword to hit an empty entry)
>
> 2.The code for GetToken:
>
> inline Token
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=207>
> ::Value
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=211>
> PerfectKeywordHash
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=61>
> ::GetToken
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=156?gsn=GetToken&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%23RdAVTFSv8ZOUxaLqYUtGtLolSEaK9fIzoghC_qNc_9U>
> (const char* str
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=156?gsn=str&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%235aWwJNw-VMwN4g26PrvukddNMNMhe1tkw08mwi2icG0>,
> int len
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=156?gsn=len&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%23TXtQ_Rf7pbtkIpnmu4qVJ46bfd9P1HWjChcOZdztU0o>)
> {
> if (base::IsInRange
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/base/bounds.h;drc=8987a17e4855762850bb276ec150198aab415775;l=17>
> (len
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156>,
> MIN_WORD_LENGTH
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=53>,
> MAX_WORD_LENGTH
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=54>))
> {
> unsigned int key
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=158?gsn=key&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%23bE4pn7GRnvYNkdNJ7VdVUX1cud2fyxdH9mdhWaRqocs>
> = Hash
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=69>
> (str
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156>,
> len
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156>)
> & 0x3f;
> DCHECK_LT
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=160?gsn=DCHECK_LT&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fbase%2Flogging.h%23DCHECK_LT%2523m%254019095&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fbase%2Flogging.h%23DCHECK_LT%2523m%254019619>
> (key
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158>,
> arraysize
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/base/macros.h;drc=8987a17e4855762850bb276ec150198aab415775;l=36>
> (kPerfectKeywordLengthTable
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=85>));
>
> DCHECK_LT
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=161?gsn=DCHECK_LT&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fbase%2Flogging.h%23DCHECK_LT%2523m%254019095&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fbase%2Flogging.h%23DCHECK_LT%2523m%254019619>
> (key
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158>,
> arraysize
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/base/macros.h;drc=8987a17e4855762850bb276ec150198aab415775;l=36>
> (kPerfectKeywordHashTable
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=90>));
>
> if (len
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156>
> == kPerfectKeywordLengthTable
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=85>
> [key
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158>])
> {
> const char* s
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;bpv=1;bpt=1;l=163?gsn=s&gs=kythe%3A%2F%2Fchromium.googlesource.com%2Fchromium%2Fsrc%3Flang%3Dc%252B%252B%3Fpath%3Dv8%2Fsrc%2Fparsing%2Fkeywords-gen.h%23tqfFAXiE8oj3rv7FCx8SS2BtlBivb6sfb0-HXdNqvL0>
> = kPerfectKeywordHashTable
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=90>
> [key
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158>
> ].name
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=48>;
>
> while (*s
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=163>
> != 0) {
> if (*s
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=163>++
> != *str
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=156>++)
> return Token
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=207>
> ::IDENTIFIER
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=211>;
>
> }
> return kPerfectKeywordHashTable
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=90>
> [key
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=158>
> ].value
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/keywords-gen.h;drc=8987a17e4855762850bb276ec150198aab415775;l=49>;
>
> }
> }
> return Token
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=207>
> ::IDENTIFIER
> <https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/token.h;drc=8987a17e4855762850bb276ec150198aab415775;l=211>;
>
> }
>
> I wonder:
> a. Is the manual char-by-char comparison possibly suboptimal to what
> `memcmp` would do? (Might be target machine dependent, but support for some
> level of SSE/SSE4.2 SIMDification?)
>
> b. The length equality branch prior to string comparison might be a net
> negative? I guess it heavily depends on the ratio of keywords vs
> non-keywords in the scanned code, but it might be better to just do the
> string equality check first.
>
> For a crude example of the version I have in mind:
> Token GetToken2(const char* str, int len) {
>   if (len - MIN_WORD_LENGTH <= MAX_WORD_LENGTH-MIN_WORD_LENGTH) {
>     unsigned int key = Hash(str, len) & 0x3f;
>
>     unsigned int key_len = kPerfectKeywordLengthTable[key];
>     // crude branchless min, real would need casts
>     unsigned int min_len = key_len + ((len-key_len) & (len-key_len)>>31);
>     const char* s = kPerfectKeywordHashTable[key].name;
>     if (memcmp (str, s, min_len) == 0) {
>         if (len == key_len) {
>             return kPerfectKeywordHashTable[key].value;
>         }
>     }
>   }
>   return Token::IDENTIFIER;
> }
>
>
> I'd like to ask:
> 1. Are there existing benchmarks that can be run?
> 2. I assume a perf improvement (if any) won't be drastic. Is it worth even
> trying out?
> Put another way - if the code stays simple, with a slight perf gain -
> would that CL be accepted?
>
> Cheers,
> Yoni
>
> --
> --
> v8-dev mailing list
> [email protected]
> http://groups.google.com/group/v8-dev
> ---
> You received this message because you are subscribed to the Google Groups
> "v8-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/v8-dev/cee7bc26-50cd-49bf-8e2b-f29b01ac5adfn%40googlegroups.com
> <https://groups.google.com/d/msgid/v8-dev/cee7bc26-50cd-49bf-8e2b-f29b01ac5adfn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>


-- 


Google Germany GmbH

Erika-Mann-Straße 33

80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian.

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Diese E-Mail ist vertraulich. Falls sie diese fälschlicherweise erhalten
haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter,
löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen,
dass die E-Mail an die falsche Person gesendet wurde.



This e-mail is confidential. If you received this communication by mistake,
please don't forward it to anyone else, please erase all copies and
attachments, and please let me know that it has gone to the wrong person.

-- 
-- 
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
--- 
You received this message because you are subscribed to the Google Groups 
"v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/v8-dev/CAED6dUDMePpv42oEFbV3X%2BR7JyyUzbY3wEb6ns7Xq%2BLyTRuJng%40mail.gmail.com.

Reply via email to