Re: [Mesa-dev] [PATCH 2/7] glsl/glcpp: use ralloc_sprint_rewrite_tail to avoid slow vsprintf
On Jan 1, 2017 11:55 AM, "Vladislav Egorov"wrote: 01.01.2017 06:41, Kenneth Graunke пишет: On Sunday, January 1, 2017 1:34:27 AM PST Marek Olšák wrote: > >> From: Marek Olšák >> >> This reduces compile times by 4.5% with the Gallium noop driver and >> gl_constants::GLSLOptimizeConservatively == true. >> > Compile times of...what exactly? Do you have any statistics for this > by itself? > > Assuming we add your helper, this patch looks reasonable. > Reviewed-by: Kenneth Graunke > > BTW, I suspect you could get some additional speed up by changing > > parser->output = ralloc_strdup(parser, ""); > > to something like: > > parser->output = ralloc_size(parser, strlen(orig_concatenated_src)); > parser->output[0] = '\0'; > > to try and avoid reallocations. rewrite_tail will realloc just enough > space every time it allocates, which means once you reallocate, you're > going to be calling realloc on every single token. Yuck! > > ralloc/talloc's string libraries were never meant for serious string > processing like the preprocessor does. They're meant for convenience > when constructing debug messages which don't need to be that efficient. > > Perhaps a better approach would be to have the preprocessor do this > itself. Just ralloc_size() output and initialize the null byte. > reralloc to double the size if you need more space. At the end of > preprocessing, reralloc to output_length at the end of free any waste > from doubling. > > I suspect that would be a *lot* more efficient, and is probably what > we should have done in the first place... > I have similar patch (maybe need 1-2 days to clean it up), and I've tested both variants. String in exponentially growing (by +50%) string buffer works better, but not *THAT* much better as I expected. It seems that in the sequence of str = realloc(str, 1001); str = realloc(str, 1002); str = realloc(str, 1003), etc. most of reallocs will be non-moving in both glibc's allocator and jemalloc. For example, jemalloc have size classes that already grow exponentially by 15-25% - ..., 4K, 5K, 6K, 7K, 8K, 10K, 12K, 14K, 16K, 20K, 24K, .., 4M, 5M, ... realloc will just test if the requested size belongs to the same size class and do nothing. Reallocs inside of the same size class will be always non-moving and almost free. Overall avoiding formatted printing (DOUBLE formatted printing, which is entirely avoidable too) gives the single largest boost to the pre-processor. Benchmark on my shader-db (glcpp and shader-db's run smashed together to do only preprocessing). Note that I used old jemalloc from Ubuntu 16.04, which can be important, because jemalloc changed its size class strategy since then. perf stat --repeat 10 master8.91s master+jemalloc 8.60s Marek's patch 5.50s Marek's patch+jemalloc5.03s my string_buffer 4.57s my string_buffer+jemalloc 4.43s my series 3.83s my series+jemalloc3.68s Since you are further than me, let's merge your work instead. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/7] glsl/glcpp: use ralloc_sprint_rewrite_tail to avoid slow vsprintf
On Jan 1, 2017 4:41 AM, "Kenneth Graunke"wrote: On Sunday, January 1, 2017 1:34:27 AM PST Marek Olšák wrote: > From: Marek Olšák > > This reduces compile times by 4.5% with the Gallium noop driver and > gl_constants::GLSLOptimizeConservatively == true. Compile times of...what exactly? Do you have any statistics for this by itself? I always use my shader-db with almost 30k shaders and I used the time command. Assuming we add your helper, this patch looks reasonable. Reviewed-by: Kenneth Graunke BTW, I suspect you could get some additional speed up by changing parser->output = ralloc_strdup(parser, ""); to something like: parser->output = ralloc_size(parser, strlen(orig_concatenated_src)); parser->output[0] = '\0'; to try and avoid reallocations. rewrite_tail will realloc just enough space every time it allocates, which means once you reallocate, you're going to be calling realloc on every single token. Yuck! ralloc/talloc's string libraries were never meant for serious string processing like the preprocessor does. They're meant for convenience when constructing debug messages which don't need to be that efficient. Perhaps a better approach would be to have the preprocessor do this itself. Just ralloc_size() output and initialize the null byte. reralloc to double the size if you need more space. At the end of preprocessing, reralloc to output_length at the end of free any waste from doubling. I suspect that would be a *lot* more efficient, and is probably what we should have done in the first place... Sure. However, realloc is a no-op when there is free space after the allocation. It's a useless call, but it doesn't look too bad in a profiler. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/7] glsl/glcpp: use ralloc_sprint_rewrite_tail to avoid slow vsprintf
01.01.2017 06:41, Kenneth Graunke пишет: On Sunday, January 1, 2017 1:34:27 AM PST Marek Olšák wrote: From: Marek OlšákThis reduces compile times by 4.5% with the Gallium noop driver and gl_constants::GLSLOptimizeConservatively == true. Compile times of...what exactly? Do you have any statistics for this by itself? Assuming we add your helper, this patch looks reasonable. Reviewed-by: Kenneth Graunke BTW, I suspect you could get some additional speed up by changing parser->output = ralloc_strdup(parser, ""); to something like: parser->output = ralloc_size(parser, strlen(orig_concatenated_src)); parser->output[0] = '\0'; to try and avoid reallocations. rewrite_tail will realloc just enough space every time it allocates, which means once you reallocate, you're going to be calling realloc on every single token. Yuck! ralloc/talloc's string libraries were never meant for serious string processing like the preprocessor does. They're meant for convenience when constructing debug messages which don't need to be that efficient. Perhaps a better approach would be to have the preprocessor do this itself. Just ralloc_size() output and initialize the null byte. reralloc to double the size if you need more space. At the end of preprocessing, reralloc to output_length at the end of free any waste from doubling. I suspect that would be a *lot* more efficient, and is probably what we should have done in the first place... I have similar patch (maybe need 1-2 days to clean it up), and I've tested both variants. String in exponentially growing (by +50%) string buffer works better, but not *THAT* much better as I expected. It seems that in the sequence of str = realloc(str, 1001); str = realloc(str, 1002); str = realloc(str, 1003), etc. most of reallocs will be non-moving in both glibc's allocator and jemalloc. For example, jemalloc have size classes that already grow exponentially by 15-25% - ..., 4K, 5K, 6K, 7K, 8K, 10K, 12K, 14K, 16K, 20K, 24K, .., 4M, 5M, ... realloc will just test if the requested size belongs to the same size class and do nothing. Reallocs inside of the same size class will be always non-moving and almost free. Overall avoiding formatted printing (DOUBLE formatted printing, which is entirely avoidable too) gives the single largest boost to the pre-processor. Benchmark on my shader-db (glcpp and shader-db's run smashed together to do only preprocessing). Note that I used old jemalloc from Ubuntu 16.04, which can be important, because jemalloc changed its size class strategy since then. perf stat --repeat 10 master8.91s master+jemalloc 8.60s Marek's patch 5.50s Marek's patch+jemalloc5.03s my string_buffer 4.57s my string_buffer+jemalloc 4.43s my series 3.83s my series+jemalloc3.68s ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/7] glsl/glcpp: use ralloc_sprint_rewrite_tail to avoid slow vsprintf
On Sunday, January 1, 2017 1:34:27 AM PST Marek Olšák wrote: > From: Marek Olšák> > This reduces compile times by 4.5% with the Gallium noop driver and > gl_constants::GLSLOptimizeConservatively == true. Compile times of...what exactly? Do you have any statistics for this by itself? Assuming we add your helper, this patch looks reasonable. Reviewed-by: Kenneth Graunke BTW, I suspect you could get some additional speed up by changing parser->output = ralloc_strdup(parser, ""); to something like: parser->output = ralloc_size(parser, strlen(orig_concatenated_src)); parser->output[0] = '\0'; to try and avoid reallocations. rewrite_tail will realloc just enough space every time it allocates, which means once you reallocate, you're going to be calling realloc on every single token. Yuck! ralloc/talloc's string libraries were never meant for serious string processing like the preprocessor does. They're meant for convenience when constructing debug messages which don't need to be that efficient. Perhaps a better approach would be to have the preprocessor do this itself. Just ralloc_size() output and initialize the null byte. reralloc to double the size if you need more space. At the end of preprocessing, reralloc to output_length at the end of free any waste from doubling. I suspect that would be a *lot* more efficient, and is probably what we should have done in the first place... signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/7] glsl/glcpp: use ralloc_sprint_rewrite_tail to avoid slow vsprintf
From: Marek OlšákThis reduces compile times by 4.5% with the Gallium noop driver and gl_constants::GLSLOptimizeConservatively == true. --- src/compiler/glsl/glcpp/glcpp-parse.y | 39 +++ 1 file changed, 21 insertions(+), 18 deletions(-) diff --git a/src/compiler/glsl/glcpp/glcpp-parse.y b/src/compiler/glsl/glcpp/glcpp-parse.y index 63012bc..b84e5ff 100644 --- a/src/compiler/glsl/glcpp/glcpp-parse.y +++ b/src/compiler/glsl/glcpp/glcpp-parse.y @@ -202,21 +202,21 @@ add_builtin_define(glcpp_parser_t *parser, const char *name, int value); input: /* empty */ | input line ; line: control_line | SPACE control_line | text_line { _glcpp_parser_print_expanded_token_list (parser, $1); - ralloc_asprintf_rewrite_tail (>output, >output_length, "\n"); + ralloc_sprint_rewrite_tail(>output, >output_length, "\n", 1); } | expanded_line ; expanded_line: IF_EXPANDED expression NEWLINE { if (parser->is_gles && $2.undefined_macro) glcpp_error(& @1, parser, "undefined macro %s in expression (illegal in GLES)", $2.undefined_macro); _glcpp_parser_skip_stack_push_if (parser, & @1, $2.value); } @@ -252,21 +252,21 @@ define: | FUNC_IDENTIFIER '(' ')' replacement_list NEWLINE { _define_function_macro (parser, & @1, $1, NULL, $4); } | FUNC_IDENTIFIER '(' identifier_list ')' replacement_list NEWLINE { _define_function_macro (parser, & @1, $1, $3, $5); } ; control_line: control_line_success { - ralloc_asprintf_rewrite_tail (>output, >output_length, "\n"); + ralloc_sprint_rewrite_tail(>output, >output_length, "\n", 1); } | control_line_error | HASH_TOKEN LINE pp_tokens NEWLINE { if (parser->skip_stack == NULL || parser->skip_stack->type == SKIP_NO_SKIP) { _glcpp_parser_expand_and_lex_from (parser, LINE_EXPANDED, $3, EXPANSION_MODE_IGNORE_DEFINED); @@ -428,21 +428,22 @@ control_line_success: | HASH_TOKEN VERSION_TOKEN version_constant IDENTIFIER NEWLINE { if (parser->version_set) { glcpp_error(& @1, parser, "#version must appear on the first line"); } _glcpp_parser_handle_version_declaration(parser, $3, $4, true); } | HASH_TOKEN NEWLINE { glcpp_parser_resolve_implicit_version(parser); } | HASH_TOKEN PRAGMA NEWLINE { - ralloc_asprintf_rewrite_tail (>output, >output_length, "#%s", $2); + ralloc_sprint_rewrite_tail(>output, >output_length, "#", 1); + ralloc_sprint_rewrite_tail(>output, >output_length, $2, strlen($2)); } ; control_line_error: HASH_TOKEN ERROR_TOKEN NEWLINE { glcpp_error(& @1, parser, "#%s", $2); } | HASH_TOKEN DEFINE_TOKEN NEWLINE { glcpp_error (& @1, parser, "#define without macro name"); } @@ -1109,71 +1110,73 @@ _token_list_equal_ignoring_space(token_list_t *a, token_list_t *b) node_b = node_b->next; } return 1; } static void _token_print(char **out, size_t *len, token_t *token) { if (token->type < 256) { - ralloc_asprintf_rewrite_tail (out, len, "%c", token->type); + char s[2] = {token->type, 0}; + ralloc_sprint_rewrite_tail(out, len, s, 1); return; } switch (token->type) { case INTEGER: ralloc_asprintf_rewrite_tail (out, len, "%" PRIiMAX, token->value.ival); break; case IDENTIFIER: case INTEGER_STRING: case OTHER: - ralloc_asprintf_rewrite_tail (out, len, "%s", token->value.str); + ralloc_sprint_rewrite_tail(out, len, token->value.str, + strlen(token->value.str)); break; case SPACE: - ralloc_asprintf_rewrite_tail (out, len, " "); + ralloc_sprint_rewrite_tail(out, len, " ", 1); break; case LEFT_SHIFT: - ralloc_asprintf_rewrite_tail (out, len, "<<"); + ralloc_sprint_rewrite_tail(out, len, "<<", 2); break; case RIGHT_SHIFT: - ralloc_asprintf_rewrite_tail (out, len, ">>"); + ralloc_sprint_rewrite_tail(out, len, ">>", 2); break; case LESS_OR_EQUAL: - ralloc_asprintf_rewrite_tail (out, len, "<="); + ralloc_sprint_rewrite_tail(out, len, "<=", 2); break; case GREATER_OR_EQUAL: - ralloc_asprintf_rewrite_tail (out, len, ">="); + ralloc_sprint_rewrite_tail(out, len, ">=", 2); break; case EQUAL: - ralloc_asprintf_rewrite_tail (out, len, "=="); + ralloc_sprint_rewrite_tail(out, len, "==", 2);