Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
Max Nikulin writes:

> It is up to you to choose at which level you prefer to optimize the
> code. And it is only my opinion (I do not insist) that benefits from
> changes in low level code might be much more significant. I like the
> idea of markers, but their current implementation is a source of pain.
>
>> (note that Nicolas did not use
>> markers to store boundaries of org elements).
>
> E.g. export-related code certainly does need markers. You experienced
> enough problems with attempts to properly invalidate cache when lower
> level is not supposed to provide appropriate facilities.

I understand your argument. However, I feel discouraged from contributing to Emacs development because most Org users would not benefit from such a contribution for a long time, not until the next several major versions of Emacs are released. So I currently prefer to contribute backwards-compatible high-level code and leave the Emacs core for the future.

Best,
Ihor
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
On 02/03/2022 22:12, Ihor Radchenko wrote:
> I tend to agree after reading the code again.
>
> I tried to play around with that marker loop. It seems that the loop
> should not be mindlessly disabled, but it can be sufficient to check
> only a small number of markers in front of the marker list. The cached
> temporary markers are always added in front of the list.

I did not try to say that the loop over markers may be just thrown away. By the way, for a sequential scan (with no backward searches) a single marker might work reasonably well.

Some kind of index for fast mapping between bytes and positions should be maintained at the buffer level. I hope that, when properly designed, such a structure may minimize the amount of recalculation on each edit. I mean some hierarchical structure of buffer fragments, where markers keep relative offsets from the beginning of the fragment they belong to. The hierarchy of fragments is enough to provide an initial estimate of the position for a byte index. Only markers within the fragment that changed need an immediate update.

> I am currently using a custom version of org-ql utilising the new
> element cache. It is substantially faster compared to current
> org-refile-get-targets. The org-ql version runs in <2 seconds at worst
> when calculating all refile targets from scratch, while
> org-refile-get-targets is over 10sec. The org-ql version gives no
> noticeable latency when there is an extra text query to narrow down
> the refile targets.
>
> So, it is certainly possible to improve the performance just using the
> high-level org-element cache API + regexp search, without markers.

It is up to you to choose at which level you prefer to optimize the code. And it is only my opinion (I do not insist) that benefits from changes in low level code might be much more significant. I like the idea of markers, but their current implementation is a source of pain.

> (note that Nicolas did not use
> markers to store boundaries of org elements).

E.g. export-related code certainly does need markers. You experienced enough problems with attempts to properly invalidate the cache when the lower level is not supposed to provide appropriate facilities.
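To make the fragment idea concrete, here is a toy sketch in Emacs Lisp (illustration only: real buffers live in C, and every name below is invented). Each fragment caches its byte and character sizes, so converting a byte position only needs a character-by-character scan inside a single fragment:

(require 'cl-lib)

;; Toy model of the fragment index described above; not how Emacs
;; buffers actually work.
(cl-defstruct frag text bytes chars)

(defun make-frag-index (strings)
  "Build a fragment list, caching per-fragment byte and char sizes."
  (mapcar (lambda (s)
            (make-frag :text s :bytes (string-bytes s) :chars (length s)))
          strings))

(defun frag-bytepos-to-charpos (frags bytepos)
  "Convert BYTEPOS to a character position using the fragment index.
Fragments before the target contribute their cached totals; only the
single fragment containing BYTEPOS is scanned character by character."
  (let ((byte 0) (char 0))
    (catch 'done
      (dolist (fr frags)
        (if (<= (+ byte (frag-bytes fr)) bytepos)
            (setq byte (+ byte (frag-bytes fr))
                  char (+ char (frag-chars fr)))
          (let ((i 0))
            (while (< byte bytepos)
              (cl-incf byte (string-bytes (substring (frag-text fr) i (1+ i))))
              (cl-incf i))
            (throw 'done (+ char i))))))))

;; (frag-bytepos-to-charpos (make-frag-index '("abc" "дёж" "xyz")) 7) => 5

Editing one fragment would then invalidate only its own cached sizes and the offsets of the markers inside it, which is the point of the hierarchy.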
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
Max Nikulin writes:

> On 27/02/2022 13:43, Ihor Radchenko wrote:
>>
>> Now, I did an extended profiling of what is happening using perf:
>>
>> 6.20% [.] buf_bytepos_to_charpos
>
> Maybe I am interpreting such results wrongly, but it does not look like
> a bottleneck. Anyway, thank you very much for such efforts; however, it
> is unlikely that I will join the profiling in the near future.

The perf data I provided is a bit tricky. I recorded statistics over the whole Emacs session + used a fairly small number of iterations in your benchmark code. Now, I repeated the testing, attaching perf to Emacs only during the benchmark execution:

With refile cache and markers:

22.82%  emacs-29.0.50.1  emacs-29.0.50.1  [.] buf_bytepos_to_charpos
16.68%  emacs-29.0.50.1  emacs-29.0.50.1  [.] rpl_re_search_2
 8.02%  emacs-29.0.50.1  emacs-29.0.50.1  [.] re_match_2_internal
 6.93%  emacs-29.0.50.1  emacs-29.0.50.1  [.] Fmemq
 4.05%  emacs-29.0.50.1  emacs-29.0.50.1  [.] allocate_vectorlike
 1.88%  emacs-29.0.50.1  emacs-29.0.50.1  [.] mark_object

Without refile cache:

17.25%  emacs-29.0.50.1  emacs-29.0.50.1  [.] rpl_re_search_2
15.84%  emacs-29.0.50.1  emacs-29.0.50.1  [.] buf_bytepos_to_charpos
 8.89%  emacs-29.0.50.1  emacs-29.0.50.1  [.] re_match_2_internal
 8.00%  emacs-29.0.50.1  emacs-29.0.50.1  [.] Fmemq
 4.35%  emacs-29.0.50.1  emacs-29.0.50.1  [.] allocate_vectorlike
 2.01%  emacs-29.0.50.1  emacs-29.0.50.1  [.] mark_object

The percentages should be adjusted for the longer execution time of the first dataset, but otherwise it is clear that buf_bytepos_to_charpos dominates the time delta.

>> I am not sure if I understand the code correctly, but that loop is
>> clearly scaling performance with the number of markers
>
> I may be terribly wrong, but it looks like an optimization attempt that
> may actually ruin performance. My guess is the following. Due to
> multibyte characters, a position in the buffer counted in characters
> may significantly differ from the index in the byte sequence. Since
> markers have both values, bytepos and charpos, they are used (when
> available) to narrow down the initial estimation interval
> [0, buffer size) to the nearest existing markers. The code below even
> creates temporary markers to make the next call of the function faster.

I tend to agree after reading the code again.

I tried to play around with that marker loop. It seems that the loop should not be mindlessly disabled, but it can be sufficient to check only a small number of markers in front of the marker list. The cached temporary markers are always added in front of the list. Limiting the number of checked markers to 10, I got the following result:

With threshold and refile cache:

| 9.5.2                  |                |   |                |
| nm-tst                 | 28.060029337   | 4 | 1.842760862996 |
| org-refile-get-targets | 3.244561543997 | 0 | 0.0            |
| nm-tst                 | 33.64825913704 | 4 | 1.230431054003 |
| org-refile-cache-clear | 0.034879062    | 0 | 0.0            |
| nm-tst                 | 23.974124596   | 5 | 1.429148814996 |

Markers add +~5.6sec.

Original Emacs code and refile cache:

| 9.5.2                  |              |   |                |
| nm-tst                 | 29.494383528 | 4 | 3.036850853002 |
| org-refile-get-targets | 3.635947646  | 1 | 0.454247973002 |
| nm-tst                 | 36.537926593 | 4 | 1.129757634998 |
| org-refile-cache-clear | 0.0096653649 | 0 | 0.0            |
| nm-tst                 | 23.283457105 | 4 | 1.053649649997 |

Markers add +7sec.

The improvement is there, though markers still somehow come into play. I speculate that limiting the number of checked markers might also force adding extra temporary markers to the list, but I haven't looked into that possibility yet. It might be better to discuss this with emacs-devel before trying too hard.

>> Finally, FYI. I plan to work on an alternative mechanism to access Org
>> headings - a generic Org query library. It will not use markers and
>> will implement ideas from org-ql. org-refile will eventually use that
>> generic library instead of the current mechanism.
>
> I suppose that markers might be implemented in an efficient way, and
> much better performance may be achieved when low-level data structures
> are accessible. I am in doubts concerning attempts to create something
> that resembles markers but based purely on high-level API.

I am currently using a custom version of org-ql utilising the new element cache. It is substantially faster compared to current org-refile-get-targets. The org-ql version runs in <2 seconds at worst when calculating all refile targets from scratch, while org-refile-get-targets is over 10sec. The org-ql version gives no noticeable latency when there is an extra text query to narrow down the refile targets.

So, it is certainly possible to improve the performance just using the high-level org-element cache API + regexp search, without markers.
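For the curious, the approach looks roughly like this (a simplified sketch, not the actual org-ql code; my/org-refile-targets-via-cache is an invented name, and the element cache is assumed to be enabled):

(require 'org)

(defun my/org-refile-targets-via-cache ()
  "Collect (HEADING . POSITION) pairs for refiling.
Sketch only: a plain regexp search locates headline starts, and
`org-element-at-point' resolves each hit through the element cache
instead of re-parsing, so no long-lived markers are created."
  (org-with-wide-buffer
   (goto-char (point-min))
   (let (targets)
     (while (re-search-forward org-outline-regexp-bol nil t)
       (let ((el (org-element-at-point)))
         (when (eq (org-element-type el) 'headline)
           (push (cons (org-element-property :raw-value el)
                       (org-element-property :begin el))
                 targets))))
     (nreverse targets))))

Positions come back as plain integers computed on demand, so nothing accumulates in the buffer's marker list.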
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
On 27/02/2022 13:43, Ihor Radchenko wrote:
> Now, I did an extended profiling of what is happening using perf:
>
> 6.20% [.] buf_bytepos_to_charpos

Maybe I am interpreting such results wrongly, but it does not look like a bottleneck. Anyway, thank you very much for such efforts; however, it is unlikely that I will join the profiling in the near future.

> buf_bytepos_to_charpos contains the following loop:
>
> for (tail = BUF_MARKERS (b); tail; tail = tail->next)
>   {
>     CONSIDER (tail->bytepos, tail->charpos);
>
>     /* If we are down to a range of 50 chars,
>        don't bother checking any other markers;
>        scan the intervening chars directly now.  */
>     if (best_above - bytepos < distance
>         || bytepos - best_below < distance)
>       break;
>     else
>       distance += BYTECHAR_DISTANCE_INCREMENT;
>   }
>
> I am not sure if I understand the code correctly, but that loop is
> clearly scaling performance with the number of markers

I may be terribly wrong, but it looks like an optimization attempt that may actually ruin performance. My guess is the following. Due to multibyte characters, a position in the buffer counted in characters may significantly differ from the index in the byte sequence. Since markers have both values, bytepos and charpos, they are used (when available) to narrow down the initial estimation interval [0, buffer size) to the nearest existing markers. The code below even creates temporary markers to make the next call of the function faster.

It seems buffers do not have any additional structures that track the size in bytes and in characters of spans (I would not expect that the representation of a whole buffer in memory is a single contiguous byte array). When there are no markers at all, the function has to iterate over each character and count its length. The problem is that when the buffer has a lot of markers far away from the position passed as the argument, the iteration over markers just consumes CPU with no significant improvement of the original estimation of the boundaries. If markers were organized in a tree, then search would be much faster (at least for buffers with a lot of markers). In some cases such a function may take a hint: a previously known bytepos+charpos pair.

I hope I missed something, but what I can expect from the code of buf_bytepos_to_charpos is that it is necessary to iterate over all markers to update positions after each typed character.

> Finally, FYI. I plan to work on an alternative mechanism to access Org
> headings - a generic Org query library. It will not use markers and
> will implement ideas from org-ql. org-refile will eventually use that
> generic library instead of the current mechanism.

I suppose that markers might be implemented in an efficient way, and much better performance may be achieved when low-level data structures are accessible. I am in doubts concerning attempts to create something that resembles markers but based purely on a high-level API.
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
Max Nikulin writes:

>> Max Nikulin writes:
>>> Actually I suspect that markers may have a similar problem during
>>> regexp searches. I am curious if it is possible to invoke a kind of
>>> "vacuum" (in SQL parlance). Folding all headings and resetting refile
>>> cache does not restore performance to the initial state at session
>>> startup. Maybe it is effect of incremental searches.
>>
>> I doubt that markers have anything to do with regexp search itself
>> (directly). They should only come into play when editing text in
>> buffer, where their performance is also O(N_markers).
>
> I believe you confirmed my conclusion earlier:
>
> Ihor Radchenko. Re: [BUG] org-goto slows down org-set-property.
> Sun, 11 Jul 2021 19:49:08 +0800.
> https://list.orgmode.org/orgmode/87lf6dul3f.fsf@localhost/

I confirmed that invoking org-refile-get-targets slows down your nm-tst looping over the headlines. However, the issue is not with outline-next-heading there. Profiling shows that the slowdown mostly happens in org-get-property-block. I have looked into the regexp search C source and did not find anything that could depend on the number of markers in the buffer. After further analysis now (after your email), I found that I may be wrong and regexp search might actually be affected.

Now, I did an extended profiling of what is happening using perf:

;; perf cpu with refile cache (using your previous code on my largest Org buffer)
19.68%  [.] mark_object
 6.20%  [.] buf_bytepos_to_charpos
 5.66%  [.] re_match_2_internal
 5.33%  [.] exec_byte_code
 5.07%  [.] rpl_re_search_2
 3.09%  [.] Fmemq
 2.56%  [.] allocate_vectorlike
 1.86%  [.] sweep_vectors
 1.47%  [.] mark_objects
 1.45%  [.] pdumper_marked_p_impl

;; perf cpu without refile cache (removing getting refile targets from the code)
18.79%  [.] mark_object
 8.23%  [.] re_match_2_internal
 5.88%  [.] rpl_re_search_2
 4.06%  [.] buf_bytepos_to_charpos
 3.06%  [.] Fmemq
 2.45%  [.] allocate_vectorlike
 1.63%  [.] exec_byte_code
 1.50%  [.] pdumper_marked_p_impl

The bottleneck appears to be buf_bytepos_to_charpos, called by the BYTE_TO_CHAR macro, which, in turn, is used by set_search_regs. buf_bytepos_to_charpos contains the following loop:

for (tail = BUF_MARKERS (b); tail; tail = tail->next)
  {
    CONSIDER (tail->bytepos, tail->charpos);

    /* If we are down to a range of 50 chars,
       don't bother checking any other markers;
       scan the intervening chars directly now.  */
    if (best_above - bytepos < distance
        || bytepos - best_below < distance)
      break;
    else
      distance += BYTECHAR_DISTANCE_INCREMENT;
  }

I am not sure if I understand the code correctly, but that loop is clearly scaling performance with the number of markers.

Finally, FYI. I plan to work on an alternative mechanism to access Org headings - a generic Org query library. It will not use markers and will implement ideas from org-ql. org-refile will eventually use that generic library instead of the current mechanism.

Best,
Ihor
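The scaling is easy to observe even from Lisp. A rough sketch (illustrative only; the 500-character marker spacing is arbitrary and absolute numbers will vary):

(require 'benchmark)
(require 'cl-lib)

;; Flood a buffer with markers, then time a regexp scan.  Every search
;; match goes through set_search_regs -> BYTE_TO_CHAR ->
;; buf_bytepos_to_charpos, which walks the marker list shown above.
(with-temp-buffer
  (dotimes (_ 100000)
    (insert "* heading with some non-ASCII text: тест\n"))
  (let ((markers (cl-loop for pos from 1 to (point-max) by 500
                          collect (copy-marker pos))))
    (goto-char (point-min))
    (prog1 (benchmark-run 1
             (while (re-search-forward "^\\* heading" nil t)))
      ;; Detach the markers so they stop burdening the buffer.
      (mapc (lambda (m) (set-marker m nil)) markers))))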
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
Open up an XMPP group for Org mode; Jabber chat is lightweight and accessible through Emacs jabber.el and a plethora of other applications. Don't forget to include Org links to the XMPP groups.

On February 22, 2022 5:33:13 AM UTC, Ihor Radchenko wrote:
>Samuel Wales writes:
>
>> i have been dealing with latency also, often in undo-tree. this might
>> be a dumb suggestion, but is it related to org file size? my files
>> have not really grown /that/ much but maybe you could bisect one. as
>> opposed to config.
>
>I am wondering if many people in the list experience latency issues.
>Maybe we can organise an online meeting (jitsi or BBB) and collect the
>common causes/ do online interactive debugging?
>
>Best,
>Ihor

Jean
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
On 26/02/2022 14:45, Ihor Radchenko wrote:
> I think we have a misunderstanding here. That page does not contain
> much of technical details. Rather a history.

Thank you for the clarification. Certainly, I originally had hoped to get some explanation of why it was not implemented in a more efficient way, so at first I read only the starting part of the text. It was still interesting to read the story that, due to the delay of an Emacs release, people had to fork it into Lucid. I did not know that XEmacs was a successor of Lucid.

> Max Nikulin writes:
>> Actually I suspect that markers may have a similar problem during
>> regexp searches. I am curious if it is possible to invoke a kind of
>> "vacuum" (in SQL parlance). Folding all headings and resetting refile
>> cache does not restore performance to the initial state at session
>> startup. Maybe it is effect of incremental searches.
>
> I doubt that markers have anything to do with regexp search itself
> (directly). They should only come into play when editing text in
> buffer, where their performance is also O(N_markers).

I believe you confirmed my conclusion earlier:

Ihor Radchenko. Re: [BUG] org-goto slows down org-set-property.
Sun, 11 Jul 2021 19:49:08 +0800.
https://list.orgmode.org/orgmode/87lf6dul3f.fsf@localhost/
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
Max Nikulin writes:

> Thank you, Ihor. I am still not motivated enough to read the whole
> page, but searching for "interval" (earlier I tried "overlay") resulted
> in the following message:
>
> Message-ID: <9206230917.aa16...@mole.gnu.ai.mit.edu>
> Date: Tue, 23 Jun 92 05:17:33 -0400
> From: r...@gnu.ai.mit.edu (Richard Stallman)
>
> describing the tree balancing problem in GNU Emacs and linear search in
> Lucid.
>
> Unfortunately there are no "id" or "name" anchors in the file suitable
> for specifying a precise location. Even the link href is broken.

I think we have a misunderstanding here. That page does not contain much in the way of technical details; it is rather a history. AFAIU, Emacs initially wanted to implement a balanced tree structure to store overlays, but the effort stalled for a long time. Then a company rolled out a simple list storage, causing a lot of controversy involving the FSF and a major Emacs fork. In the end, the initial balanced-tree effort on the GNU Emacs side did not go anywhere, and GNU Emacs eventually copied the simple list approach that is backfiring now, when Org buffers actually do contain a large number of overlays.

> Actually I suspect that markers may have a similar problem during
> regexp searches. I am curious if it is possible to invoke a kind of
> "vacuum" (in SQL parlance). Folding all headings and resetting refile
> cache does not restore performance to the initial state at session
> startup. Maybe it is effect of incremental searches.

I doubt that markers have anything to do with regexp search itself (directly). They should only come into play when editing text in the buffer, where their performance is also O(N_markers).

Best,
Ihor
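As for a "vacuum": AFAIK there is no wholesale equivalent, but a marker moved to point nowhere is removed from the buffer's marker list and stops contributing to the O(N_markers) cost (my-marker below is just a placeholder):

;; Detach a marker you no longer need; a marker that points nowhere
;; leaves the buffer's marker list and no longer slows down editing.
(set-marker my-marker nil)

;; Markers that merely become unreferenced keep burdening the buffer
;; until the next garbage collection reclaims them:
(garbage-collect)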
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
On 23/02/2022 23:35, Ihor Radchenko wrote:
> Max Nikulin writes:
>>> +;; the same purpose. Overlays are implemented with O(n) complexity in
>>> +;; Emacs (as for 2021-03-11). It means that any attempt to move
>>> +;; through hidden text in a file with many invisible overlays will
>>> +;; require time scaling with the number of folded regions (the problem
>>> +;; Overlays note of the manual warns about). For curious, historical
>>> +;; reasons why overlays are not efficient can be found in
>>> +;; https://www.jwz.org/doc/lemacs.html.
>>
>> The linked document consists of a lot of messages. Could you, please,
>> provide a more specific location within the rather long page?
>
> There is no specific location. That thread is an old drama that
> unfolded when intervals were first implemented by a third-party company
> (they were called intervals at that time). AFAIU, the fact that
> intervals are stored in a list and suffer from O(N) complexity
> originates from that time. Just history, as I pointed out in the
> comment.

Thank you, Ihor. I am still not motivated enough to read the whole page, but searching for "interval" (earlier I tried "overlay") resulted in the following message:

Message-ID: <9206230917.aa16...@mole.gnu.ai.mit.edu>
Date: Tue, 23 Jun 92 05:17:33 -0400
From: r...@gnu.ai.mit.edu (Richard Stallman)

describing the tree balancing problem in GNU Emacs and linear search in Lucid.

Unfortunately there are no "id" or "name" anchors in the file suitable for specifying a precise location. Even the link href is broken.

Actually I suspect that markers may have a similar problem during regexp searches. I am curious if it is possible to invoke a kind of "vacuum" (in SQL parlance). Folding all headings and resetting the refile cache does not restore performance to the initial state at session startup. Maybe it is an effect of incremental searches.

Sorry, I have not tried the patches for text properties instead of overlays.
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
Max Nikulin writes:

>> +;; the same purpose. Overlays are implemented with O(n) complexity in
>> +;; Emacs (as for 2021-03-11). It means that any attempt to move
>> +;; through hidden text in a file with many invisible overlays will
>> +;; require time scaling with the number of folded regions (the problem
>> +;; Overlays note of the manual warns about). For curious, historical
>> +;; reasons why overlays are not efficient can be found in
>> +;; https://www.jwz.org/doc/lemacs.html.
>
> The linked document consists of a lot of messages. Could you, please,
> provide a more specific location within the rather long page?

There is no specific location. That thread is an old drama that unfolded when intervals were first implemented by a third-party company (they were called intervals at that time). AFAIU, the fact that intervals are stored in a list and suffer from O(N) complexity originates from that time. Just history, as I pointed out in the comment.

FYI, a more optimal overlay data structure has been attempted in the feature/noverlay branch (for example, see https://git.savannah.gnu.org/cgit/emacs.git/commit/?h=feature/noverlay&id=8d7bdfa3fca076b34aaf86548d3243bee11872ad). But there has been no activity on that branch for years.

Best,
Ihor
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
On 22/02/2022 12:33, Ihor Radchenko wrote:
> I am wondering if many people in the list experience latency issues.

Ihor, it is probably not the feedback that you would like to get concerning the following patch:

Ihor Radchenko. [PATCH 01/35] Add org-fold-core: new folding engine.
Sat, 29 Jan 2022 19:37:53 +0800.
https://list.orgmode.org/74cd7fc06a4540b1d63d1e7f9f2542f83e1eaaae.1643454545.git.yanta...@gmail.com

but my question may be more appropriate in this thread. I noticed the following:

+;; the same purpose. Overlays are implemented with O(n) complexity in
+;; Emacs (as for 2021-03-11). It means that any attempt to move
+;; through hidden text in a file with many invisible overlays will
+;; require time scaling with the number of folded regions (the problem
+;; Overlays note of the manual warns about). For curious, historical
+;; reasons why overlays are not efficient can be found in
+;; https://www.jwz.org/doc/lemacs.html.

The linked document consists of a lot of messages. Could you, please, provide a more specific location within the rather long page?
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
Matt Price writes:

>> Note that org-context is an obsolete function. Do you directly call it
>> in your config? Or do you use a third-party package calling
>> org-context?
>
> Hmm. I don't see it anywhere in my ~.emacs.d/elpa~ directory or in my
> config file. I also went through ORG-NEWS and while it mentions that
> org-context-p has been removed, I can't find a deprecation notice about
> org-context. I'm not quite sure what's going on. Will investigate
> further!

That notice itself is WIP :facepalm:

Basically, org-context is not reliable because it relies on fontification. See https://orgmode.org/list/877depxyo9.fsf@localhost

Best,
Ihor
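For code that needs this kind of information, the parser-based org-element API does not depend on fontification; a minimal illustration:

;; Ask the Org parser what is at point instead of relying on
;; fontification-dependent context.
(org-element-type (org-element-context))
;; => e.g. `link', `timestamp', `src-block', ...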
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
On Wed, Feb 23, 2022 at 12:22 AM Ihor Radchenko wrote:
> Matt Price writes:
>
>>> 20128 80% - redisplay_internal (C function)
>>>  7142 28% - assq
>>>   908  3% - org-context
>
> Note that org-context is an obsolete function. Do you directly call it
> in your config? Or do you use a third-party package calling
> org-context?

Hmm. I don't see it anywhere in my ~.emacs.d/elpa~ directory or in my config file. I also went through ORG-NEWS and while it mentions that org-context-p has been removed, I can't find a deprecation notice about org-context. I'm not quite sure what's going on. Will investigate further!

> Best,
> Ihor
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
Matt Price writes:

> Yes, it definitely seems to be related to file size, which makes me
> think that some kind of buffer parsing is the cause of the problem.

Parsing would show up in the profiler report in such a scenario. It is not the case though.

The problem might be invisible text (it would cause redisplay to become slow), but 15k lines is relatively small - it should not cause redisplay issues according to my experience. Just to be sure, I would try to check performance in a completely unfolded buffer.

Best,
Ihor
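Something like the following unfolds everything for such a test (org-show-all is the Org 9.5 name; the org-fold branch provides the equivalent org-fold-show-all):

;; Reveal all headings, drawers and blocks before re-testing latency.
(org-show-all '(headings drawers blocks))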
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
Matt Price writes:

>> 20128 80% - redisplay_internal (C function)
>>  7142 28% - assq
>>   908  3% - org-context

Note that org-context is an obsolete function. Do you directly call it in your config? Or do you use a third-party package calling org-context?

Best,
Ihor
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
sorry everyone, I accidentally sent this to Kaushal this morning, and then took quite a while to get back to a computer after he let me know my mistake!

On Tue, Feb 22, 2022 at 10:12 AM Matt Price wrote:
>
> On Tue, Feb 22, 2022 at 12:45 AM Kaushal Modi wrote:
>
>> On Tue, Feb 22, 2022, 12:34 AM Ihor Radchenko wrote:
>>
>>> I am wondering if many people in the list experience latency issues.
>>> Maybe we can organise an online meeting (jitsi or BBB) and collect the
>>> common causes/ do online interactive debugging?
>>
>> +1
>>
>> I have seen a few people hit this issue on the ox-hugo issue tracker:
>> https://github.com/kaushalmodi/ox-hugo/discussions/551#discussioncomment-2104352
>
> I think it's a great idea, Ihor!
>
> Meanwhile, I have a profile report. I had a little trouble getting the
> slowness to return (of course), but subjectively it seemed to get worse
> (slower, and the laptop fan started up b/c of high cpu usage) when I
> created and entered a src block. Apologies for the long paste:
>
> 45707 70% - redisplay_internal (C function)
>  8468 13% - substitute-command-keys
>  6111  9% - #
>   943  1% - kill-buffer
>   708  1% - replace-buffer-in-windows
>   614  0% - unrecord-window-buffer
>   515  0% - assq-delete-all
>   142  0%   assoc-delete-all
>     3  0%   delete-char
>  8060 12% - assq
>  2598  4% - org-context
>    15  0%   org-inside-LaTeX-fragment-p
>    12  0% - org-in-src-block-p
>    12  0% - org-element-at-point
>     9  0% - org-element--cache-verify-element
>     9  0%   org-element--parse-to
>     3  0%   org-element--parse-to
>     8  0% - org-at-timestamp-p
>     8  0%   org-in-regexp
>   642  0% + tab-bar-make-keymap
>   309  0% + and
>   270  0% + org-in-subtree-not-table-p
>   196  0% + not
>   163  0% + jit-lock-function
>   115  0% + org-entry-get
>    96  0%   keymap-canonicalize
>    56  0%   org-at-table-p
>    52  0% + #
>    48  0% + #
>    43  0%   table--row-column-insertion-point-p
>    29  0%   org-inside-LaTeX-fragment-p
>    27  0% + menu-bar-positive-p
>    26  0% + eval
>    24  0%   file-readable-p
>    21  0% + funcall
>    16  0% + imenu-update-menubar
>    14  0% + vc-menu-map-filter
>    13  0% + table--probe-cell
>    12  0% + or
>    11  0% + let
>    11  0% + org-at-timestamp-p
>    10  0% + flycheck-overlays-at
>     7  0%   undo-tree-update-menu-bar
>     6  0% + require
>     6  0% + emojify-update-visible-emojis-background-after-window-scroll
>     6  0%   kill-this-buffer-enabled-p
>     4  0%   mode-line-default-help-echo
>     3  0% + null
>  9192 14% - ...
>  9172 14%   Automatic GC
>    20  0% - kill-visual-line
>    20  0% - kill-region
>    20  0% - filter-buffer-substring
>    20  0% - org-fold-core--buffer-substring-filter
>    20  0% - buffer-substring--filter
>    20  0% - #
>    20  0% - apply
>    20  0% - # F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_18>
>    20  0% - #
>    20  0% - apply
>    20  0% - #
>    20  0% - #
>    20  0% - #
>    20  0% - apply
>    20  0% - #
>    20  0% + delete-and-extract-region
>  7847 12% - command-execute
>  5749  8% - funcall-interactively
>  2963  4% + org-self-insert-command
>  2186  3% + org-cycle
>   148  0% + corfu-insert
>   146  0% + execute-extended-command
>   121  0% + org-return
>    32  0% + #
>    26  0% + #
>    24  0% + mwim-beginning
>    19  0% + org-delete-backward-char
>    19  0% + org-kill-line
>     9  0% + #
>     6  0% + file-notify-handle-event
>  2095  3% + byte-code
>  1359  2% + timer-event-handler
>   375  0% + org-appear--post-cmd
>   160  0% + corfu--post-command
>    61  0% + org-fragtog--post-cmd
>    14  0% + emojify-update-visible-emojis-background-after-command
>    11  0%   guide-key/close-guide-buffer
>     7  0% + flycheck-perform-deferred-syntax-check
>     7  0% + flycheck-maybe-display-error-at-point-soon
>     6  0%   undo-auto--add-boundary
>     6  0% + corfu--auto-post-command
>     4  0%   flycheck-error-list-update-source
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
Yes, it definitely seems to be related to file size, which makes me think that some kind of buffer parsing is the cause of the problem. I'll reply in more detail to Ihor, down below!

On Mon, Feb 21, 2022 at 5:22 PM Samuel Wales wrote:
> i have been dealing with latency also, often in undo-tree. this might
> be a dumb suggestion, but is it related to org file size? my files
> have not really grown /that/ much but maybe you could bisect one. as
> opposed to config.
>
> i am not saying that your org files are too big. just that maybe it
> could lead to insights.
>
> On 2/21/22, Matt Price wrote:
> [...]
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
Ihor Radchenko writes:

> Samuel Wales writes:
>
>> i have been dealing with latency also. this might
>> be a dumb suggestion, but is it related to org file size? my files
>> have not really grown /that/ much but maybe you could bisect one. as
>> opposed to config.
>
> I am wondering if many people in the list experience latency issues.

FYI: I experience high latency when typing near in-text citations, such as [cite:@ganz+2013]. It got so bad that I converted all my files to hard-wrapped lines. After I did that, Org mode became usable again, but it still lags visibly when typing near a citation.

Rudy

--
"'Contrariwise,' continued Tweedledee, 'if it was so, it might be; and if it were so, it would be; but as it isn't, it ain't. That's logic.'"
-- Lewis Carroll, Through the Looking Glass, 1871/1872

Rudolf Adamkovič [he/him]
Studenohorská 25
84103 Bratislava
Slovakia
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
On Tue, Feb 22, 2022, 12:34 AM Ihor Radchenko wrote:
> I am wondering if many people in the list experience latency issues.
> Maybe we can organise an online meeting (jitsi or BBB) and collect the
> common causes/ do online interactive debugging?

+1

I have seen a few people hit this issue on the ox-hugo issue tracker:
https://github.com/kaushalmodi/ox-hugo/discussions/551#discussioncomment-2104352
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
Samuel Wales writes:

> i have been dealing with latency also, often in undo-tree. this might
> be a dumb suggestion, but is it related to org file size? my files
> have not really grown /that/ much but maybe you could bisect one. as
> opposed to config.

I am wondering if many people in the list experience latency issues. Maybe we can organise an online meeting (jitsi or BBB) and collect the common causes / do online interactive debugging?

Best,
Ihor
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
Matt Price writes:

> However, if I instead use my fairly extensive main config, latency is
> high enough that there's a noticeable delay while typing ordinary
> words. I see this regardless of whether I build from main or from
> Ihor's org-fold feature branch on github. The profiler overview here is
> pretty different -- redisplay_internal takes a much higher percentage
> of the CPU requirement:
>
> 3170 56% - redisplay_internal (C function)
>
> 1200 21% - completing-read-default
> 1200 21%  - apply
> 1200 21%   - vertico--advice
>  475  8%   + #

Judging from the profiler report, you did not collect enough CPU samples. I recommend keeping the profiler running for at least 10-30 seconds when trying to profile typing latency. Also, note that running M-x profiler-report a second time will _not_ reproduce the previous report, but instead show the CPU profiler report for the period between the last invocation of profiler-report and the second one.

I recommend doing the following:

1. M-x profiler-stop
2. M-x profiler-start
3. Do typing in the problematic Org file for 10-30 seconds
4. M-x profiler-report (once!)
5. Share the report here

> I've almost never used the profiler and am not quite sure how I should
> proceed to debug this. I realize I can comment out parts of the config
> one at a time, but that is not so easy for me to do in my current
> setup, and I suppose there are likely to be multiple contributing
> causes, which I may not really notice except in the aggregate.

The above steps should be the first thing to try, and they will likely reveal the bottleneck. If not, you can go back to bisecting your config. I do not recommend manually commenting/uncommenting parts of your large config. Instead, you can try https://github.com/Malabarba/elisp-bug-hunter. But only if CPU profiling does not reveal anything useful.

Best,
Ihor
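The same steps can also be run from Lisp, if that is more convenient (equivalent to the M-x commands above):

(ignore-errors (profiler-stop))  ; step 1: discard any running session
(profiler-start 'cpu)            ; step 2: begin collecting CPU samples
;; step 3: type in the problematic Org buffer for 10-30 seconds
(profiler-report)                ; step 4: render the report (only once!)
(profiler-stop)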
Re: profiling latency in large org-mode buffers (under both main & org-fold feature)
i have been dealing with latency also, often in undo-tree. this might be a dumb suggestion, but is it related to org file size? my files have not really grown /that/ much but maybe you could bisect one. as opposed to config.

i am not saying that your org files are too big. just that maybe it could lead to insights.

On 2/21/22, Matt Price wrote:
> I'm trying to figure out what causes high latency while typing in large
> org-mode files. The issue is very clearly a result of my large config
> file, but I'm not sure how to track it down with any precision.
>
> My main literate config file is ~/.emacs.d/emacs-init.org, currently
> 15000 lines, 260 src blocks.
> If I create a ~minimal.el~ config like this:
>
> (let* ((all-paths
>         '("/home/matt/src/org-mode/emacs/site-lisp/org")))
>   (dolist (p all-paths)
>     (add-to-list 'load-path p)))
>
> (require 'org)
> (find-file "~/.emacs.d/emacs-init.org")
>
> then I do not notice any latency while typing. If I run the profiler
> while using the minimal config, the profile looks about like this at a
> high level:
>
> 1397 71% - command-execute
>  740 37% - funcall-interactively
>  718 36% - org-self-insert-command
>  686 34% + org-element--cache-after-change
>   10  0% + org-fold-core--fix-folded-region
>    3  0% + blink-paren-post-self-insert-function
>    2  0% + jit-lock-after-change
>    1  0%   org-fold-check-before-invisible-edit--text-properties
>    9  0% + previous-line
>    6  0% + minibuffer-complete
>    3  0% + org-return
>    3  0% + execute-extended-command
>  657 33% - byte-code
>  657 33% - read-extended-command
>   64  3% - completing-read-default
>   14  0% + redisplay_internal (C function)
>    1  0% + timer-event-handler
>  371 18% - redisplay_internal (C function)
>  251 12% + jit-lock-function
>   90  4% + assq
>    7  0% + substitute-command-keys
>    3  0% + eval
>  125  6% + timer-event-handler
>   69  3% + ...
>
> --
> However, if I instead use my fairly extensive main config, latency is
> high enough that there's a noticeable delay while typing ordinary
> words. I see this regardless of whether I build from main or from
> Ihor's org-fold feature branch on github. The profiler overview here is
> pretty different -- redisplay_internal takes a much higher percentage
> of the CPU requirement:
>
> 3170 56% - redisplay_internal (C function)
>  693 12% - substitute-command-keys
>  417  7% + #
>   59  1% + assq
>   49  0% + org-in-subtree-not-table-p
>   36  0% + tab-bar-make-keymap
>   35  0%   and
>   24  0% + not
>   16  0%   org-at-table-p
>   13  0% + jit-lock-function
>    8  0%   keymap-canonicalize
>    7  0% + #
>    4  0% + funcall
>    4  0%   display-graphic-p
>    3  0% + #
>    3  0%   file-readable-p
>    3  0% + table--probe-cell
>    3  0%   table--row-column-insertion-point-p
> 1486 26% - command-execute
> 1200 21% - byte-code
> 1200 21% - read-extended-command
> 1200 21% - completing-read-default
> 1200 21% - apply
> 1200 21% - vertico--advice
>  475  8% + #
>
> --
> I've almost never used the profiler and am not quite sure how I should
> proceed to debug this. I realize I can comment out parts of the config
> one at a time, but that is not so easy for me to do in my current
> setup, and I suppose there are likely to be multiple contributing
> causes, which I may not really notice except in the aggregate.
>
> If anyone has suggestions, I would love to hear them!
>
> Thanks,
>
> Matt

--
The Kafka Pandemic

A blog about science, health, human rights, and misopathy:
https://thekafkapandemic.blogspot.com