Re: [Zim-wiki] caching and progressively slower page loads

2018-02-05 Thread Nathaniel Morck Beaver

Sounds good, I just opened an issue here:

https://github.com/jaap-karssenberg/zim-desktop-wiki/issues/311

Best,

Nathaniel

On 02/02/2018 06:25 AM, Stephen LARROQUE wrote:
Thank you very much Nathaniel for tracking the culprit, someone else 
reported a similar issue on the mailing list a few months ago but could 
not find the root.


I think you should open an issue on the issue tracker, else this will 
get lost in the archives of the mailing list.

Best regards,
Stephen L.


Le Jeu. 1 Fév. 2018 22:56, Nathaniel Beaver > a écrit :


When Zim starts up, switching pages is nice and snappy.

https://www.youtube.com/watch?v=tl4_t6dW1l0

After a while, loading even the smallest page has a noticeable delay.

https://www.youtube.com/watch?v=euyGgEwLaIk

Note the time required to display the new page in addition to the time
before the cursor becomes active.

This is running a recent Github version (commit
e46a4b40a90d6903b6fa860bd72a4464dab7d704) with all add-ons disabled,
including built-in add-ons. However, it was originally reproduced on
older versions, so it is not a new effect.

A few backtraces with GDB during a page switch shows most of the time is
spent in this query from the "get_mytreeiter()" function in
zim/notebook/index/pages.py:

      for i, row in enumerate(self.db.execute('''
          SELECT * FROM pages WHERE parent=?
          ORDER BY sortkey, name LIMIT 20 OFFSET ?
          ''',
          (parent_id, offset)
      )):


https://github.com/jaap-karssenberg/zim-desktop-wiki/blob/e46a4b40a90d6903b6fa860bd72a4464dab7d704/zim/notebook/index/pages.py#L804

This is being called from the main loop.

      Traceback (most recent call first):
        File

"/home/nathaniel/.local/lib/python2.7/site-packages/zim/notebook/index/pages.py",
line 805, in get_mytreeiter
          (parent_id, offset)
        File
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/pageindex.py",
line 224, in on_get_iter
          return self.get_mytreeiter(treepath)
        File
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/pageindex.py",
line 238, in on_iter_next
          return self.on_get_iter(treepath)
        File
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py",
line 703, in _run_main_loop
          gtk.main()
        File
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py",
line 656, in _run_cmd
          self._run_main_loop(cmd)
        File
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py",
line 626, in run
          self._run_cmd(cmd, args) # test seam
        File
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py",
line 845, in main

The query itself is fast, so it seems strange that so much time is spent
here. However, running Zim with "cprofile" for about a day shows there
are roughly 1.3e6 calls to the sqlite3 execute method, so even though
the time per call is less than a millisecond, the cumulative time is
considerable.

      Wed Jan 17 10:33:24 2018   
zim.notebook.index.pages_1516168791.prof


               43403639 function calls (42725089 primitive calls) in
86784.644 seconds

         Ordered by: internal time
         List reduced from 2168 to 4 due to restriction <4>

         ncalls  tottime  percall  cumtime  percall
filename:lineno(function)
              1 85453.595 85453.595 86765.186 86765.186 {gtk._gtk.main}
        1347825 1154.685    0.001 1154.686    0.001 {method 'execute' of
'sqlite3.Connection' objects}
        5138231   50.881    0.000 1207.454    0.000

/home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/pageindex.py:231(on_iter_next)
             11   43.953    3.996   48.992    4.454

/home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/widgets.py:3152(run)

By setting a breakpoint in GDB:

      break _pysqlite_query_execute

and then using "ignore" and "info break", we can count how many queries
are run. In my notebook of about 1,400 pages, there are queries
amounting to roughly twice that on startup. When the pages begin loading
slowly, it runs correspondingly more queries.

Since "get_mytreeiter()" caches the page on each call, is it possible
that the cache is being duplicated or otherwise not working properly,
resulting in an excessive number of queries?

If the cache is working as intended, could it be changed so that it does
not become progressively slower?

Should I file a bug report on Github, or should I wait until there is
something more concrete to submit?


Nathaniel Beaver

P.S. Upon request, I am happy to provide full data and details on

Re: [Zim-wiki] caching and progressively slower page loads

2018-02-02 Thread Stephen LARROQUE
Thank you very much Nathaniel for tracking the culprit, someone else
reported a similar issue on the mailing list a few months ago but could not
find the root.

I think you should open an issue on the issue tracker, else this will get
lost in the archives of the mailing list.
Best regards,
Stephen L.

Le Jeu. 1 Fév. 2018 22:56, Nathaniel Beaver  a
écrit :

> When Zim starts up, switching pages is nice and snappy.
>
> https://www.youtube.com/watch?v=tl4_t6dW1l0
>
> After a while, loading even the smallest page has a noticeable delay.
>
> https://www.youtube.com/watch?v=euyGgEwLaIk
>
> Note the time required to display the new page in addition to the time
> before the cursor becomes active.
>
> This is running a recent Github version (commit
> e46a4b40a90d6903b6fa860bd72a4464dab7d704) with all add-ons disabled,
> including built-in add-ons. However, it was originally reproduced on
> older versions, so it is not a new effect.
>
> A few backtraces with GDB during a page switch shows most of the time is
> spent in this query from the "get_mytreeiter()" function in
> zim/notebook/index/pages.py:
>
>  for i, row in enumerate(self.db.execute('''
>  SELECT * FROM pages WHERE parent=?
>  ORDER BY sortkey, name LIMIT 20 OFFSET ?
>  ''',
>  (parent_id, offset)
>  )):
>
>
> https://github.com/jaap-karssenberg/zim-desktop-wiki/blob/e46a4b40a90d6903b6fa860bd72a4464dab7d704/zim/notebook/index/pages.py#L804
>
> This is being called from the main loop.
>
>  Traceback (most recent call first):
>File
>
> "/home/nathaniel/.local/lib/python2.7/site-packages/zim/notebook/index/pages.py",
> line 805, in get_mytreeiter
>  (parent_id, offset)
>File
> "/home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/pageindex.py",
> line 224, in on_get_iter
>  return self.get_mytreeiter(treepath)
>File
> "/home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/pageindex.py",
> line 238, in on_iter_next
>  return self.on_get_iter(treepath)
>File
> "/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py",
> line 703, in _run_main_loop
>  gtk.main()
>File
> "/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py",
> line 656, in _run_cmd
>  self._run_main_loop(cmd)
>File
> "/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py",
> line 626, in run
>  self._run_cmd(cmd, args) # test seam
>File
> "/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py",
> line 845, in main
>
> The query itself is fast, so it seems strange that so much time is spent
> here. However, running Zim with "cprofile" for about a day shows there
> are roughly 1.3e6 calls to the sqlite3 execute method, so even though
> the time per call is less than a millisecond, the cumulative time is
> considerable.
>
>  Wed Jan 17 10:33:24 2018zim.notebook.index.pages_1516168791.prof
>
>   43403639 function calls (42725089 primitive calls) in
> 86784.644 seconds
>
> Ordered by: internal time
> List reduced from 2168 to 4 due to restriction <4>
>
> ncalls  tottime  percall  cumtime  percall
> filename:lineno(function)
>  1 85453.595 85453.595 86765.186 86765.186 {gtk._gtk.main}
>1347825 1154.6850.001 1154.6860.001 {method 'execute' of
> 'sqlite3.Connection' objects}
>5138231   50.8810.000 1207.4540.000
>
> /home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/pageindex.py:231(on_iter_next)
> 11   43.9533.996   48.9924.454
>
> /home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/widgets.py:3152(run)
>
> By setting a breakpoint in GDB:
>
>  break _pysqlite_query_execute
>
> and then using "ignore" and "info break", we can count how many queries
> are run. In my notebook of about 1,400 pages, there are queries
> amounting to roughly twice that on startup. When the pages begin loading
> slowly, it runs correspondingly more queries.
>
> Since "get_mytreeiter()" caches the page on each call, is it possible
> that the cache is being duplicated or otherwise not working properly,
> resulting in an excessive number of queries?
>
> If the cache is working as intended, could it be changed so that it does
> not become progressively slower?
>
> Should I file a bug report on Github, or should I wait until there is
> something more concrete to submit?
>
>
> Nathaniel Beaver
>
> P.S. Upon request, I am happy to provide full data and details on
> collection methods.
>
> Please let me know if there is something I overlooked or a flaw in the
> methodology.
>
> This is a follow-up to my message from December 29, 2017 with subject
> "profiling zim for performance?"
>
> ___
> Mailing list: https://launchpad.net/~zim-wiki
> Post to : zim-wiki@lists.launchpad.net
> Unsubscribe : 

[Zim-wiki] caching and progressively slower page loads

2018-02-01 Thread Nathaniel Beaver

When Zim starts up, switching pages is nice and snappy.

https://www.youtube.com/watch?v=tl4_t6dW1l0

After a while, loading even the smallest page has a noticeable delay.

https://www.youtube.com/watch?v=euyGgEwLaIk

Note the time required to display the new page in addition to the time 
before the cursor becomes active.


This is running a recent Github version (commit 
e46a4b40a90d6903b6fa860bd72a4464dab7d704) with all add-ons disabled, 
including built-in add-ons. However, it was originally reproduced on 
older versions, so it is not a new effect.


A few backtraces with GDB during a page switch shows most of the time is 
spent in this query from the "get_mytreeiter()" function in 
zim/notebook/index/pages.py:


for i, row in enumerate(self.db.execute('''
SELECT * FROM pages WHERE parent=?
ORDER BY sortkey, name LIMIT 20 OFFSET ?
''',
(parent_id, offset)
)):

https://github.com/jaap-karssenberg/zim-desktop-wiki/blob/e46a4b40a90d6903b6fa860bd72a4464dab7d704/zim/notebook/index/pages.py#L804

This is being called from the main loop.

Traceback (most recent call first):
  File 
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/notebook/index/pages.py", 
line 805, in get_mytreeiter

(parent_id, offset)
  File 
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/pageindex.py", 
line 224, in on_get_iter

return self.get_mytreeiter(treepath)
  File 
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/pageindex.py", 
line 238, in on_iter_next

return self.on_get_iter(treepath)
  File 
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py", 
line 703, in _run_main_loop

gtk.main()
  File 
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py", 
line 656, in _run_cmd

self._run_main_loop(cmd)
  File 
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py", 
line 626, in run

self._run_cmd(cmd, args) # test seam
  File 
"/home/nathaniel/.local/lib/python2.7/site-packages/zim/main/__init__.py", 
line 845, in main


The query itself is fast, so it seems strange that so much time is spent 
here. However, running Zim with "cprofile" for about a day shows there 
are roughly 1.3e6 calls to the sqlite3 execute method, so even though 
the time per call is less than a millisecond, the cumulative time is 
considerable.


Wed Jan 17 10:33:24 2018zim.notebook.index.pages_1516168791.prof

 43403639 function calls (42725089 primitive calls) in 
86784.644 seconds


   Ordered by: internal time
   List reduced from 2168 to 4 due to restriction <4>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1 85453.595 85453.595 86765.186 86765.186 {gtk._gtk.main}
  1347825 1154.6850.001 1154.6860.001 {method 'execute' of 
'sqlite3.Connection' objects}
  5138231   50.8810.000 1207.4540.000 
/home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/pageindex.py:231(on_iter_next)
   11   43.9533.996   48.9924.454 
/home/nathaniel/.local/lib/python2.7/site-packages/zim/gui/widgets.py:3152(run)


By setting a breakpoint in GDB:

break _pysqlite_query_execute

and then using "ignore" and "info break", we can count how many queries 
are run. In my notebook of about 1,400 pages, there are queries 
amounting to roughly twice that on startup. When the pages begin loading 
slowly, it runs correspondingly more queries.


Since "get_mytreeiter()" caches the page on each call, is it possible 
that the cache is being duplicated or otherwise not working properly, 
resulting in an excessive number of queries?


If the cache is working as intended, could it be changed so that it does 
not become progressively slower?


Should I file a bug report on Github, or should I wait until there is 
something more concrete to submit?



Nathaniel Beaver

P.S. Upon request, I am happy to provide full data and details on 
collection methods.


Please let me know if there is something I overlooked or a flaw in the 
methodology.


This is a follow-up to my message from December 29, 2017 with subject 
"profiling zim for performance?"


___
Mailing list: https://launchpad.net/~zim-wiki
Post to : zim-wiki@lists.launchpad.net
Unsubscribe : https://launchpad.net/~zim-wiki
More help   : https://help.launchpad.net/ListHelp