Re: [Zim-wiki] Major slowdown in indexing with upgrade to 67rc2
Thank you for the suggestion, and sorry for the delay. I did try removing each of those, but it looks like they may be part of the termination condition for building an empty index, and so with either or both edits, the notebooks just open with no index having been built.

On Tue, 27 Mar 2018 08:37:06 + Jaap Karssenberg wrote:
> You could try removing the "ORDER BY" and/or "path LIKE" pieces of the
> query, see what that does for your performance.
>
> If this turns out to help, I can split the loop into several while
> yielding in between to make the rest of the application "tick"
>
> -- Jaap
>
>
> On Sun, Mar 25, 2018 at 11:14 PM wrote:
>
> > At a brief glance, I'm guessing this is the issue:
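Jaap's idea of splitting the loop and yielding in between can be sketched as a Python generator. This is a minimal sketch, not zim's code: the `files` table layout, the `STATUS_*` values, the `update_iter_batched` name and its `batch_size` parameter are all assumptions for illustration. Each step fetches at most `batch_size` pending rows using `LIMIT`, marks them done, and then yields, so the caller (for example an idle callback in the GUI main loop) can handle other events before resuming.

```python
import sqlite3

# Hypothetical status values; zim defines its own constants.
STATUS_UPTODATE = 0
STATUS_NEED_UPDATE = 1


def update_iter_batched(db, batch_size=100, prefix=''):
    """Hypothetical helper: index pending rows in batches, yielding after each."""
    while True:
        rows = db.execute(
            'SELECT id, path, node_type FROM files'
            ' WHERE index_status = ? AND path LIKE ?'
            ' ORDER BY node_type, id LIMIT ?',
            (STATUS_NEED_UPDATE, prefix + '%', batch_size),
        ).fetchall()
        if not rows:
            return
        for node_id, path, node_type in rows:
            # The real per-file indexing work would happen here.
            db.execute(
                'UPDATE files SET index_status = ? WHERE id = ?',
                (STATUS_UPTODATE, node_id),
            )
        yield len(rows)  # the caller regains control between batches


if __name__ == '__main__':
    db = sqlite3.connect(':memory:')
    # Assumed schema, shaped after the columns used in the quoted query.
    db.execute('CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT,'
               ' node_type INTEGER, index_status INTEGER)')
    db.executemany(
        'INSERT INTO files (path, node_type, index_status) VALUES (?, ?, ?)',
        [('page%d.txt' % i, 2, STATUS_NEED_UPDATE) for i in range(250)],
    )
    print(list(update_iter_batched(db, batch_size=100)))  # [100, 100, 50]
```

This keeps the application responsive while indexing runs, but by itself it does not reduce the total indexing time.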
Re: [Zim-wiki] Major slowdown in indexing with upgrade to 67rc2
At a brief glance, I'm guessing this is the issue:

https://github.com/jaap-karssenberg/zim-desktop-wiki/blob/d96b3509890f4c9b9af9119f64b64947337d8da7/zim/notebook/index/files.py
line 89

    def _update_iter_inner(self, prefix=''):
        # sort folders before files: first index structure, then contents
        # this makes e.g. index links more efficient and robust
        # sort by id to ensure parents are found before children
        while True:
            row = self.db.execute(
                'SELECT id, path, node_type FROM files'
                ' WHERE index_status = ? AND path LIKE ?'
                ' ORDER BY node_type, id',
                (STATUS_NEED_UPDATE, prefix + '%')
            ).fetchone()

            if row:
                node_id, path, node_type = row
                #print ">> UPDATE", node_id, path, node_type
            else:
                break

It seems like the whole database is being re-loaded and re-ordered for the import of every single file. As the number of files in a notebook increases, this per-file database operation seems to scale not linearly but at some much higher order. Something like globbing the entire notebook subdirectory structure once, then importing into the database in a loop over that glob, would be vastly more efficient for large numbers of files.

On Sun, 25 Mar 2018 14:30:33 -0500 wrote:
> Would you mind pointing me to the source file(s) that manage this
> indexing? I'd like to see if there is any way to speed the process
> up for large numbers of files.
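The suggestion above, replacing the per-file query with fewer, larger passes, can be sketched as follows. This is a hypothetical rework under stated assumptions, not zim's implementation: the table layout, the `STATUS_*` values, the `update_in_rounds` name and the `process` callback are all illustrative. If each `SELECT ... fetchone()` has to scan the whole table (for example because no index covers `index_status`), N files cost roughly N² row visits in total, which would fit the slowdown described. The sketch fetches every currently pending row with one query, processes that snapshot, and repeats until nothing is pending, so rows inserted while a folder is processed (its children) are still picked up in the next round. One caveat: the original loop always takes the globally smallest `(node_type, id)`, so with rounds the "folders before files" ordering only holds within each round.

```python
import sqlite3

# Hypothetical status values; zim defines its own constants.
STATUS_UPTODATE = 0
STATUS_NEED_UPDATE = 1


def update_in_rounds(db, process, prefix=''):
    """Hypothetical helper: one SELECT per round instead of one per file.

    Returns the number of SELECT queries issued.
    """
    queries = 0
    while True:
        rows = db.execute(
            'SELECT id, path, node_type FROM files'
            ' WHERE index_status = ? AND path LIKE ?'
            ' ORDER BY node_type, id',
            (STATUS_NEED_UPDATE, prefix + '%'),
        ).fetchall()  # snapshot of everything pending right now
        queries += 1
        if not rows:
            return queries
        for node_id, path, node_type in rows:
            process(db, node_id, path, node_type)  # may insert new pending rows
            db.execute(
                'UPDATE files SET index_status = ? WHERE id = ?',
                (STATUS_UPTODATE, node_id),
            )


def demo():
    db = sqlite3.connect(':memory:')
    db.execute('CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT,'
               ' node_type INTEGER, index_status INTEGER)')
    # Assumed encoding: node_type 1 = folder, 2 = file, so folders sort first.
    db.executemany(
        'INSERT INTO files (path, node_type, index_status) VALUES (?, ?, ?)',
        [('a/', 1, STATUS_NEED_UPDATE),
         ('a/x.txt', 2, STATUS_NEED_UPDATE),
         ('b.txt', 2, STATUS_NEED_UPDATE)],
    )
    seen = []

    def record(db, node_id, path, node_type):
        seen.append(path)
        if path == 'a/':  # pretend that scanning folder a/ found a new file
            db.execute(
                'INSERT INTO files (path, node_type, index_status) VALUES (?, ?, ?)',
                ('a/y.txt', 2, STATUS_NEED_UPDATE),
            )

    queries = update_in_rounds(db, record)
    return seen, queries


if __name__ == '__main__':
    print(demo())  # (['a/', 'a/x.txt', 'b.txt', 'a/y.txt'], 3)
```

In this demo a one-row-per-query loop would issue five SELECTs (one per row plus the final empty one); the round-based version issues three. The gap grows with the number of files, since the number of rounds depends on how many times new rows are discovered rather than on the total file count.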
Re: [Zim-wiki] Major slowdown in indexing with upgrade to 67rc2
Would you mind pointing me to the source file(s) that manage this indexing? I'd like to see if there is any way to speed the process up for large numbers of files.

On Mon, 3 Jul 2017 17:42:07 + wrote:
> Yes, for medium-sized notebooks, and those with a "normal" amount of
> files, indexing is still under 5 minutes. I also use Zim to manage a
> notebook under which there are lots of small work data files
> (>350,000).
>
> The progress bar suggests there is some part of the parsing process
> that slows down over time, as does a cursory check on the contents of
> the database updating over time. There are many more files added
> within the first few minutes, and many fewer over time, such that
> after a while, only one or two files are added every several minutes.
> It suggests to me that the whole list is being re-processed or
> re-opened as part of the indexing loop, perhaps re-opening the sqlite
> file for every new file or something. Ultimately, I don't think that
> exponential slowdown is a necessity, but I have not had a free moment
> to familiarize myself with the source yet.
>
> Thanks!
>
>
> On Mon, 03 Jul 2017 08:04:36 +
> Jaap Karssenberg wrote:
>
> > Yes, zim does indeed now build a table of all files in the notebook
> > folder, not just text files.

___
Mailing list: https://launchpad.net/~zim-wiki
Post to : zim-wiki@lists.launchpad.net
Unsubscribe : https://launchpad.net/~zim-wiki
More help : https://help.launchpad.net/ListHelp
Re: [Zim-wiki] Major slowdown in indexing with upgrade to 67rc2
Yes, zim does indeed now build a table of all files in the notebook folder, not just text files. However, it doesn't access them; it just stores file names and mtimes.

Despite this change, indexing is faster than with 0.65 in most of my test cases. The behavior you describe suggests a huge number of files under the notebook folder. Is this the case?

-- Jaap

On Sun, Jul 2, 2017 at 8:43 PM wrote:
> The notebooks that used to take me about 5 minutes to re-index are
> taking close to 40 hours for me (they are larger notebooks).
>
> It looks like the SQL database is indexing every file under the root
> directory of the notebook, even those not associated with Zim
> directly, like zip or data files. I'm not sure if that was happening
> with earlier versions.
>
>
> On Sat, 1 Jul 2017 23:32:16 +0200
> Olivier Boesch wrote:
>
> > 6 minutes to reindex. Pretty long in comparison with 0.65.
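What Jaap describes, a table of every file under the notebook folder holding only its name and modification time, can be sketched like this. It is an illustration, not zim's code: the `files_demo` table and the `scan_notebook` function are hypothetical. It shows that `os.stat` reads only metadata, so the scan never opens the files; with hundreds of thousands of data files, though, the table still grows to that many rows. Collecting all rows first and writing them in a single `executemany` call is also close to the "glob once, then import in a loop" approach suggested in the March 2018 message.

```python
import os
import sqlite3


def scan_notebook(root, db):
    """Hypothetical helper: store (relative path, mtime) for every file under root.

    Only os.stat() metadata is read; file contents are never opened.
    Returns the number of files recorded.
    """
    db.execute('CREATE TABLE IF NOT EXISTS files_demo'
               ' (path TEXT PRIMARY KEY, mtime REAL)')
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subfolders in a stable order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rows.append((os.path.relpath(full, root), os.stat(full).st_mtime))
    # Bulk insert of all collected rows in a single call.
    db.executemany(
        'INSERT OR REPLACE INTO files_demo (path, mtime) VALUES (?, ?)', rows)
    db.commit()
    return len(rows)


if __name__ == '__main__':
    import tempfile
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, 'data'))
        for rel in ('Home.txt', os.path.join('data', 'run1.zip')):
            with open(os.path.join(root, rel), 'w') as f:
                f.write('x')
        print(scan_notebook(root, sqlite3.connect(':memory:')))  # 2
```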
Re: [Zim-wiki] Major slowdown in indexing with upgrade to 67rc2
6 minutes to reindex. Pretty long in comparison with 0.65.

On 01/07/2017 at 23:24, Olivier Boesch wrote:
> I seem to experience the same issue...
Re: [Zim-wiki] Major slowdown in indexing with upgrade to 67rc2
I seem to experience the same issue...

I clicked the "cancel" button after several minutes...

Testing now how long it takes to re-index...

On 01/07/2017 at 23:04, hawk...@bitmessage.ch wrote:
> After this latest upgrade came through (it looks great), notebooks
> that took me several minutes to re-index are now taking multiple
> days, and it seems like an exponential slowdown with the number (and
> maybe size) of files under the notebook root directory. Has anyone
> else experienced this?
[Zim-wiki] Major slowdown in indexing with upgrade to 67rc2
After this latest upgrade came through (it looks great), notebooks that took me several minutes to re-index are now taking multiple days, and it seems like an exponential slowdown with the number (and maybe size) of files under the notebook root directory. Has anyone else experienced this?