Re: [Zim-wiki] Major slowdown in indexing with upgrade to 67rc2

2018-06-18 Thread HawKing
Thank you for the suggestion, and sorry for the delay. I did try
removing each of those, but it looks like they may be part of the
termination condition for building an empty index, and so with either or
both edits, the notebooks just open with no index having been built.
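
For reference, the idea in the quoted reply below (handling the pending rows in several passes and yielding in between so the rest of the application can "tick") could look roughly like this. It is only a sketch under assumptions: the columns mirror those named in the quoted query, while `iter_pending_in_batches`, `batch_size`, and the numeric status values are hypothetical and not taken from Zim.

```python
import sqlite3

# Hypothetical status values; Zim's real constants may differ.
STATUS_UPTODATE = 0
STATUS_NEED_UPDATE = 1


def iter_pending_in_batches(db, prefix='', batch_size=100):
    """Yield lists of pending rows, at most batch_size rows per list.

    Control returns to the caller after every batch, so a GUI main loop
    could process events between batches.
    """
    while True:
        rows = db.execute(
            'SELECT id, path, node_type FROM files'
            ' WHERE index_status = ? AND path LIKE ?'
            ' ORDER BY node_type, id LIMIT ?',
            (STATUS_NEED_UPDATE, prefix + '%', batch_size),
        ).fetchall()
        if not rows:
            break  # nothing left to update: the loop terminates here
        yield rows
        # Mark the batch as done so the next query does not return it again.
        db.executemany(
            'UPDATE files SET index_status = ? WHERE id = ?',
            [(STATUS_UPTODATE, row[0]) for row in rows],
        )


db = sqlite3.connect(':memory:')
db.execute(
    'CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT,'
    ' node_type INTEGER, index_status INTEGER)'
)
db.executemany(
    'INSERT INTO files (path, node_type, index_status) VALUES (?, ?, ?)',
    [('page%d.txt' % i, 1, STATUS_NEED_UPDATE) for i in range(250)],
)

batch_sizes = [len(batch) for batch in iter_pending_in_batches(db)]
print(batch_sizes)
```

In this sketch the loop ends only because processed rows stop matching the `WHERE index_status = ?` clause, so the status update and the filter have to stay in step.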



On Tue, 27 Mar 2018 08:37:06 +
Jaap Karssenberg  wrote:

> You could try removing the "ORDER BY" and/or "path LIKE" pieces of the
> query, and see what that does for your performance.
> 
> If this turns out to help, I can split the loop into several, yielding
> in between to make the rest of the application "tick".
> 
> -- Jaap

Re: [Zim-wiki] Major slowdown in indexing with upgrade to 67rc2

2018-03-25 Thread HawKing
At a brief glance, I'm guessing this is the issue:

https://github.com/jaap-karssenberg/zim-desktop-wiki/blob/d96b3509890f4c9b9af9119f64b64947337d8da7/zim/notebook/index/files.py
line 89

    def _update_iter_inner(self, prefix=''):
        # sort folders before files: first index structure, then contents
        # this makes e.g. index links more efficient and robust
        # sort by id to ensure parents are found before children
        while True:
            row = self.db.execute(
                'SELECT id, path, node_type FROM files'
                ' WHERE index_status = ? AND path LIKE ?'
                ' ORDER BY node_type, id',
                (STATUS_NEED_UPDATE, prefix + '%')
            ).fetchone()

            if row:
                node_id, path, node_type = row
                #print ">> UPDATE", node_id, path, node_type
            else:
                break

It seems like the whole database is being re-queried and re-ordered
for the import of every single file. As the number of files in a notebook
increases, the cost of this per-file database operation seems to grow
much faster than linearly. Something like globbing the entire notebook
directory structure once and then importing into the db in a loop over
that glob would be vastly more efficient for large numbers of files.
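
To illustrate the concern, here is a minimal, self-contained sketch (not Zim's code) comparing the pattern in the excerpt, one `ORDER BY ... fetchone()` query per file, against a single ordered query whose result is walked once. The schema, the status constants, and both function names are assumptions made for illustration. Whether SQLite really re-sorts on every call depends on the available indexes, which this thread does not show.

```python
import sqlite3

STATUS_UPTODATE, STATUS_NEED_UPDATE = 0, 1  # hypothetical values

QUERY = (
    'SELECT id, path, node_type FROM files'
    ' WHERE index_status = ? AND path LIKE ?'
    ' ORDER BY node_type, id'
)


def make_db(n):
    """Create an in-memory table with n rows that all need an update."""
    db = sqlite3.connect(':memory:')
    db.execute(
        'CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT,'
        ' node_type INTEGER, index_status INTEGER)'
    )
    db.executemany(
        'INSERT INTO files (path, node_type, index_status) VALUES (?, ?, ?)',
        [('dir/file%d' % i, i % 2, STATUS_NEED_UPDATE) for i in range(n)],
    )
    return db


def mark_done(db, node_id):
    db.execute('UPDATE files SET index_status = ? WHERE id = ?',
               (STATUS_UPTODATE, node_id))


def update_one_query_per_row(db, prefix=''):
    """Pattern from the excerpt: re-run the ordered query for every row."""
    order, queries = [], 0
    while True:
        row = db.execute(QUERY, (STATUS_NEED_UPDATE, prefix + '%')).fetchone()
        queries += 1
        if row is None:
            break
        order.append(row[0])
        mark_done(db, row[0])
    return order, queries


def update_single_query(db, prefix=''):
    """Alternative: run the ordered query once, then walk the result."""
    rows = db.execute(QUERY, (STATUS_NEED_UPDATE, prefix + '%')).fetchall()
    for row in rows:
        mark_done(db, row[0])
    return [row[0] for row in rows], 1


order_a, queries_a = update_one_query_per_row(make_db(1000))
order_b, queries_b = update_single_query(make_db(1000))
print(queries_a, queries_b, order_a == order_b)
```

Both versions visit the rows in the same order, but the first issues one query per file plus a final empty one. The single-query version only sees rows that are pending when the query runs, so if processing a row flags further rows for update, a follow-up pass would be needed.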



On Sun, 25 Mar 2018 14:30:33 -0500
 wrote:

> Would you mind pointing me to the source file(s) that manage this
> indexing? I'd like to see if there is any way to speed the process
> up for large numbers of files. 

Re: [Zim-wiki] Major slowdown in indexing with upgrade to 67rc2

2018-03-25 Thread HawKing
Would you mind pointing me to the source file(s) that manage this
indexing? I'd like to see if there is any way to speed the process
up for large numbers of files. 



On Mon, 3 Jul 2017 17:42:07 +
 wrote:

> Yes, for medium-sized notebooks, and those with a "normal" number of
> files, indexing is still under 5 minutes. I also use Zim to manage a
> notebook under which there are lots of small work data files
> (>350,000).
> 
> The progress bar suggests there is some part of the parsing process
> that slows down over time, as does a cursory check on the contents of
> the database updating over time. There are many more files added
> within the first few minutes, and many fewer over time, such that
> after a while, only one or two files are added every several minutes.
> It suggests to me that the whole list is being re-processed or
> re-opened as part of the indexing loop, perhaps re-opening the sqlite
> file for every new file or something. Ultimately, I don't think that
> exponential slowdown is a necessity, but I have not had a free moment
> to familiarize myself with the source yet.
> 
> Thanks!
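
A rough back-of-the-envelope check of the slowdown described in the quoted message, assuming (hypothetically) that each per-file step has to look at every row still pending. Under that cost model the total work grows quadratically with the number of files rather than exponentially. The 350,000 figure comes from the message; the cost model is an assumption, not a measurement.

```python
def total_row_visits(n_files):
    """Rows examined if step k scans the n_files - k rows still pending.

    The sum n + (n - 1) + ... + 1 equals n * (n + 1) / 2, i.e. quadratic
    growth in the number of files.
    """
    return n_files * (n_files + 1) // 2


for n in (1000, 10000, 350000):
    print(n, total_row_visits(n))
```

Under this model 350,000 files cost about 6.1e10 row visits, more than 120,000 times the work for 1,000 files, which would be consistent with re-indexing going from minutes to days.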

___
Mailing list: https://launchpad.net/~zim-wiki
Post to : zim-wiki@lists.launchpad.net
Unsubscribe : https://launchpad.net/~zim-wiki
More help   : https://help.launchpad.net/ListHelp


Re: [Zim-wiki] Major slowdown in indexing with upgrade to 67rc2

2017-07-03 Thread Jaap Karssenberg
Yes, zim does indeed now build a table of all files in the notebook folder,
not just text files. However it doesn't access them, it just stores file
names and mtime.
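
As a rough illustration of what storing "file names and mtime" without accessing the files could mean, the sketch below walks a folder and records each path and modification time from filesystem metadata only. The table name `files_sketch`, its columns, and the helper `record_files` are hypothetical and are not Zim's schema.

```python
import os
import sqlite3
import tempfile


def record_files(db, root):
    """Store the relative path and mtime of every file under root.

    Only os.stat() metadata is read here; no file is opened.
    """
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rows.append((os.path.relpath(full, root), os.stat(full).st_mtime))
    db.executemany(
        'INSERT INTO files_sketch (path, mtime) VALUES (?, ?)', rows)
    return len(rows)


db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE files_sketch (path TEXT PRIMARY KEY, mtime REAL)')

# Build a tiny throwaway "notebook" folder with one page and one data file.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'Home'))
    for rel in ('Home.txt', os.path.join('Home', 'data.zip')):
        with open(os.path.join(root, rel), 'w') as fh:
            fh.write('x')
    count = record_files(db, root)

print(count)
```

Even with no file contents read, every file under the folder still costs one row, so a folder holding hundreds of thousands of data files produces a table of that size.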

Despite this change, the indexing is faster than with 0.65 in most of my
test cases. The behavior you describe suggests a huge number of files under
the notebook folder. Is this the case?

-- Jaap

On Sun, Jul 2, 2017 at 8:43 PM  wrote:

> The notebooks that used to take me about 5 minutes to re-index are
> taking close to 40 hours for me (they are larger notebooks).
>
> It looks like the sql database is indexing every file under the root
> directory of the notebook, even those not associated with Zim
> directly, like zip or data files. I'm not sure if that was happening
> with earlier versions.


Re: [Zim-wiki] Major slowdown in indexing with upgrade to 67rc2

2017-07-01 Thread Olivier Boesch

6 minutes to reindex. Pretty long in comparison with 0.65.


On 01/07/2017 at 23:24, Olivier Boesch wrote:


I seem to experience the same issue...

I clicked the "cancel" button after several minutes...

testing now how long it takes to re-index...




Re: [Zim-wiki] Major slowdown in indexing with upgrade to 67rc2

2017-07-01 Thread Olivier Boesch

I seem to experience the same issue...

I clicked the "cancel" button after several minutes...

testing now how long it takes to re-index...


On 01/07/2017 at 23:04, hawk...@bitmessage.ch wrote:

After this latest upgrade came through (it looks great), notebooks that
took me several minutes to re-index are now taking multiple days of
time, and it seems like an exponential slowdown with the number (and
maybe size) of files under the notebook root directory. Has anyone else
experienced this?




[Zim-wiki] Major slowdown in indexing with upgrade to 67rc2

2017-07-01 Thread HawKing
After this latest upgrade came through (it looks great), notebooks that
took me several minutes to re-index are now taking multiple days,
and it seems like an exponential slowdown with the number (and
maybe size) of files under the notebook root directory. Has anyone else
experienced this?

