Re: [gentoo-user] Nepomuk indexing, what triggers it?
Alan McKinnon writes:

> If I reboot this machine and start KDE, Nepomuk starts a rather
> long-lived index of my home directory. It takes up about 30-40% cpu
> and lasts as much as 15 minutes sometimes. This is annoying because
> after a reboot I usually want to catch up on mail, rss feeds and fire
> up VirtualBox. So nepomuk is just wasting my time at this point.
>
> How does nepomuk know when to do its thing, how can I tweak what it
> does and how can I discover why it feels it necessary to reindex my
> entire maildir when surely it has a perfectly valid index already from
> just before I shut down?

I think it starts scanning everything over again at every login. I've also been annoyed by that, so I deactivated it, and I activate it from time to time when I am away, so it won't bother me. Or you can leave it active and, during login, suspend Strigi's indexing by right-clicking on the Nepomuk/Strigi icon in the panel.

You might be interested in this article that came up on the Planet KDE RSS feed yesterday:
http://www.afiestas.org/nepomuk-is-not-fast-is-instant/

It suggests setting fs.inotify.max_user_watches to something quite large, like 524288, via sysctl. I assume this is the maximum number of directories that can be monitored with inotify, and if it is larger than the total number of directories, changes in a directory will be noticed at once. So maybe this will avoid the periodic scanning altogether? I have not tried this yet. But it won't stop the first scan after login.

I think I will have to trim the list of directories to index. Currently, I have selected my own and another user's $HOME, and some data directories. This gives 666,000 files, which is probably a lot. So I guess I'll skip my MP3s, as they are already indexed by Amarok, and also the many directories with source code.

	Wonko
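To judge whether a value like 524288 would be enough, one can count the directories under the folders selected for indexing and compare that with the current limit. The following is only a rough sketch in Python, not anything taken from Nepomuk or the linked article. It assumes one inotify watch per directory, reads the limit from the usual procfs location of the sysctl, and uses the home directory as a stand-in for the real list of indexed folders. Other programs running as the same user draw on the same limit, so the real headroom is smaller than this suggests.

```python
import os
from pathlib import Path

# Usual procfs location of the sysctl fs.inotify.max_user_watches on Linux.
MAX_WATCHES_PATH = Path("/proc/sys/fs/inotify/max_user_watches")


def count_directories(roots):
    """Count every directory under the given roots, including the roots themselves."""
    total = 0
    for root in roots:
        # os.walk yields one tuple per directory visited and does not
        # follow symlinked directories by default.
        for _dirpath, _dirnames, _filenames in os.walk(root):
            total += 1
    return total


def read_max_user_watches(path=MAX_WATCHES_PATH):
    """Return the per-user inotify watch limit, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def watches_suffice(n_dirs, limit):
    """True if one watch per directory fits within the limit."""
    return limit is not None and n_dirs <= limit


if __name__ == "__main__":
    # Hypothetical selection: replace with the folders actually indexed.
    roots = [os.path.expanduser("~")]
    n_dirs = count_directories(roots)
    limit = read_max_user_watches()
    print(f"{n_dirs} directories, limit {limit}, "
          f"enough: {watches_suffice(n_dirs, limit)}")
```

Note that the count is of directories, not of the 666,000 files, since a watch on a directory reports changes to the entries directly inside it.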
Re: [gentoo-user] Nepomuk indexing, what triggers it?
Apparently, though unproven, at 20:13 on Friday 19 November 2010, BRM did opine thusly:

> > My guess is that it scans every time you restart to be sure nothing
> > changed while it was shutdown. It doesn't know if you've dual-booted,
> > logged into xfce, mounted the disk in another machine, had fsck remove
> > files, etc.
> >
> > I think Tracker behaves the same way in gnome-land.
>
> To add to it - Nepomuk has two parts (according to
> http://nepomuk.kde.org/node/2) that seem to be active in here:
> 1. Strigi -
> http://techbase.kde.org/Development/Tutorials/Metadata/Nepomuk/StrigiService
> 2. FileWatchService -
> http://techbase.kde.org/Development/Tutorials/Metadata/Nepomuk/FileWatchService
>
> From the FileWatchService info:
>
> "However: due to the restrictions of all file watching systems available
> (systems such as inotify are restricted to 8000 something watches, fam
> does not support file moving monitoring, etc.) the service mostly relies
> on KDirNotify. Thus, all operations performed by KDE applications through
> KIO are monitored while all other operations (such as console commands)
> are missed."
>
> So it really does need to check up on things during restart to get back in
> sync, but also to find what it didn't know about from info not going
> through an interface it is aware of.

Well at least that explains the reason for the current state of affairs. Thanks for the find.

-- 
alan dot mckinnon at gmail dot com
Re: [gentoo-user] Nepomuk indexing, what triggers it?
Apparently, though unproven, at 18:31 on Friday 19 November 2010, Paul Hartman did opine thusly:

> On Fri, Nov 19, 2010 at 9:17 AM, Alan McKinnon wrote:
> > Hi all,
> >
> > Haven't had much luck finding this info:
> >
> > If I reboot this machine and start KDE, Nepomuk starts a rather
> > long-lived index of my home directory. It takes up about 30-40% cpu and
> > lasts as much as 15 minutes sometimes. This is annoying because after a
> > reboot I usually want to catch up on mail, rss feeds and fire up
> > VirtualBox. So nepomuk is just wasting my time at this point.
>
> My /guess/ is that it scans every time you restart to be sure nothing
> changed while it was shutdown. It doesn't know if you've dual-booted,
> logged into xfce, mounted the disk in another machine, had fsck remove
> files, etc.
>
> I think Tracker behaves the same way in gnome-land.

I think that's a bit silly, to do a full scan just in case stuff changed. If so, a very simple optimization would be to calculate a hash of some aspect of a directory, store the hash persistently, and only do a full scan if the hash is different.

I haven't read the code, so I'm in no real position to know how it's done or how to optimize it.

> > How does nepomuk know when to do its thing, how can I tweak what it does
> > and how can I discover why it feels it necessary to reindex my entire
> > maildir when surely it has a perfectly valid index already from just
> > before I shut down?
>
> I am pretty sure it is tied to your KDE user session, and not running
> as a system daemon in the background. Perhaps you can suspend it via
> some autostarting script, and then resume it after whatever amount of
> time you're comfortable with.
>
> Looking in here:
> http://api.kde.org/4.5-api/kdebase-runtime-apidocs/nepomuk/html/classNepomuk_1_1IndexScheduler.html
>
> In the indexing speed settings, it says:
> "
> enum Nepomuk::IndexScheduler::IndexingSpeed
>
> Enumerator:
>   FullSpeed     Index at full speed, i.e. do not use any artificial delays.
>                 This is the mode used if the user is "away".
>
>   ReducedSpeed  Reduce the indexing speed mildly.
>                 This is the normal mode used while the user works. The indexer
>                 uses small delay between indexing two files in order to keep
>                 the load on CPU and IO down.
>
>   SnailPace     Like ReducedSpeed delays are used but they are much
>                 longer to get even less CPU and IO load.
>                 This mode is used for the first 2 minutes after startup to give
>                 the KDE session manager time to start up the KDE session rapidly.
> "
>
> So based on that, for the first 2 minutes after KDE starts it should
> be using the least aggressive indexing speed (but indexing
> nevertheless).

Good find. Personally, I'd like it to wait for 10-20 minutes after session start, then just run at SnailPace, period. This machine is seldom booted or even logged out of KDE (I suspend), so I can tolerate the wait as it's rare.

> (Personally I've always had all that indexing/social-semantic-desktop
> stuff disabled completely.)

Maybe I should too. But I *did* want to use this nepomuk thing for a while and see for myself what the semantic desktop can do. It looks like it could be awesomely useful (like Google turned out to be awesomely useful), but it takes real usage to know.

-- 
alan dot mckinnon at gmail dot com
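The directory-hash idea from the message above can be sketched roughly as follows. This is an illustration in Python only and not how Nepomuk or Strigi actually work (nobody in this thread has read that code). The choice of "aspect" to hash (relative path, size and modification time of every entry), the JSON state file, and the function names `directory_fingerprint` and `needs_full_scan` are all assumptions made for the example.

```python
import hashlib
import json
import os


def directory_fingerprint(root):
    """Hash the relative path, size and mtime of every entry under root."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort gives a stable traversal order
        for name in sorted(dirnames + filenames):
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)
            except OSError:
                continue  # entry vanished while we were walking
            rel = os.path.relpath(full, root)
            record = f"{rel}\0{st.st_size}\0{st.st_mtime_ns}\n"
            digest.update(record.encode("utf-8", "surrogateescape"))
    return digest.hexdigest()


def needs_full_scan(root, state_file):
    """Compare the current fingerprint with the stored one, then store the new one.

    state_file should live outside root, otherwise writing it changes
    the fingerprint of root.
    """
    current = directory_fingerprint(root)
    try:
        with open(state_file, "r", encoding="utf-8") as fh:
            stored = json.load(fh).get(root)
    except (OSError, ValueError, AttributeError):
        stored = None  # no usable state yet: treat as changed
    with open(state_file, "w", encoding="utf-8") as fh:
        json.dump({root: current}, fh)
    return stored != current
```

The fingerprint still has to stat every entry, so it is not free on a tree of several hundred thousand files, but it avoids reading and analysing file contents when nothing has changed. It can also miss an in-place edit that leaves both size and modification time untouched.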
Re: [gentoo-user] Nepomuk indexing, what triggers it?
- Original Message
> From: Paul Hartman
> To: gentoo-user@lists.gentoo.org
> Sent: Fri, November 19, 2010 11:31:39 AM
> Subject: Re: [gentoo-user] Nepomuk indexing, what triggers it?
>
> On Fri, Nov 19, 2010 at 9:17 AM, Alan McKinnon wrote:
> > Hi all,
> >
> > Haven't had much luck finding this info:
> >
> > If I reboot this machine and start KDE, Nepomuk starts a rather long-lived
> > index of my home directory. It takes up about 30-40% cpu and lasts as much as
> > 15 minutes sometimes. This is annoying because after a reboot I usually want
> > to catch up on mail, rss feeds and fire up VirtualBox. So nepomuk is just
> > wasting my time at this point.
>
> My /guess/ is that it scans every time you restart to be sure nothing
> changed while it was shutdown. It doesn't know if you've dual-booted,
> logged into xfce, mounted the disk in another machine, had fsck remove
> files, etc.
>
> I think Tracker behaves the same way in gnome-land.

To add to it - Nepomuk has two parts (according to http://nepomuk.kde.org/node/2) that seem to be active in here:
1. Strigi - http://techbase.kde.org/Development/Tutorials/Metadata/Nepomuk/StrigiService
2. FileWatchService - http://techbase.kde.org/Development/Tutorials/Metadata/Nepomuk/FileWatchService

From the FileWatchService info:

"However: due to the restrictions of all file watching systems available (systems such as inotify are restricted to 8000 something watches, fam does not support file moving monitoring, etc.) the service mostly relies on KDirNotify. Thus, all operations performed by KDE applications through KIO are monitored while all other operations (such as console commands) are missed."

So it really does need to check up on things during restart to get back in sync, but also to find what it didn't know about from info not going through an interface it is aware of.

Ben
Re: [gentoo-user] Nepomuk indexing, what triggers it?
On Fri, Nov 19, 2010 at 9:17 AM, Alan McKinnon wrote:
> Hi all,
>
> Haven't had much luck finding this info:
>
> If I reboot this machine and start KDE, Nepomuk starts a rather long-lived
> index of my home directory. It takes up about 30-40% cpu and lasts as much as
> 15 minutes sometimes. This is annoying because after a reboot I usually want
> to catch up on mail, rss feeds and fire up VirtualBox. So nepomuk is just
> wasting my time at this point.

My /guess/ is that it scans every time you restart to be sure nothing changed while it was shut down. It doesn't know if you've dual-booted, logged into xfce, mounted the disk in another machine, had fsck remove files, etc.

I think Tracker behaves the same way in gnome-land.

> How does nepomuk know when to do its thing, how can I tweak what it does and
> how can I discover why it feels it necessary to reindex my entire maildir when
> surely it has a perfectly valid index already from just before I shut down?

I am pretty sure it is tied to your KDE user session, and not running as a system daemon in the background. Perhaps you can suspend it via some autostarting script, and then resume it after whatever amount of time you're comfortable with.

Looking in here:
http://api.kde.org/4.5-api/kdebase-runtime-apidocs/nepomuk/html/classNepomuk_1_1IndexScheduler.html

In the indexing speed settings, it says:
"
enum Nepomuk::IndexScheduler::IndexingSpeed

Enumerator:
  FullSpeed     Index at full speed, i.e. do not use any artificial delays.
                This is the mode used if the user is "away".

  ReducedSpeed  Reduce the indexing speed mildly.
                This is the normal mode used while the user works. The indexer
                uses small delay between indexing two files in order to keep
                the load on CPU and IO down.

  SnailPace     Like ReducedSpeed delays are used but they are much
                longer to get even less CPU and IO load.
                This mode is used for the first 2 minutes after startup to give
                the KDE session manager time to start up the KDE session rapidly.
"

So based on that, for the first 2 minutes after KDE starts it should be using the least aggressive indexing speed (but indexing nevertheless).

(Personally I've always had all that indexing/social-semantic-desktop stuff disabled completely.)
[gentoo-user] Nepomuk indexing, what triggers it?
Hi all,

Haven't had much luck finding this info:

If I reboot this machine and start KDE, Nepomuk starts a rather long-lived index of my home directory. It takes up about 30-40% cpu and lasts as much as 15 minutes sometimes. This is annoying because after a reboot I usually want to catch up on mail, rss feeds and fire up VirtualBox. So nepomuk is just wasting my time at this point.

How does nepomuk know when to do its thing, how can I tweak what it does and how can I discover why it feels it necessary to reindex my entire maildir when surely it has a perfectly valid index already from just before I shut down?

Strigi is also enabled if that's relevant to the question.

-- 
alan dot mckinnon at gmail dot com