Re: [gentoo-user] Nepomuk indexing, what triggers it?

2010-11-19 Thread Alex Schuster
Alan McKinnon writes:

> If I reboot this machine and start KDE, Nepomuk starts a rather
> long-lived index of my home directory. It takes up about 30-40% cpu
> and lasts as much as 15 minutes sometimes. This is annoying because
> after a reboot I usually want to catch up on mail, rss feeds and fire
> up VirtualBox. So nepomuk is just wasting my time at this point.
> 
> How does nepomuk know when to do its thing, how can I tweak what it
> does and how can I discover why it feels it necessary to reindex my
> entire maildir when surely it has a perfectly valid index already from
> just before I shut down?

I think it starts scanning everything over again at every login. I've also
been annoyed by that, so I deactivated it, and I activate it from time to
time when I am away, so it won't bother me.
Or you can leave it active and, during login, suspend Strigi's
indexing by right-clicking on the Nepomuk/Strigi icon in the panel.

You might be interested in this article that came up on the Planet KDE RSS 
feed yesterday:
http://www.afiestas.org/nepomuk-is-not-fast-is-instant/

It suggests setting fs.inotify.max_user_watches to something quite large
like 524288 via sysctl. I assume this is the number of directories being
monitored with inotify, and if this is larger than the total number of
directories, changes in a directory will be noticed at once. So maybe this
will avoid the periodic scanning altogether? I have not tried this yet. But
it won't stop the first scan after login.
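
A rough way to check that assumption is sketched below in Python: it counts
the directories under the indexed roots and compares the total with the
current fs.inotify.max_user_watches value read from /proc. It assumes one
inotify watch per directory, which may not match what Nepomuk really does,
and the function names are made up for illustration.

```python
import os
from pathlib import Path

# Standard Linux location of the per-user inotify watch limit.
INOTIFY_LIMIT_FILE = Path("/proc/sys/fs/inotify/max_user_watches")


def count_directories(roots):
    """Count all directories below the given roots, including the roots."""
    total = 0
    for root in roots:
        # os.walk yields one tuple per directory and does not follow
        # symlinks by default; unreadable directories are skipped.
        for _dirpath, _dirnames, _filenames in os.walk(root):
            total += 1
    return total


def read_watch_limit(limit_file=INOTIFY_LIMIT_FILE):
    """Return the current watch limit, or None if it cannot be read."""
    try:
        return int(limit_file.read_text().strip())
    except (OSError, ValueError):
        return None


def watches_sufficient(dir_count, limit):
    """True if one watch per directory would fit within the limit."""
    return limit is not None and dir_count <= limit
```

For example, watches_sufficient(count_directories([os.path.expanduser("~")]),
read_watch_limit()) tells you whether a single $HOME would fit. Note that the
limit is shared by every program the user runs, not only the indexer.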

I think I will have to trim the list of directories to index. Currently I
have selected my own and another user's $HOME, and some data directories.
This gives 666,000 files, which is probably a lot. So I guess I'll skip my
MP3s, as they are already indexed by Amarok, and also the many
directories with source code.

Wonko



Re: [gentoo-user] Nepomuk indexing, what triggers it?

2010-11-19 Thread Alan McKinnon
Apparently, though unproven, at 20:13 on Friday 19 November 2010, BRM did 
opine thusly:

> > My guess is that it scans every time you restart to be sure nothing
> > changed while it was shut down. It doesn't know if you've dual-booted,
> > logged into xfce, mounted the disk in another machine, had fsck remove
> > files, etc.
> >
> > I think Tracker behaves the same way in gnome-land.
> 
> To add to it - Nepomuk has two parts (according to 
> http://nepomuk.kde.org/node/2) that seem to be active in here:
> 1. Strigi -
> http://techbase.kde.org/Development/Tutorials/Metadata/Nepomuk/StrigiService
> 2. FileWatchService -
> http://techbase.kde.org/Development/Tutorials/Metadata/Nepomuk/FileWatchService
> 
> From the FileWatchService info:
> 
> "However: due to the restrictions of all file watching systems available
> (systems such as inotify are restricted to 8000 something watches, fam
> does not support file moving monitoring, etc.) the service mostly relies
> on KDirNotify. Thus, all operations performed by KDE applications through
> KIO are monitored while all other operations (such as console commands)
> are missed."
> 
> So it really does need to check up on things during restart to get back in
> sync, but also to find what it didn't know about from info not going
> through an interface it is aware of.

Well at least that explains the reason for the current state of affairs. 
Thanks for the find.


-- 
alan dot mckinnon at gmail dot com



Re: [gentoo-user] Nepomuk indexing, what triggers it?

2010-11-19 Thread Alan McKinnon
Apparently, though unproven, at 18:31 on Friday 19 November 2010, Paul Hartman 
did opine thusly:

> On Fri, Nov 19, 2010 at 9:17 AM, Alan McKinnon wrote:
> > Hi all,
> > 
> > Haven't had much luck finding this info:
> > 
> > If I reboot this machine and start KDE, Nepomuk starts a rather
> > long-lived index of my home directory. It takes up about 30-40% cpu and
> > lasts as much as 15 minutes sometimes. This is annoying because after a
> > reboot I usually want to catch up on mail, rss feeds and fire up
> > VirtualBox. So nepomuk is just wasting my time at this point.
> 
> My /guess/ is that it scans every time you restart to be sure nothing
> changed while it was shut down. It doesn't know if you've dual-booted,
> logged into xfce, mounted the disk in another machine, had fsck remove
> files, etc.
> 
> I think Tracker behaves the same way in gnome-land.

I think that's a bit silly, to do a full scan just in case stuff changed.

If so, a very simple optimization would be to calculate a hash of some aspect 
of a directory, store the hash persistently, and only do a full scan if the 
hash is different.

I haven't read the code, so I'm in no real position to know how it's done or 
how to optimize it.
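
A minimal Python sketch of that idea, purely as an illustration and not a
description of how Nepomuk works: it hashes the path, size and mtime of every
entry under a directory, stores the result in a state file, and reports that
a full scan is needed only when the stored value differs. The names
tree_fingerprint and needs_full_scan, and the state file itself, are
hypothetical.

```python
import hashlib
import os


def tree_fingerprint(root):
    """Hash path, size and mtime of every entry below root.

    No file contents are read, only stat() metadata.
    """
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable
            rel = os.path.relpath(path, root)
            record = f"{rel}\0{st.st_size}\0{st.st_mtime_ns}\n"
            digest.update(record.encode("utf-8", "surrogateescape"))
    return digest.hexdigest()


def needs_full_scan(root, state_file):
    """Compare the current fingerprint with the stored one.

    state_file must live outside root, otherwise writing it would
    change the fingerprint on every run.
    """
    current = tree_fingerprint(root)
    try:
        with open(state_file, "r", encoding="ascii") as fh:
            previous = fh.read().strip()
    except OSError:
        previous = None  # first run, or state file lost
    if current == previous:
        return False
    with open(state_file, "w", encoding="ascii") as fh:
        fh.write(current)
    return True
```

This still has to stat every file, so it is not free, but it avoids reading
and re-indexing file contents when nothing has changed. A change that keeps
both size and mtime identical would go unnoticed.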

> > How does nepomuk know when to do its thing, how can I tweak what it does
> > and how can I discover why it feels it necessary to reindex my entire
> > maildir when surely it has a perfectly valid index already from just
> > before I shut down?
> 
> I am pretty sure it is tied to your KDE user session, and not running
> as a system daemon in the background. Perhaps you can suspend it via
> some autostarting script, and then resume it after whatever amount of
> time you're comfortable with.
> 
> Looking in here:
> http://api.kde.org/4.5-api/kdebase-runtime-apidocs/nepomuk/html/classNepomuk_1_1IndexScheduler.html
> 
> In the indexing speed settings, it says:
> "
> enum Nepomuk::IndexScheduler::IndexingSpeed
> 
> Enumerator:
> FullSpeed Index at full speed, i.e. do not use any artificial delays.
> This is the mode used if the user is "away".
> 
> ReducedSpeed  Reduce the indexing speed mildly.
> This is the normal mode used while the user works. The indexer
> uses small delay between indexing two files in order to keep the load
> on CPU and IO down.
> 
> SnailPace Like ReducedSpeed delays are used but they are much
> longer to get even less CPU and IO load.
> This mode is used for the first 2 minutes after startup to give
> the KDE session manager time to start up the KDE session rapidly.
> "
> 
> So based on that, for the first 2 minutes after KDE starts it should
> be using the least aggressive indexing speed (but indexing
> nevertheless).

Good find. Personally, I'd like it to wait for 10-20 minutes after session
start, then just run at SnailPace, period. This machine is seldom booted or
even logged out of KDE (I suspend), so I can tolerate the wait as it's rare.

> (Personally I've always had all that indexing/social-semantic-desktop
> stuff disabled completely.)

Maybe I should too. But I *did* want to use this nepomuk thing for a while
and see for myself what the semantic-desktop can do. It looks like it
could be awesomely useful (like Google turned out to be awesomely useful) but
it takes real usage to know.




-- 
alan dot mckinnon at gmail dot com



Re: [gentoo-user] Nepomuk indexing, what triggers it?

2010-11-19 Thread BRM
----- Original Message -----

> From: Paul Hartman 
> To: gentoo-user@lists.gentoo.org
> Sent: Fri, November 19, 2010 11:31:39 AM
> Subject: Re: [gentoo-user] Nepomuk indexing, what triggers it?
> 
> On Fri, Nov 19, 2010 at 9:17 AM, Alan McKinnon wrote:
> > Hi all,
> >
> > Haven't had much luck finding this info:
> >
> > If I reboot this machine and start KDE, Nepomuk starts a rather long-lived
> > index of my home directory. It takes up about 30-40% cpu and lasts as much
> > as 15 minutes sometimes. This is annoying because after a reboot I usually
> > want to catch up on mail, rss feeds and fire up VirtualBox. So nepomuk is
> > just wasting my time at this point.
> 
> My /guess/ is that it scans every time you restart to be sure nothing
> changed while it was shut down. It doesn't know if you've dual-booted,
> logged into xfce, mounted the disk in another machine, had fsck remove
> files, etc.
> 
> I think Tracker behaves the same way in gnome-land.

To add to it - Nepomuk has two parts (according to 
http://nepomuk.kde.org/node/2) that seem to be active in here:
1. Strigi - 
http://techbase.kde.org/Development/Tutorials/Metadata/Nepomuk/StrigiService
2. FileWatchService - 
http://techbase.kde.org/Development/Tutorials/Metadata/Nepomuk/FileWatchService

From the FileWatchService info:

"However: due to the restrictions of all file watching systems available
(systems such as inotify are restricted to 8000 something watches, fam does
not support file moving monitoring, etc.) the service mostly relies on
KDirNotify. Thus, all operations performed by KDE applications through KIO
are monitored while all other operations (such as console commands) are
missed."

So it really does need to check up on things during restart to get back in
sync, but also to find what it didn't know about from info not going through
an interface it is aware of.

Ben




Re: [gentoo-user] Nepomuk indexing, what triggers it?

2010-11-19 Thread Paul Hartman
On Fri, Nov 19, 2010 at 9:17 AM, Alan McKinnon  wrote:
> Hi all,
>
> Haven't had much luck finding this info:
>
> If I reboot this machine and start KDE, Nepomuk starts a rather long-lived
> index of my home directory. It takes up about 30-40% cpu and lasts as much as
> 15 minutes sometimes. This is annoying because after a reboot I usually want
> to catch up on mail, rss feeds and fire up VirtualBox. So nepomuk is just
> wasting my time at this point.

My /guess/ is that it scans every time you restart to be sure nothing
changed while it was shut down. It doesn't know if you've dual-booted,
logged into xfce, mounted the disk in another machine, had fsck remove
files, etc.

I think Tracker behaves the same way in gnome-land.

> How does nepomuk know when to do its thing, how can I tweak what it does and
> how can I discover why it feels it necessary to reindex my entire maildir when
> surely it has a perfectly valid index already from just before I shut down?

I am pretty sure it is tied to your KDE user session, and not running
as a system daemon in the background. Perhaps you can suspend it via
some autostarting script, and then resume it after whatever amount of
time you're comfortable with.

Looking in here:
http://api.kde.org/4.5-api/kdebase-runtime-apidocs/nepomuk/html/classNepomuk_1_1IndexScheduler.html

In the indexing speed settings, it says:
"
enum Nepomuk::IndexScheduler::IndexingSpeed

Enumerator:
FullSpeed   Index at full speed, i.e. do not use any artificial delays.
This is the mode used if the user is "away".

ReducedSpeed   Reduce the indexing speed mildly.
This is the normal mode used while the user works. The indexer
uses small delay between indexing two files in order to keep the load
on CPU and IO down.

SnailPace   Like ReducedSpeed delays are used but they are much
longer to get even less CPU and IO load.
This mode is used for the first 2 minutes after startup to give
the KDE session manager time to start up the KDE session rapidly.
"

So based on that, for the first 2 minutes after KDE starts it should
be using the least aggressive indexing speed (but indexing
nevertheless).
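
To make the three modes concrete, here is a small Python sketch of how a
scheduler could pick between them, going only by the documentation quoted
above. It is not Nepomuk's code: the delay values are invented, and the rule
that the startup window wins over "user is away" is an assumption.

```python
from enum import Enum


class IndexingSpeed(Enum):
    FULL_SPEED = "FullSpeed"
    REDUCED_SPEED = "ReducedSpeed"
    SNAIL_PACE = "SnailPace"


# "The first 2 minutes after startup", per the quoted API docs.
STARTUP_GRACE_SECONDS = 120

# Hypothetical pauses between two files, in seconds. The docs only say
# "no delay", "small delay" and "much longer"; these numbers are made up.
DELAY_BETWEEN_FILES = {
    IndexingSpeed.FULL_SPEED: 0.0,
    IndexingSpeed.REDUCED_SPEED: 0.5,
    IndexingSpeed.SNAIL_PACE: 5.0,
}


def choose_speed(seconds_since_startup, user_is_away):
    """Pick an indexing speed from session age and user activity."""
    if seconds_since_startup < STARTUP_GRACE_SECONDS:
        # Assumption: the startup window takes priority over "away".
        return IndexingSpeed.SNAIL_PACE
    if user_is_away:
        return IndexingSpeed.FULL_SPEED
    return IndexingSpeed.REDUCED_SPEED
```

With this sketch, choose_speed(30, False) gives SNAIL_PACE and
choose_speed(600, False) gives REDUCED_SPEED, which would match a scan that
becomes more noticeable a couple of minutes after login.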

(Personally I've always had all that indexing/social-semantic-desktop
stuff disabled completely.)



[gentoo-user] Nepomuk indexing, what triggers it?

2010-11-19 Thread Alan McKinnon
Hi all,

Haven't had much luck finding this info:

If I reboot this machine and start KDE, Nepomuk starts a rather long-lived 
index of my home directory. It takes up about 30-40% cpu and lasts as much as 
15 minutes sometimes. This is annoying because after a reboot I usually want 
to catch up on mail, rss feeds and fire up VirtualBox. So nepomuk is just 
wasting my time at this point.

How does nepomuk know when to do its thing, how can I tweak what it does and
how can I discover why it feels it necessary to reindex my entire maildir when 
surely it has a perfectly valid index already from just before I shut down?

Strigi is also enabled if that's relevant to the question.


-- 
alan dot mckinnon at gmail dot com