[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2022-07-15 Thread soredake
https://bugs.kde.org/show_bug.cgi?id=400704

soredake  changed:

   What|Removed |Added

 CC||ndrzj1...@relay.firefox.com

-- 
You are receiving this mail because:
You are watching all bug changes.

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2022-03-23 Thread Adam Fontenot
https://bugs.kde.org/show_bug.cgi?id=400704

Adam Fontenot  changed:

   What|Removed |Added

 CC||adam.m.fontenot+kde@gmail.com

--- Comment #43 from Adam Fontenot  ---
(In reply to tagwerk19 from comment #38)
> It would make sense to have time/memory limits for such actions (and
> flag the file as "failed" if the extraction exceeds them).

Was thinking about this and similar IO problems, and decided to have a look at
how Gnome's "tracker" is handling things these days. Going to document my
findings here in the hope it's useful as inspiration for how we might handle
similar problems. I think it's an important point of comparison for Baloo.

I have mostly positive things to say, although Tracker also has some flaws (it
didn't pick up my XDG Documents folder by default, it didn't index the contents
of files with text/plain MIME types that don't have file extensions, and it uses
a large amount of CPU while searching in Nautilus).

 * I enabled Tracker to index my home folder (with content indexing) and it
uses 474 MB on my $HOME. I've completely disabled content indexing for Baloo,
but it's somehow using 1.4 GB. Suffice it to say that Baloo is weirdly
inefficient. (ContentIndexingDB is empty, so it's not old content indexes.)
More research needed here, any suggestions appreciated.

 * Unlike Baloo, Tracker does not hang when given pathological files. (See the
link in tagwerk19's comment for an example.) I get a very sensible "Crash/hang
handling file" message in the log for this file and it's otherwise ignored.
Among other checks, they appear to kill the process if the content indexer
takes more than 30 seconds on a file, which seems quite reasonable:
https://gitlab.gnome.org/GNOME/tracker-miners/-/blob/master/src/tracker-extract/tracker-extract.c

 * They have some cool features around full text search including unaccenting
and case folding, and use SPARQL for queries:
https://wiki.gnome.org/Projects/Tracker/Features I haven't seen enough
documentation from Baloo to know how we stack up there.

 * Tracker and Baloo both blacklist source code files by default, among several
other types. Baloo doesn't expose this to the user in the UI, which I think
might surprise some users who expect more configurability from KDE.

 * Tracker seems not to be very configurable. There's a bit of under the hood
adjustment possible, but mostly the focus seems to be on having good heuristics
out of the box. I don't think we could trivially swap in Tracker for Baloo and
have everything we need work. We'll need to keep improving Baloo. :-)
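
As an illustration of the "unaccenting and case folding" mentioned in the list above, here is a minimal Python sketch of how a search term could be normalized before matching. It is not taken from Tracker or Baloo: the function name `normalize_term` is hypothetical, and a real indexer probably applies more elaborate, language-aware rules.

```python
import unicodedata


def normalize_term(text: str) -> str:
    """Return a case-folded, accent-stripped form of text (illustrative only)."""
    # NFKD splits accented characters into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping the combining marks is the "unaccenting" step.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a stronger variant of lower(), meant for caseless matching.
    return stripped.casefold()
```

With this, a query for "resume" would match a document containing "Résumé", because both normalize to the same string. Characters without a Unicode decomposition (such as "Ø") are left unchanged by this simple approach.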

This comment might be better off on the Wiki somewhere, but it seems pretty
underutilized and I'm not sure where I'd put it or if anyone would even read it
there.


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2021-11-25 Thread Mircea Kitsune
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #42 from Mircea Kitsune  ---
(In reply to tagwerk19 from comment #41)

Yeah, 100MB sounds like a good default limit for all files. I'd make it an
option in the search settings of course; users should be able to customize this
based on the number of files they have and the power of their computer.


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2021-11-25 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #41 from tagwer...@innerjoin.org ---
(In reply to Martin Steigerwald from comment #40)
> ... go for the
> first 100 KiB and ignore the rest, instead of just indexing the file name in
> such cases...
At the moment there's a 10MByte limit for text or html:
https://bugs.kde.org/show_bug.cgi?id=410680#c7
Personal preference would be that the first 10MB is indexed and the rest
ignored, but it seems that if the file is larger than (roughly) 10MB, it's
not indexed at all.
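
The two policies being compared here (skip the content of oversized files versus index only the first part) can be sketched as follows. This is a hedged illustration, not Baloo's code: `read_for_indexing`, `TEXT_LIMIT_BYTES` and the `truncate` flag are hypothetical names, and the 10 MB value simply mirrors the figure quoted above.

```python
from typing import Optional

# Assumed value, mirroring the 10 MB text/html limit mentioned above.
TEXT_LIMIT_BYTES = 10 * 1024 * 1024


def read_for_indexing(path: str, limit: int = TEXT_LIMIT_BYTES,
                      truncate: bool = True) -> Optional[bytes]:
    """Return the bytes to hand to a content indexer, or None to skip content.

    truncate=True:  index only the first `limit` bytes of an oversized file.
    truncate=False: skip content indexing entirely for an oversized file.
    """
    with open(path, "rb") as fh:
        # Read one byte more than the limit so oversized files can be detected
        # without loading the whole file into memory.
        data = fh.read(limit + 1)
    if len(data) <= limit:
        return data
    return data[:limit] if truncate else None
```

Note that cutting at a byte boundary can split a multi-byte character, so a text extractor using this approach would need to decode the prefix tolerantly.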


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2021-11-25 Thread Martin Steigerwald
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #40 from Martin Steigerwald  ---
Wow, kudos for the new website design for Bugzilla.

It could also be a limit up to which files would be indexed, i.e. go for the
first 100 KiB and ignore the rest, instead of just indexing the file name in
such cases. Not sure whether it is worth doing it this way. IMHO it depends on
the type of file. For a lot of larger file formats, such as video, sound or
image files, it would only make sense to index metadata. I think and
hope that Baloo is already doing this.

Other large files are archives like tarballs or ZIP files.


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2021-11-25 Thread soredake
https://bugs.kde.org/show_bug.cgi?id=400704

soredake  changed:

   What|Removed |Added

 CC|ndrzj1...@relay.firefox.com |


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2021-11-25 Thread Mircea Kitsune
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #39 from Mircea Kitsune  ---
(In reply to tagwerk19 from comment #38)

+1 on that idea. Dolphin actually has a file size limit for generating
thumbnails; in Manjaro you need to manually remove it or most images won't
generate thumbnails at all. It would be more than logical to have something
like this for Baloo, indicating a size limit past which a file's contents will
not be indexed (only its name and location). Thanks for this suggestion.


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2021-11-25 Thread sourcemaker
https://bugs.kde.org/show_bug.cgi?id=400704

sourcemaker  changed:

   What|Removed |Added

 Blocks||446071


Referenced Bugs:

https://bugs.kde.org/show_bug.cgi?id=446071
[Bug 446071] Baloo is currently not usable (performance problems)

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2021-11-23 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #38 from tagwer...@innerjoin.org ---
Another reference "for completeness":

8...
baloo_file_extractor can get caught on files that require hours to index,
the example case being a PDF containing a scientific plot. The plot itself
is compressed data with little indexable content and unpacking it may
require more RAM than you have available.
See https://bugs.kde.org/show_bug.cgi?id=380456#c21

It's possible that such indexing attempts trigger OoM protections and
therefore never complete.

It would make sense to have time/memory limits for such actions (and flag
the file as "failed" if the extraction exceeds them).

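A minimal sketch of what such time/memory limits could look like, written in Python for brevity. It assumes a POSIX system and an extractor that runs as a separate child process; `run_limited` and its parameters are hypothetical names and do not reflect how baloo_file_extractor is actually started.

```python
import resource
import subprocess
from typing import List


def run_limited(cmd: List[str], timeout_s: float, max_mem_bytes: int) -> str:
    """Run cmd in a child process; return "ok", "failed" or "timeout"."""

    def limit_memory() -> None:
        # Runs in the child just before exec: cap its address space (POSIX only).
        resource.setrlimit(resource.RLIMIT_AS, (max_mem_bytes, max_mem_bytes))

    try:
        proc = subprocess.run(cmd, timeout=timeout_s, preexec_fn=limit_memory,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising TimeoutExpired.
        return "timeout"
    # A child that hits the memory cap normally exits with a non-zero status.
    return "ok" if proc.returncode == 0 else "failed"
```

A caller would record any result other than "ok" against the file (for example in a list of failed IDs) so that the same pathological file is not retried on every run.
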

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2021-10-11 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #37 from pie...@e-delman.com ---
(In reply to tagwerk19 from comment #35)
> $ baloosearch -i filename:"one of your files"
> and you get multiple results with different ID's. Check the file itself
> $ stat "one of your files"
Hi,
Just 1 file but chosen at random. Actually, this file might not have been
"baloo-ed" before I killed baloo-file-extractor. No way to find out but through
sample polling files and testing them the way you suggest, isn't it ? (way
beyond my ability)


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2021-10-11 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #36 from tagwer...@innerjoin.org ---
One more observation for the collection.

It may be that "spike loads" in memory usage trigger OOM protection and
baloo_file_extractor and baloo_file are killed.

Tangentially observed in Fedora 35:
https://bugs.kde.org/show_bug.cgi?id=443547#c2
but needs a closer look...


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2021-10-11 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #35 from tagwer...@innerjoin.org ---
(In reply to pierre from comment #34)
> The slow down appears as I had just upgraded to 20.04 LTS and I remember
> that I had the same problem 3 years ago after upgrading to 18.04. 
You would get a reindexing if the device number of your discs changed. You can
see if that has happened if you run
$ baloosearch -i filename:"one of your files"
and you get multiple results with different ID's. Check the file itself
$ stat "one of your files"
and compare the device details:
Device: fc01h/64513d    Inode: 1053347     Links: 1

Beyond that, I'm not sure. I don't remember having met the issue.
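
The device and inode numbers that `stat` prints can also be read from a script, which makes it easier to sample many files. A minimal Python sketch, assuming the "hex h / decimal d" rendering of the device number shown above; the helper names are hypothetical.

```python
import os


def format_device(dev: int) -> str:
    """Render a device number as hex and decimal, like the stat line above."""
    return f"{dev:x}h/{dev}d"


def device_and_inode(path: str) -> str:
    """Return a one-line summary of the device and inode numbers of path."""
    st = os.stat(path)
    return f"Device: {format_device(st.st_dev)} Inode: {st.st_ino}"
```

For example, `format_device(64513)` yields `fc01h/64513d`, the value in the sample output. Running `device_and_inode` on the same file before and after an upgrade would show whether the device number has changed.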


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2021-10-05 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=400704

pie...@e-delman.com changed:

   What|Removed |Added

 CC||pie...@e-delman.com

--- Comment #34 from pie...@e-delman.com ---
Hi,
One comment I have not seen in the long list since 2014:
The slowdown appeared just after I upgraded to 20.04 LTS, and I remember that I
had the same problem 3 years ago after upgrading to 18.04. So I had to leave the
computer on for a day or two (during a weekend) so it would get through indexing.
Wouldn't it be nice if the database was left as it is while upgrading?


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2021-09-24 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #33 from tagwer...@innerjoin.org ---
(In reply to tagwerk19 from comment #31)
> Consider this as a "Where are we?" summary; an attempt to collect together
> different threads and weave in new evidence.
Weaving in a couple of extra references "for completeness":

5...
Removing baloo records for deleted files seems to be slow
(more I/O intensive than the original indexing). See Bug 442453

6...
Running a "balooctl status" while baloo is removing records for
deleted files, causes memory consumption and index size to
balloon, Bug 437754


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2021-09-04 Thread Mircea Kitsune
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #32 from Mircea Kitsune  ---
The issue seems to have gotten somewhat better these days, especially with the
latest Plasma version 5.22. Though I've since moved to using an SSD / NVMe
drive, which might be why disk sleep isn't as bad as it used to be during indexing.

Another issue now seems to be that the baloo processes are using more memory than
I wish they did, given the number of files indexed. If anyone has a large
HDD but not enough RAM, they'll need to blacklist every large directory.


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2021-09-04 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #31 from tagwer...@innerjoin.org ---
It could be that there are several different issues being "bundled together".

1...

There are, for example, problems with openSUSE that runs BTRFS
with multiple subvols. Check by finding one of the indexed files
and trying the following...

stat testfile
balooshow -x testfile 

and

baloosearch -i filename:testfile 

The "stat" would give you the device and inode number of the file.
You should see the same numbers listed in the "balooshow -x"
results. See:

https://bugs.kde.org/show_bug.cgi?id=402154#c12

If the device/inode numbers change for a file, baloo will think it
is a different file and index it again. You can see this evidenced
in the "baloosearch -i" results, you could get multiple results
(different ID's; same file)
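
To make the effect concrete, here is a small Python sketch of an ID derived from the device and inode numbers. The packing used (inode in the high 32 bits, device in the low 32 bits) and the names `make_doc_id` and `split_doc_id` are assumptions made purely for illustration; the point is only that any ID built from both numbers changes when the device number changes, even though the file itself is untouched.

```python
from typing import Tuple


def make_doc_id(device: int, inode: int) -> int:
    """Hypothetical 64-bit ID: inode in the high 32 bits, device in the low 32."""
    return ((inode & 0xFFFFFFFF) << 32) | (device & 0xFFFFFFFF)


def split_doc_id(doc_id: int) -> Tuple[int, int]:
    """Recover (device, inode) from an ID built by make_doc_id."""
    return doc_id & 0xFFFFFFFF, doc_id >> 32


# Same inode, but the device number differs between two boots:
before = make_doc_id(device=0xFC01, inode=1053347)
after = make_doc_id(device=0xFC02, inode=1053347)
```

Here `before != after`, so an index keyed on such an ID would hold two records for one file, which is what the multiple "baloosearch -i" results suggest.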

2...

Repeated spike loads at logon. In cases where there are *very* *many*
new files, even if content indexing is disabled, the initial scan by
baloo_file takes too many resources.

My reading of the behaviour is that baloo_file does not "batch up"
updates to the index as it discovers new/changed/deleted files.
There's therefore no hint (looking at "balooctl status") that there's
any progress being made; it may be that the indexer is reported as "Idle"
because just an initial scan is being done (and not content indexing), and
the RAM used by baloo_file can grow steadily (potentially extending
to swap space).

As per Bug 394750:

https://bugs.kde.org/show_bug.cgi?id=394750#c13

If the updates from an "initial scan" are done as a single transaction
there are no checkpoints. Killing the process and starting again,
rebooting or logging out and back in again will start "from scratch".

Bug 428416 is also interesting in terms of what baloo_file is doing
when it deals with a large indexing run.

3...

It seems likely that with baloo reindexing files as they reappear
with different ID's (as per '1' above) the index size balloons;
on disc and in terms of pages pulled into memory. This will
compound issue '2'.

4...

On a positive note, the impact (as seen by the user) of a sync of
the dirty pages to disc could be manageable if the index is on
an SSD

Comment 19 argues against increasing the batch size (that the data
will have to be written at some time). This would hammer HDD users
but maybe have less impact on SSD users.

With an SSD, there's the counter argument that you want to avoid
frequent rewrites to prolong the life of the disc. Gut feeling is
that with a larger batch size, the data written to disc is less
in total.

Wishlist/Proposals/Suggestions

I think baloo needs to "batch up" its transactions in its initial scan.
If I were to suggest "how often", I'd pick a time interval, maybe
every 15 or 30 seconds.
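
A minimal sketch of the time-based batching suggested here, in Python for brevity. `TimedBatcher` and its `commit` callback are hypothetical; in baloo_file the callback would correspond to committing a database transaction, and the 15 second default is simply the lower figure proposed above.

```python
import time
from typing import Callable, List


class TimedBatcher:
    """Collect updates and commit them together once interval_s has elapsed."""

    def __init__(self, commit: Callable[[List[str]], None],
                 interval_s: float = 15.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._commit = commit
        self._interval = interval_s
        self._clock = clock
        self._pending: List[str] = []
        self._last_commit = clock()

    def add(self, item: str) -> None:
        """Queue one update; commit the batch if the interval has passed."""
        self._pending.append(item)
        if self._clock() - self._last_commit >= self._interval:
            self.flush()

    def flush(self) -> None:
        """Commit whatever is pending and restart the interval."""
        if self._pending:
            self._commit(self._pending)
            self._pending = []
        self._last_commit = self._clock()
```

Each commit acts as a checkpoint: if the process is killed, only the updates gathered since the last commit are lost, rather than the whole initial scan. A final `flush()` at the end of the scan commits the remainder.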

It would be nice to have a "balooctl" option (or a setting within
baloofilerc) to tune the batch size used for baloo_file_extractor.
That would make it possible to do indexing comparisons "in the
real world"

Consider this as a "Where are we?" summary; an attempt to collect together
different threads and weave in new evidence.


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2021-08-02 Thread Mircea Kitsune
https://bugs.kde.org/show_bug.cgi?id=400704

Mircea Kitsune  changed:

   What|Removed |Added

 CC||sonichedgehog_hyperblast00@yahoo.com

--- Comment #30 from Mircea Kitsune  ---
A very real and annoying issue. I've kept Baloo disabled for years now, due to
it putting my hard drive in "disk sleep" and causing processes on the system to
freeze while waiting for drive access. Nowadays I have a different HDD setup so
I managed to enable it with some directories blacklisted. Still eats more RAM
than it should... if it's not drive I/O it's gonna be the memory or CPU.


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2021-06-11 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=400704

tagwer...@innerjoin.org changed:

   What|Removed |Added

 CC||tagwer...@innerjoin.org


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2021-02-16 Thread soredake
https://bugs.kde.org/show_bug.cgi?id=400704

soredake  changed:

   What|Removed |Added

 CC||ndrzj1...@relay.firefox.com


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2020-03-19 Thread Oded Arbel
https://bugs.kde.org/show_bug.cgi?id=400704

Oded Arbel  changed:

   What|Removed |Added

 CC||o...@geek.co.il

--- Comment #29 from Oded Arbel  ---
Same problem here - Baloo eats up all IO even when reporting "idle". This has
become a problem only in the last year or so. I'm using a pretty beefy i7
device with an NVME and while Baloo is enabled the computer often is slow and
freezes from time to time. Looking at CPU usage I see `baloo_file` takes
50%~80% CPU, and loadavg is around 3~4.5 (on 4 core system).

Looking at IO:
---8<---
$ pidstat -G balo[o] -dl 5 1; balooctl status
Linux 5.4.0-17-generic (vesho)  03/20/2020  _x86_64_(8 CPU)

12:48:52 AM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
12:48:57 AM  1000     80482  26164.94  64858.96      0.00     170  /usr/bin/baloo_file

Average:      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
Average:     1000     80482  26164.94  64858.96      0.00     170  /usr/bin/baloo_file
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 297,288
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 2.56 GiB
8<

So balooctl reports "Idle" while baloo_file pushes >60 MB/sec to the drive
(64858.96 kB/s in the kB_wr/s column) and does a not insignificant amount of
reading.
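
For reference, the kB_wr/s figure in the pidstat output converts as sketched below. This is plain arithmetic; the one assumption, flagged in the code, is that pidstat's "kB" means 1024 bytes. The function names are hypothetical.

```python
# Assumption: pidstat's "kB" is 1024 bytes. Pass bytes_per_kb=1000 otherwise.
def kb_per_s_to_mb_per_s(kb_per_s: float, bytes_per_kb: int = 1024) -> float:
    """Convert kB/s to megabytes (10**6 bytes) per second."""
    return kb_per_s * bytes_per_kb / 1_000_000


def kb_per_s_to_mbit_per_s(kb_per_s: float, bytes_per_kb: int = 1024) -> float:
    """Convert kB/s to megabits (10**6 bits) per second."""
    return kb_per_s_to_mb_per_s(kb_per_s, bytes_per_kb) * 8


write_mb = kb_per_s_to_mb_per_s(64858.96)
write_mbit = kb_per_s_to_mbit_per_s(64858.96)
```

Under either definition of "kB" the write rate comes out at more than 60 megabytes per second, which is over 500 megabits per second.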

In .xsession-errors log I can see a lot of messages like this:

8<
org.kde.baloo.engine: DocumentDB::get 307907124573241397 MDB_NOTFOUND: No
matching key/data pair found
8<


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2019-12-17 Thread Øystein Steffensen-Alværvik
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #28 from Øystein Steffensen-Alværvik  ---
This happens both when Baloo is indexing file *contents* and when it's just
indexing info on files. I have to turn indexing completely off; otherwise my
computer becomes practically unusable upon every power on. 
This is new and I've never had trouble with Baloo on this laptop. It's
admittedly a 4 year old computer, but the SSD is fast and the laptop handles
most of my workflow otherwise completely fine.


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2019-11-28 Thread Øystein Steffensen-Alværvik
https://bugs.kde.org/show_bug.cgi?id=400704

Øystein Steffensen-Alværvik  changed:

   What|Removed |Added

Version|5.45.0  |5.64.0


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2019-11-28 Thread Øystein Steffensen-Alværvik
https://bugs.kde.org/show_bug.cgi?id=400704

Øystein Steffensen-Alværvik  changed:

   What|Removed |Added

 CC||yst...@posteo.net

--- Comment #27 from Øystein Steffensen-Alværvik  ---
Confirmed on openSUSE Tumbleweed with Frameworks 5.64. Everything freezes for
about 30 seconds, works for 30 seconds, then freezes again. The only solution
is to turn Baloo completely off. This is also a considerable problem when only
files, not their contents, are being indexed. 

Operating System: openSUSE Tumbleweed 20191124
KDE Plasma Version: 5.17.3
KDE Frameworks Version: 5.64.0
Qt Version: 5.13.1
Kernel Version: 5.3.12-1-default
OS Type: 64-bit
Processors: 4 × Intel® Core™ i5-4210U CPU @ 1.70GHz
Memory: 11,6 GiB


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2019-10-12 Thread Martin Steigerwald
https://bugs.kde.org/show_bug.cgi?id=400704

Martin Steigerwald  changed:

   What|Removed |Added

 CC||mar...@lichtvoll.de


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2019-10-08 Thread Kai Krakow
https://bugs.kde.org/show_bug.cgi?id=400704

Kai Krakow  changed:

   What|Removed |Added

   See Also||https://bugs.kde.org/show_bug.cgi?id=404057


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2019-08-18 Thread Karl Ove Hufthammer
https://bugs.kde.org/show_bug.cgi?id=400704

Karl Ove Hufthammer  changed:

   What|Removed |Added

 CC||k...@huftis.org


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2019-06-11 Thread Nate Graham
https://bugs.kde.org/show_bug.cgi?id=400704

Nate Graham  changed:

   What|Removed |Added

   Priority|NOR |VHI


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2019-04-26 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #26 from rich...@meinsen.net ---
Hi
the coming change from actual to file size is good, also the change from
expected to used.
the link concerning 120% is not really clear to me. some sum was changed, but
the output still remains unexplained: 120% of what?


i understand that you identified the database layout as *the* problem. 
from my point of view i'd see the cpu (-> io) greed as a problem. in another
thread there was the statement that it works better/blocks the system less
with other schedulers. it really would be ok, at least for me, if the indexing
is done silently in the background and not as fast as possible, blocking the
system. looks independent from db schema changes to me.

at the moment I work with ml datasets -> archive files, but below GB size. it
looks as if baloo_file_extr is the process to be blamed. currently:  
PID USER  PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    user  39  19  259,7g  13,6g  10,3g R  97,0  70,5 355:49.30 baloo_file_extr
I let it run the whole night to get through its work - not ready yet. still
freezing the system (even mouse) quite often. 
as written i'd be ok to not index these files as fast as possible. and i can't
really understand how it can take 7 hours of fast cpu time for baloo_file_extr
to index archives of less than a GB. 
looks independent from db schema changes to me.

you didn't answer the point about being able to see what baloo is currently doing.
that could help a) for debugging and b) adjusting/excluding directories/files
from indexing. 
with options to tune indexing like
- don't index when (allow combinations)
-- filetype
-- if size is smaller/bigger
-- if it is more/less than timespan at specific disk location
-- if it is more/less than timespan created / modified
and a monitor command that allows to see the freeze causes in realtime
and a log that allows to see the freeze causes (files) later (log start
indexing / stop indexing of file) (when indexing takes more than ___ minutes
between start and stop, when indexing took more than ___ minutes since start).
this looks independent from db schema changes to me, and being able
to tune baloo just to not do some things it has problems with would help to
optimize the usability until the big rewrite is done. 
it's a question of how priorities are set.
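
The kind of combinable "don't index when ..." rules asked for above could look like the following Python sketch. Everything here is hypothetical: `SkipRules`, its fields and `should_index` are illustrative names, not Baloo settings, and only three of the listed criteria (file type, size, modification age) are covered.

```python
import os
import time
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SkipRules:
    """Hypothetical user-tunable rules; any combination may be set."""
    skip_extensions: Set[str] = field(default_factory=set)
    max_size_bytes: Optional[int] = None
    # Skip files modified more recently than this many seconds ago.
    min_age_seconds: Optional[float] = None


def should_index(path: str, rules: SkipRules,
                 now: Optional[float] = None) -> bool:
    """Return False if any configured rule says the file should be skipped."""
    st = os.stat(path)
    ext = os.path.splitext(path)[1].lower()
    if ext in rules.skip_extensions:
        return False
    if rules.max_size_bytes is not None and st.st_size > rules.max_size_bytes:
        return False
    if rules.min_age_seconds is not None:
        current = time.time() if now is None else now
        if current - st.st_mtime < rules.min_age_seconds:
            return False
    return True
```

For example, `SkipRules(skip_extensions={".zip", ".tar"}, max_size_bytes=100 * 1024 * 1024)` would leave archives and anything over 100 MiB out of content indexing.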


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2019-04-23 Thread Stefan Brüns
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #25 from Stefan Brüns  ---
==

Dear Users,

the issue described in this bug report is well understood. Solving the problem
requires significant changes to the database schema. Before making these changes
we have to be sure not to regress other use cases.

Screening bugs takes time, time better spent working on this problem and
solving other issues at hand.

Please refrain from adding additional comments here!

Kind regards, Stefan

==


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2019-04-23 Thread Stefan Brüns
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #24 from Stefan Brüns  ---
(In reply to richard from comment #23)
> 
> Why is the expected size only 2/3 of actual?
> And why don't the DB sizes sum up to the actual size?
> And what does 120% really mean?

Re actual size:
https://cgit.kde.org/baloo.git/commit/?id=f8c51b23796523f9b2d9d1582c7fb874181fbf2f

Re 120%:
https://cgit.kde.org/baloo.git/commit/?id=7be886c93d13191c6ebdf72669f657cbbf45c2c7


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2019-04-23 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=400704

rich...@meinsen.net changed:

   What|Removed |Added

 CC||rich...@meinsen.net

--- Comment #23 from rich...@meinsen.net ---
currently after system start and sometimes during work baloo grabs one cpu for
100% for quite a while, eats up to 13GB ram and makes the system quite
unresponsive (i7 w 4c+ht, 20gb ram, only ssd) when running.
this looks more like a complete reindexing of everything on the system and not
related to the amount of changed files. and i don't see how I could find what
exactly baloo is working on through means of balooctl. 
i use fedora 29 with standard schedulers. baloo taking 100% of cpu seems not
reasonable to me. also taking up to 13GB ram seems not reasonable to me.

during the last long run baloo status stated 
130470/133866 files indexed and current index size 15,61 GiB. there was no change
of more than 3000 large files since the last baloo 100% cpu run. indexing of a
new 185m git clone should have been done in very few minutes max.

Further the numbers in indexSize look strange to me

Actual Size: 15,61 GiB
Expected Size: 9,16 GiB

           PostingDB:       1,40 GiB   120.956 %
          PositionDB:     133,54 MiB    11.266 %
            DocTerms:     877,32 MiB    74.014 %
    DocFilenameTerms:      13,61 MiB     1.148 %
       DocXattrTerms:            0 B     0.000 %
              IdTree:       2,52 MiB     0.213 %
          IdFileName:      10,07 MiB     0.850 %
             DocTime:       5,64 MiB     0.476 %
             DocData:       6,92 MiB     0.584 %
   ContentIndexingDB:            0 B     0.000 %
         FailedIdsDB:            0 B     0.000 %
             MTimeDB:       2,18 MiB     0.184 %

Why is the expected size only 2/3 of actual?
And why don't the DB sizes sum up to the actual size?
And what does 120% really mean?
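
One way to narrow down the "120% of what?" question is to divide each sub-database size by its reported percentage and see which total that implies. The Python sketch below does only that arithmetic on the figures listed above (decimal commas converted to points, GiB converted to MiB); it says nothing about why the tool reports these values, and the names are hypothetical.

```python
# (size in MiB, reported percent) for the non-empty sub-databases listed above.
ROWS_MIB = {
    "PostingDB": (1.40 * 1024, 120.956),
    "PositionDB": (133.54, 11.266),
    "DocTerms": (877.32, 74.014),
    "DocFilenameTerms": (13.61, 1.148),
    "IdTree": (2.52, 0.213),
    "IdFileName": (10.07, 0.850),
    "DocTime": (5.64, 0.476),
    "DocData": (6.92, 0.584),
    "MTimeDB": (2.18, 0.184),
}


def implied_total_mib(size_mib: float, percent: float) -> float:
    """Total size that would make size_mib equal to the given percentage."""
    return size_mib / (percent / 100.0)


implied = {name: implied_total_mib(s, p) for name, (s, p) in ROWS_MIB.items()}
sum_of_parts_mib = sum(s for s, _ in ROWS_MIB.values())
```

Every row implies a total of roughly 1185 MiB (about 1.16 GiB), while the listed sizes add up to roughly 2485 MiB (about 2.43 GiB). So the percentages appear to be computed against a total that matches neither the sum of the parts, nor the "Expected Size" of 9,16 GiB, nor the "Actual Size" of 15,61 GiB.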


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2019-04-16 Thread Jacques
https://bugs.kde.org/show_bug.cgi?id=400704

Jacques  changed:

   What|Removed |Added

 CC||jacq...@stry.co.za


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2019-01-22 Thread Patrick Silva
https://bugs.kde.org/show_bug.cgi?id=400704

Patrick Silva  changed:

   What|Removed |Added

 CC||bugsefor...@gmx.com


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2019-01-07 Thread soredake
https://bugs.kde.org/show_bug.cgi?id=400704

soredake  changed:

   What|Removed |Added

 CC||fds...@krutt.org


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-12-19 Thread Yuri Chornoivan
https://bugs.kde.org/show_bug.cgi?id=400704

Yuri Chornoivan  changed:

   What|Removed |Added

 CC||yurc...@ukr.net


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-12-06 Thread Kevin Colyer
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #22 from Kevin Colyer  ---
(In reply to Stefan Brüns from comment #21)
> It would save a lot of developer time if not everyone would add their "me
> too" comments.
> 
> Changes to the database are planned, but this is not trivial. One structure
> may work well for a number of cases and cause huge problems for others.
> These changes have to be evaluated, for performance and for correctness.
> 
> The baloo codebase has been enhanced with additional unit tests recently,
> increasing code coverage and reducing the chance for regressions. This is an
> ongoing effort likely taking several more months until completed.
> 
> Baloo is currently developed mostly by volunteers doing it in their spare
> time. Development will not go faster by adding some more exclamation marks
> ...

Dear Developers,

I am supremely grateful for all the work and efforts that have gone into the
indexing services for KDE. If I had the skills I would join you. I just glanced
at the Git repo and realised how unskilled I am to contribute; I couldn't even
find the schema. Baloo has improved greatly. 

However, I do wish to say please don't discourage well intentioned feedback.
Without feedback from users about the actual problems they encounter, future
priorities may not be as readily identified. As a long term KDE user,
enthusiast and advocate, feedback is one of my most important contributions.
This thread follows from https://bugs.kde.org/show_bug.cgi?id=333655#c73 which
was started in 2014. I am only making my first comment now. The performance
issues have been a problem to me for all this time and I went for a long season
with baloo permanently off!

Do let me know if there is anything concrete I can contribute more than what I
offer in these comments.


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-12-06 Thread Stefan Brüns
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #21 from Stefan Brüns  ---
It would save a lot of developer time if not everyone would add their "me too"
comments.

Changes to the database are planned, but this is not trivial. One structure may
work well for a number of cases and cause huge problems for others. These
changes have to be evaluated, for performance and for correctness.

The baloo codebase has been enhanced with additional unit tests recently,
increasing code coverage and reducing the chance for regressions. This is an
ongoing effort likely taking several more months until completed.

Baloo is currently developed mostly by volunteers doing it in their spare time.
Development will not go faster by adding some more exclamation marks ...


[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-12-06 Thread Kevin Colyer
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #20 from Kevin Colyer  ---
(In reply to Stefan Brüns from comment #19)
> An exponential backoff would only help if baloo would index the same files
> recurrently.
> 
> If you add new documents to your indexed folders, baloo will process these.
> It will not get better when you commit changesets double the size, the
> stalls will be even longer.
> 
> This is *not* a trivial problem which can be solved by adjusting a single
> knob.
> 
> Baloo's data structures currently impose a changeset size which is
> approximately proportional to the size of the database. Adding/changing a
> single small document can cause a DB update of several 100 MBytes.

Thanks for the prompt feedback. Currently I have to do a manual exponential
backoff of switching off baloo and turning it on overnight to do its
indexing!!!

Given that a "single small document can cause a DB update of several 100
MBytes.", might a fresh look need to be given to the underlying data
structure? That seems sub-optimal to me as a user who is struggling with the
indexing process's unintended side effects.

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-12-06 Thread Stefan Brüns
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #19 from Stefan Brüns  ---
An exponential backoff would only help if baloo would index the same files
recurrently.

If you add new documents to your indexed folders, baloo will process these. It
will not get better when you commit changesets double the size, the stalls will
be even longer.

This is *not* a trivial problem which can be solved by adjusting a single knob.

Baloo's data structures currently impose a changeset size which is approximately
proportional to the size of the database. Adding/changing a single small
document can cause a DB update of several 100 MBytes.

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-12-06 Thread Kevin Colyer
https://bugs.kde.org/show_bug.cgi?id=400704

Kevin Colyer  changed:

   What|Removed |Added

 CC||ke...@thecolyers.net

--- Comment #18 from Kevin Colyer  ---
(In reply to Nate Graham from comment #14)
> There's a proposed patch in Bug 356357 that sparked a serious discussion
> about the frequency with which the DB should be written to, but
> unfortunately it went nowhere.

I am still suffering from this problem. Yesterday nextcloud decided to refresh
my files and downloaded about 10G of files. Baloo started indexing and my
desktop stalled. Chrome can't start and I can do no work.

I do hope we can get a solution soon - this is a long-standing problem. Finding
things with baloo saves me time... but not as much as I am losing whilst
waiting for the indexer!

Please can we have a solution?

I like the idea of throttling database updates - perhaps some sort of
exponential back-off approach, but inverted, so that a high number of files
indexed per minute changes the update batching to 80, 160, 320 ... up to a
limit?
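
One possible reading of the doubling scheme above, as a minimal Python sketch.
It assumes the numbers 80, 160, 320 are files indexed per database commit; the
function name and the default limit of 1280 are hypothetical and not taken from
Baloo. Whether larger batches would actually shorten the stalls is a separate
question that this sketch does not answer.

```python
from itertools import islice


def commit_batch_sizes(start=80, limit=1280):
    """Yield how many files to index before each database commit.

    The batch size doubles after every commit until it reaches `limit`
    (a hypothetical cap), then stays there.
    """
    size = start
    while True:
        yield min(size, limit)
        size = min(size * 2, limit)


# First six batch sizes with a cap of 1000 files per commit.
print(list(islice(commit_batch_sizes(80, 1000), 6)))
```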

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-26 Thread Alberto Salvia Novella
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #17 from Alberto Salvia Novella  ---
Since I'm not using Plasma right now I'm unsubscribing from this bug, but feel
free to re-subscribe me if you need any help from me.

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-26 Thread Alberto Salvia Novella
https://bugs.kde.org/show_bug.cgi?id=400704

Alberto Salvia Novella  changed:

   What|Removed |Added

 CC|es204904...@gmail.com   |

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-26 Thread Nate Graham
https://bugs.kde.org/show_bug.cgi?id=400704

Nate Graham  changed:

   What|Removed |Added

 CC||martin.tlus...@gmail.com

--- Comment #16 from Nate Graham  ---
*** Bug 393465 has been marked as a duplicate of this bug. ***

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-26 Thread Nate Graham
https://bugs.kde.org/show_bug.cgi?id=400704

Nate Graham  changed:

   What|Removed |Added

 CC||alexan...@zhigalin.tk

--- Comment #15 from Nate Graham  ---
*** Bug 359119 has been marked as a duplicate of this bug. ***

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-26 Thread Nate Graham
https://bugs.kde.org/show_bug.cgi?id=400704

Nate Graham  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|REPORTED|CONFIRMED

--- Comment #14 from Nate Graham  ---
There's a proposed patch in Bug 356357 that sparked a serious discussion about
the frequency with which the DB should be written to, but unfortunately it went
nowhere.

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-26 Thread Nate Graham
https://bugs.kde.org/show_bug.cgi?id=400704

Nate Graham  changed:

   What|Removed |Added

   See Also||https://bugs.kde.org/show_bug.cgi?id=356357

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-26 Thread Nate Graham
https://bugs.kde.org/show_bug.cgi?id=400704

Nate Graham  changed:

   What|Removed |Added

 CC||thomas.mesch...@ilr.tu-berl
   ||in.de

--- Comment #13 from Nate Graham  ---
*** Bug 376446 has been marked as a duplicate of this bug. ***

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-26 Thread Nate Graham
https://bugs.kde.org/show_bug.cgi?id=400704

Nate Graham  changed:

   What|Removed |Added

 CC||t.ki...@gmail.com

--- Comment #12 from Nate Graham  ---
*** Bug 379011 has been marked as a duplicate of this bug. ***

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-26 Thread Nate Graham
https://bugs.kde.org/show_bug.cgi?id=400704

Nate Graham  changed:

   What|Removed |Added

 CC||es204904...@gmail.com

--- Comment #11 from Nate Graham  ---
*** Bug 384234 has been marked as a duplicate of this bug. ***

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-26 Thread Nate Graham
https://bugs.kde.org/show_bug.cgi?id=400704

Nate Graham  changed:

   What|Removed |Added

 CC||edoantoni...@hotmail.com

--- Comment #10 from Nate Graham  ---
*** Bug 401279 has been marked as a duplicate of this bug. ***

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-26 Thread Nate Graham
https://bugs.kde.org/show_bug.cgi?id=400704

Nate Graham  changed:

   What|Removed |Added

 CC||yanp.b...@gmail.com

--- Comment #9 from Nate Graham  ---
*** Bug 400932 has been marked as a duplicate of this bug. ***

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-17 Thread Jack
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #8 from Jack  ---
After several reboots, I finally had systemsettings5 show me file search, and
turning that off, and another reboot, seems to have stopped the indexer from
running.

The odd thing was that despite earlier doing balooctl suspend, balooctl stop,
and balooctl disable, and balooctl showing disabled, it was still running.  Not
really sure what finally stopped it.  Hopefully it won't just start up again by
itself.

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-17 Thread Jack
https://bugs.kde.org/show_bug.cgi?id=400704

Jack  changed:

   What|Removed |Added

 CC||ostroffjh@users.sourceforge
   ||.net

--- Comment #7 from Jack  ---
Same problem with baloo 5.52.0 (on Artix Linux).  GUI is almost completely
unresponsive.  Switching to text console and back updates the screen, but it
mostly stays frozen.  Sometimes clicking to switch between applications updates
things when I click, but otherwise frozen.

iotop shows baloo_file_extractor and one [kworker...] job at 99.99% (sometimes
alternating with a lower value still above 50%).  Systemsettings/search does
not have any setting to turn indexing off, although no plugin is checked.
balooctl does seem to show everything disabled and stopped, so I have no idea
why.

For me, this seems to have started relatively recently, but it's on a laptop I
don't use constantly, so I'm really not sure what update triggered it.  Is
there anything else I can check, or any other data I can provide?  It makes the
laptop essentially unusable. (I'm posting this from a different PC (Gentoo),
although baloo here is 5.50.0 - I'll try updating.)

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-12 Thread Mayeul Cantan
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #6 from Mayeul Cantan  ---
(In reply to Axel Braun from comment #5)
> (In reply to Stefan Brüns from comment #4)
> 
> > Even with low priority, the kernel eventually has to flush the write
> > buffers, causing the high I/O latency for other tasks.
> 
> Should the I/O traffic from higher-prioritized tasks not be processed first
> as well? I mean, if baloo does not get any CPU time, how can it create such
> high traffic? Looking at iotop, it is mostly a factor of 100 to 1000 higher
> than other tasks

From this link, it seems to be the case (though a link to the kernel source
would have been nicer):
https://unix.stackexchange.com/questions/153505/how-disk-io-priority-is-related-with-process-priority

> io_priority = (cpu_nice + 20) / 5

In my case, though, it was always baloorunner showing at 99.99 % I/O in iotop.
baloo_file_extractor would also run sometimes, but with a lesser subjective
impact on performance.
Setting baloorunner to a lower priority using ionice seemed to improve things
quite a bit, although I would have to confirm it.
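
To make the formula quoted above concrete, here is a small Python sketch of
the mapping from CPU nice value to the default I/O priority level. It assumes
the formula applies to processes that have no explicit I/O priority set, and
that the division is an integer division; the function name is made up for
illustration.

```python
def default_io_priority(cpu_nice):
    """Map a CPU nice value (-20..19) to an I/O priority level (0..7).

    Implements io_priority = (cpu_nice + 20) / 5 with integer division.
    Level 0 is the highest priority and level 7 the lowest.
    """
    if not -20 <= cpu_nice <= 19:
        raise ValueError("nice value must be between -20 and 19")
    return (cpu_nice + 20) // 5


for nice in (-20, 0, 10, 19):
    print(nice, default_io_priority(nice))
```

Under this mapping an un-niced process (nice 0) lands in the middle of the
range, and only nice 15 and above reaches the lowest level.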

I get the point about needing to flush the cache at some point. Unfortunately,
I am at a loss as to why my mouse freezes because of it. I am on an 8-core (16
SMT threads) CPU, and only a couple are used by the kernel. CPU <-> RAM
bandwidth should not be the limiting factor, and other threads should be able
to go through when CPU <-> SATA controller is being waited on. Maybe it has to
do with interrupts coming in from the SATA controller?

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-11 Thread Axel Braun
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #5 from Axel Braun  ---
(In reply to Stefan Brüns from comment #4)

> Even with low priority, the kernel eventually has to flush the write
> buffers, causing the high I/O latency for other tasks.

Should the I/O traffic from higher-prioritized tasks not be processed first as
well? I mean, if baloo does not get any CPU time, how can it create such high
traffic? Looking at iotop, it is mostly a factor of 100 to 1000 higher than
other tasks

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-10 Thread Stefan Brüns
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #4 from Stefan Brüns  ---
(In reply to Mayeul Cantan from comment #3)

> Could baloorunner be run with the equivalent of ionice -c 3 by default? (and
> maybe nice as well). My CPU is quite beefy, but I suffer from I/O contention:

baloo_file/baloo_file_extractor, which are the indexing tasks (i.e. the ones
causing write accesses), are already running with the lowest priority.
baloorunner is not relevant here.

Even with low priority, the kernel eventually has to flush the write buffers,
causing the high I/O latency for other tasks.

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-10 Thread Mayeul Cantan
https://bugs.kde.org/show_bug.cgi?id=400704

Mayeul Cantan  changed:

   What|Removed |Added

 CC||mayeul.can...@live.fr

--- Comment #3 from Mayeul Cantan  ---
I came in to report the same problem. The system frequently freezes, with the
mouse not moving for a couple seconds, or the screen not being refreshed.

Regardless of what is causing high IO usage within baloo and akonadi, I
consider them background tasks (most of the time), and I would like to see them
prioritized as such.

Could baloorunner be run with the equivalent of ionice -c 3 by default? (and
maybe nice as well). My CPU is quite beefy, but I suffer from I/O contention:

Arch Linux
Ryzen 7 2700X
8 GiB DDR4 2666
4TiB HDD system drive (WDC WD40EZRZ)

I will probably upgrade to a SSD at some point, but this is no excuse for a
background task to consume all of the available disk IO bandwidth ;)
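
For anyone who wants to try the same thing by hand, this is a minimal Python
sketch of moving an already running process into the idle I/O class, the
equivalent of "ionice -c 3 -p PID". It assumes the ionice tool from util-linux
is installed; the helper names are hypothetical, and the PID has to be looked
up separately (for example with pgrep).

```python
import subprocess


def idle_io_command(pid):
    """Build the ionice command line that puts `pid` in the idle I/O class."""
    return ["ionice", "-c", "3", "-p", str(pid)]


def set_idle_io(pid):
    """Run ionice for `pid`; raises CalledProcessError if ionice fails."""
    subprocess.run(idle_io_command(pid), check=True)


# Only print the command here; call set_idle_io(pid) to actually apply it.
print(" ".join(idle_io_command(1234)))
```

How much the idle class helps depends on the I/O scheduler in use.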

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-06 Thread Axel Braun
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #2 from Axel Braun  ---
Thanks for your explanation, Stefan. However, I don't know how I can influence
the behaviour. If I start the computer the next day I would not expect heavy
re-indexing.
Are the database stores for akonadi (~/.local/share/akonadi) excluded from
baloo indexing by default?

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-05 Thread Stefan Brüns
https://bugs.kde.org/show_bug.cgi?id=400704

--- Comment #1 from Stefan Brüns  ---
Unfortunately even the two most fundamental databases in baloo, the Terms and
the FileNameTerms DBs, show O(M^2) behaviour on updates. Every time e.g. a "pdf"
is changed, the associated value (i.e. the IDs of all matching documents) for
the "pdf" term is updated.

An update may happen in two cases:
1. an existing file is appended, tagged, renamed ...
2. an existing file is replaced by an updated one (i.e. application creates a
temporary file on saving and atomically replaces the old one).

For (1.), the update can be minimized, i.e. only updating the terms which have
actually changed. I have some experimental patches for this.

For (2.), the database schema has to be changed significantly.
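
The quadratic growth can be illustrated with a toy model in Python. This is
not Baloo's storage code: it only assumes that one term maps to a single value
holding the IDs of all matching documents, that the whole value is rewritten
on every change, and that each ID takes 8 bytes. The function name and the ID
size are assumptions made for the illustration.

```python
def bytes_written(num_docs, id_size=8):
    """Total bytes written when `num_docs` files matching one term are added.

    Each addition appends one ID to the term's value and then rewrites the
    complete value, so the k-th addition costs k * id_size bytes.
    """
    posting = []
    total = 0
    for doc_id in range(num_docs):
        posting.append(doc_id)
        total += len(posting) * id_size  # whole value is rewritten
    return total


# Doubling the number of documents roughly quadruples the bytes written.
print(bytes_written(1000), bytes_written(2000))
```

The total is id_size * M * (M + 1) / 2 for M documents, which is where the
O(M^2) behaviour comes from in this model.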

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-05 Thread Nate Graham
https://bugs.kde.org/show_bug.cgi?id=400704

Nate Graham  changed:

   What|Removed |Added

 CC||n...@kde.org,
   ||stefan.bruens@rwth-aachen.d
   ||e

[frameworks-baloo] [Bug 400704] Baloo indexing I/O introduces serious noticable delays

2018-11-05 Thread Zane Tu
https://bugs.kde.org/show_bug.cgi?id=400704

Zane Tu  changed:

   What|Removed |Added

 CC||zan...@gmail.com
