https://bugs.kde.org/show_bug.cgi?id=388761

            Bug ID: 388761
           Summary: Baloo search returns same deleted backup file multiple
                    times, can't clear it
           Product: frameworks-baloo
           Version: 5.41.0
          Platform: Fedora RPMs
                OS: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: NOR
         Component: balooctl
          Assignee: pinak.ah...@gmail.com
          Reporter: skierp...@gmail.com
  Target Milestone: ---

I notice when I enter certain filenames in the Plasma desktop's Application
Launcher "Click to search" field, I get a lot of duplicate results for vim
backup files ending in '~'.

I can repeat this with baloosearch, for example:

% baloosearch "History of T460"
/media/Windows/Users/spage/Documents/computer_crap/History_of_T460_packages.txt
/media/Windows/Users/spage/Documents/computer_crap/History_of_T460_packages.txt~
... repeated 32 more times!
/media/Windows/Users/spage/Documents/computer_crap/History_of_T460_packages.txt~
/media/Windows/Users/spage/Documents/computer_crap/2016_T460_laptop.txt

(I apologize for my directory's bad language :-) .)
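
For anyone trying to reproduce this, a small shell helper can count how often each path repeats in the results. This is only a sketch: `count_dupes` is a hypothetical helper name, and it assumes `baloosearch` prints one path per line, as in the output above.

```shell
# count_dupes (hypothetical helper): read one path per line on stdin and
# print "<count><TAB><path>" for every path that occurs more than once.
count_dupes() {
    awk '{ seen[$0]++ }
         END { for (p in seen) if (seen[p] > 1) printf "%d\t%s\n", seen[p], p }'
}
```

It could be used as `baloosearch "History of T460" | count_dupes`.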

1. This file is on a Windows NTFS drive, but I usually edit it from Linux. It
was indexed because I added /media/Windows/Users/spage/Documents/ to
~/.config/baloofilerc.
2. This file does not exist any more (I must have disabled vim creating a '~'
backup).
3. Baloo these days excludes files ending in '~'; my ~/.config/baloofilerc
contains exclude filters=... ,*~ ...
4. Maybe Baloo doesn't notice when a file is deleted, especially when it
excludes it, so I tried to manually remove it from the index with
   % balooctl clear '/media/Windows/Users/spage/Documents/computer_crap/History_of_T460_packages.txt~'
which prints
   Could not stat file: /media/Windows/Users/spage/Documents/computer_crap/History_of_T460_packages.txt~
   File(s) cleared

But there's no change to Baloo search behavior: it still returns the same
backup file 34 times in search results for terms in that file.
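
Point 3 above depends on the `*~` exclude filter. A rough way to check whether a file name would match one of the filters is sketched below. `is_excluded` is a hypothetical helper, and it assumes the filters are a comma-separated list of shell-style globs matched against the file's base name; I have not verified that this is how Baloo itself applies them.

```shell
# is_excluded PATH FILTERS (hypothetical helper, not part of Baloo):
# succeed if the base name of PATH matches any comma-separated glob in FILTERS.
# The body runs in a subshell, so the IFS and globbing changes do not leak out.
is_excluded() (
    name=${1##*/}      # strip the directory part
    IFS=,              # split the filter list on commas
    set -f             # do not expand the globs against the current directory
    for pattern in $2; do
        case $name in
            $pattern) exit 0 ;;   # unquoted, so it is treated as a glob
        esac
    done
    exit 1
)
```

For example, `is_excluded "$file" '*.o,*~,*.tmp' && echo "excluded"` would report a file ending in '~' as excluded.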

So I changed ~/.config/baloofilerc to allow indexing of files ending in ~,
killed baloo including the undocumented /usr/libexec/baloorunner process,
restarted baloo, and retried
  % balooctl clear '/media/Windows/Users/spage/My Documents/computer_crap/History_of_T460_packages.txt~'

but despite saying "File(s) cleared", ... it's still in search results 34
times.

So I recreated the file containing just some dummy terms "INDEX THIS FILE baloo
blorf". `baloosearch` does *not* find the new term "blorf" in this file, but
terms from the old file contents still match the file 34 times.

If I use the undocumented command `balooshow -x
/media/Windows/Users/spage/Documents/computer_crap/History_of_T460_packages.txt\~`
it says "No index information found" after I clear the file, but gives me
information about the file when I index it such as "File Name Terms: Fhistory
Fof Fpackages Ft460 Ftxt history of packages t460 txt." However, I notice
balooshow doesn't include Line Count or a list of indexed Terms.

So, after over an hour fiddling with this, there seem to be at least two bugs.
1. `balooctl clear foo.txt~`
does not in fact clear search term information for the file if Baloo no longer
considers this a text file it should index.

2. `balooctl index foo.txt~`
prints a misleading "File(s) indexed" even when the file is excluded from
indexing or is not considered a text file.

I suspect the only way to get rid of these bogus multiple matches to one file
in search results is to yet again give up, delete my 32,801-file, 1.96 GB
baloo index, and rebuild it from scratch. I'm running `balooctl checkDb`; it
has spent 15 minutes at "DocumentTermsDB check .." with one CPU core pegged.

I realize file indexing is hard and I appreciate baloo and its predecessor
nepomuk when it works, but please improve baloo's software engineering.
* Document every utility.
* Make sure commands like "clear" and "index" accurately report what they're
doing. They need to print things like "File metadata indexed, but file contents
ignored due to <xyz>", "File excluded from indexing", "File exists but is not
present in index", etc.
