[frameworks-baloo] [Bug 438434] Baloo appears to be indexing twice the number of files than are actually in my home directory

2022-03-06 Thread Martin Steigerwald
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #24 from Martin Steigerwald  ---
(In reply to tagwerk19 from comment #23)
> (In reply to Martin Steigerwald from comment #22)
> > ...  this is not only related to BTRFS ...
> That's understood.
[…]
> It is a "Plan B" though in the absence of a determined developer who's
> willing to take up the Baloo reengineering work and the adoption of BTRFS in
> distros.

Well, I am not sure whether any of what they discuss in this thread has been
merged yet. If it has, I should have it already or soon, as I am currently
using a 5.17-rc6 kernel.

So far I think I still have this issue of the same files being indexed twice,
thrice and so on, but I can keep an eye on it.

I replied to this large thread and Neil replied to me then:

"> Bug 438434 - Baloo appears to be indexing twice the number of files than 
> are actually in my home directory
> 
> https://bugs.kde.org/438434

This bug wouldn't be addressed by using the filehandle.  Using a
filehandle allows you to compare two files within a single filesystem.
This bug is about comparing two filesystems either side of a reboot, to
see if they are the same.

As has already been mentioned in that bug, statfs().f_fsid is the best
solution (unless comparing the mount point is satisfactory)."

https://lore.kernel.org/linux-btrfs/cajfpegub4obzcbxfqqc8j-zuisw+kayzljzaevm_cgznvpx...@mail.gmail.com/T/#meaf736156e0937728e63c6fdc69376a5f4b02af2
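
The distinction Neil draws (recognising the same filesystem on either side of a reboot, rather than comparing two files) could be sketched roughly as follows. This is only an illustration in Python, not anything Baloo does: the state file and both function names are hypothetical, and it assumes that f_fsid is stable for the filesystem in question, which the thread says is not guaranteed everywhere.

```python
import json
import os


def filesystem_id(path):
    """Return the f_fsid that statvfs() reports for the filesystem holding path."""
    return os.statvfs(path).f_fsid


def check_filesystem(path, state_file):
    """Compare the current f_fsid of path with the one remembered in state_file.

    Returns "first-seen", "same" or "changed". state_file is a hypothetical
    JSON file that exists only for this illustration.
    """
    current = filesystem_id(path)
    try:
        with open(state_file) as fh:
            remembered = json.load(fh)
    except FileNotFoundError:
        remembered = {}

    key = os.path.abspath(path)
    previous = remembered.get(key)

    # Remember the current value for the next run (for example after a reboot).
    remembered[key] = current
    with open(state_file, "w") as fh:
        json.dump(remembered, fh)

    if previous is None:
        return "first-seen"
    return "same" if previous == current else "changed"
```

Run once before and once after a reboot against the same path, a result of "same" would suggest that the filesystem can be recognised even if its device number has moved.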

-- 
You are receiving this mail because:
You are watching all bug changes.

2022-03-05 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #23 from tagwer...@innerjoin.org ---
(In reply to Martin Steigerwald from comment #22)
> ...  this is not only related to BTRFS ...
That's understood.

Any fix to (specifically) BTRFS mounts would be like applying a sticking
plaster; better than trying to mitigate by mounting devices in a specific
order; maybe not as good as being able to specify a "would rather like" Minor
Device number in a mount command.

It is a "Plan B" though in the absence of a determined developer who's willing
to take up the Baloo reengineering work and the adoption of BTRFS in distros.

2022-03-05 Thread Martin Steigerwald
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #22 from Martin Steigerwald  ---
(In reply to tagwerk19 from comment #20)
> (In reply to Martin Steigerwald from comment #19)
> > There is a huge discussion following this. I do not have the time to review
> > it right now, however there might be something in it in order to make Baloo
> > work for these use cases.
> Many thanks for keeping watch on the topic and there is indeed a lot to read
> through.
> 
> Do you think this:
> 
> https://lore.kernel.org/linux-btrfs/162742539595.32498.13687924366155737575.stgit@noble.brown/
> 
> could imply that the major:minor device numbers, as seen by stat (and
> baloo), start relating to the subvol? cf:

Tagwerk, this is not only related to BTRFS. As established before, device
major:minor numbers by the kernel are not guaranteed to be stable across
reboots.

Using it as a static identifier inside Baloo is thus, in my humble opinion, a
design mistake.

As for the alternatives, there are quite a few; I have not completely decided
which one would be best.

But unless there is a willingness to actually consider replacing the
major:minor number with something else, there is no point in discussing this
further, I'd say.

2022-03-05 Thread Nate Graham
https://bugs.kde.org/show_bug.cgi?id=438434

Nate Graham  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|CONFIRMED                   |RESOLVED

--- Comment #21 from Nate Graham  ---


*** This bug has been marked as a duplicate of bug 401863 ***

2021-08-02 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #20 from tagwer...@innerjoin.org ---
(In reply to Martin Steigerwald from comment #19)
> There is a huge discussion following this. I do not have the time to review
> it right now, however there might be something in it in order to make Baloo
> work for these use cases.
Many thanks for keeping watch on the topic and there is indeed a lot to read
through.

Do you think this:

https://lore.kernel.org/linux-btrfs/162742539595.32498.13687924366155737575.stgit@noble.brown/

could imply that the major:minor device numbers, as seen by stat (and baloo),
start relating to the subvol? cf:

There are long-standing problems with btrfs subvols, particularly in
relation to whether and how they are exposed in the mount table.

 - /proc/self/mountinfo reports the major:minor device number for each
filesystem and when a btrfs subvol is explicitly mounted, the number
reported is wrong - it does not match what stat() reports for the
mountpoint.

But there does seem to be a wide range of options put forward and it's not
really clear what the front runner is.

For me, name_to_handle_at() returns a 20 byte handle. Having such an invariant
is good, but it is big...

Thanks again...

2021-08-02 Thread Martin Steigerwald
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #19 from Martin Steigerwald  ---
I switched Baloo to indexing only filenames, not contents, because it had
become unbearable for me.

There is a new discussion on how to deal with BTRFS/nfsd subvol dev/inode
number issues and how to allow user space to compare two items for real.

Starting here:

A Third perspective on BTRFS nfsd subvol dev/inode number issues.

https://lore.kernel.org/linux-btrfs/cajfpegub4obzcbxfqqc8j-zuisw+kayzljzaevm_cgznvpx...@mail.gmail.com/T/#m45d0820a1e660ce28c79992a829588de67fd38c3

One interim suggestion is for BTRFS to use hashed inode numbers that are unique
in most cases. Ultimately, however, Neil Brown suggests telling user-space
developers to use a new way to check whether two items are the same:

"The "obvious" choice for a replacement is the file handle provided by
name_to_handle_at() (falling back to st_ino if name_to_handle_at isn't
supported by the filesystem).  This returns an extensible opaque
byte-array.  It is *already* more reliable than st_ino.  Comparing
st_ino is only a reliable way to check if two files are the same if you
have both of them open.  If you don't, then one of the files might have
been deleted and the inode number reused for the other.  A filehandle
contains a generation number which protects against this.

So I think we need to strongly encourage user-space to start using
name_to_handle_at() whenever there is a need to test if two things are
the same."
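
To make the caveat about st_ino concrete, here is a minimal Python sketch of the older comparison that the quote describes as reliable only while both files are open. It does not use name_to_handle_at(), and the function name is made up for this illustration.

```python
import os


def same_file_while_open(path_a, path_b):
    """Compare (st_dev, st_ino) while both files are held open.

    Holding both descriptors open means neither inode number can be
    reused for another file during the comparison.
    """
    fd_a = os.open(path_a, os.O_RDONLY)
    try:
        fd_b = os.open(path_b, os.O_RDONLY)
        try:
            stat_a = os.fstat(fd_a)
            stat_b = os.fstat(fd_b)
            return (stat_a.st_dev, stat_a.st_ino) == (stat_b.st_dev, stat_b.st_ino)
        finally:
            os.close(fd_b)
    finally:
        os.close(fd_a)
```

A check like this only works within one boot and one point in time. It cannot be stored in an index and trusted later, which is the gap a file handle with a generation number is meant to close.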

There is a huge discussion following this. I do not have the time to review it
right now, however there might be something in it in order to make Baloo work
for these use cases.

2021-07-06 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=438434

tagwer...@innerjoin.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REPORTED                    |CONFIRMED
     Ever confirmed|0                           |1

--- Comment #18 from tagwer...@innerjoin.org ---
Was able to replicate, flagging as "Confirmed"

2021-06-26 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #17 from tagwer...@innerjoin.org ---
(In reply to Martin Steigerwald from comment #15)
> Why does Baloo need an invariant for the file?
As I understand it... internally, it is the key within the index. It also
allows "missed changes" to be reconciled if baloo is not running when the file
is changed or has missed the inotify.

> Why wouldn't a rename mess things up without an invariant (device number
> or filesystem id)? Or, put differently, how would having a device/filesystem-unique
> invariant help with a rename?
I think "the trap" is to avoid reindexing everything in a large folder tree if
you rename the top foldername. You need a way to tell if oldtree/x/y/z is the
same file as newname/x/y/z or not...
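
The rename case can be tried out with a small Python sketch. It assumes the rename stays within one filesystem and uses (st_dev, st_ino) as the identity, so it says nothing about stability across reboots. The helper names are hypothetical; the directory names are the ones from the example above.

```python
import os
import tempfile


def inode_key(path):
    """Return (st_dev, st_ino); usable as an identity only within the current boot."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def demo_rename():
    """Rename the top folder of a small tree and return the file's key before and after."""
    with tempfile.TemporaryDirectory() as base:
        old_top = os.path.join(base, "oldtree")
        os.makedirs(os.path.join(old_top, "x", "y"))
        old_path = os.path.join(old_top, "x", "y", "z")
        with open(old_path, "w") as fh:
            fh.write("hello penguin\n")
        before = inode_key(old_path)

        # Rename only the top-level folder; the file itself is not touched.
        new_top = os.path.join(base, "newname")
        os.rename(old_top, new_top)
        after = inode_key(os.path.join(new_top, "x", "y", "z"))
        return before, after
```

If both keys are equal, an indexer can treat newname/x/y/z as the file it already knows as oldtree/x/y/z and only update the stored path.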

From my experience, baloo has to react to inotify events and also be able to
smoothly recover/catch up if the events are missed.

> ... I bet you'd need a file/directory based
> invariant for that. I.e. a hash value for each file ...
Baloo also allows you to index the filename/metadata and not index the content.
A hash would be extra work here...

2021-06-26 Thread Neal Gompa
https://bugs.kde.org/show_bug.cgi?id=438434

Neal Gompa  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ngomp...@gmail.com

2021-06-26 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #16 from tagwer...@innerjoin.org ---
(In reply to Martin Steigerwald from comment #14)
> ...  Possibly the best approach
> is to use the statfs() systemcall to get the "f_fsid" field.  This is
> 64bits.  It is not supported uniformly well by all filesystems, but I
> think it is at least not worse than using the device number ...
I see that "stat -f testfile.txt" gives a 64-bit ID.

I've been comparing that to the minor device number and BTRFS subvolid in
openSUSE. That ID appears stable (in my very constrained tests). I wasn't able
to dig up a lot about a "64 bit" fsid with the help of Google...

> ... And
> for filesystems not supporting it, it would at least not be worse than
> before ...
It's not clear how to find that out :-)

The kernel.org thread does look interesting though, let me see if I can follow
all the subtleties. I did try "requesting" mounts to be done in a particular
order (via x-systemd.requires). No joy...
https://bugs.kde.org/show_bug.cgi?id=402154#c24

> What do you think?
We're dependent on a willing developer. Alas, that's not my forte ...

2021-06-26 Thread Martin Steigerwald
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #15 from Martin Steigerwald  ---
One possible drawback for BTRFS could be: In case someone changes the subvolume
that is mounted for /home, Baloo would re-index files. However… that still would
be preferable I think. Also I'd probably combine the statfs() fsid approach
with an approach to tell Baloo that "/home" or another path is persistent.
Actually I think in 99+% of all cases it is.

According to what I gathered the device number could change in several cases:

- BTRFS and/or LVM are in use and the order of doing things might change.
- In a desktop machine with several controllers there would be a driver loading
race condition.
- Even between different mounts, especially with Systemd, they probably would
not be mounted in the same order.

BTRFS as well as LVM use so-called "anonymous" device numbers. From what I
understand these are dynamically allocated device numbers. These are only valid
during run-time.
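
As a small illustration, the device numbers that stat reported in this thread (20h and 21h) can be split into major and minor parts with Python's os.major() and os.minor(). On Linux, anonymous devices use major number 0, so only the minor part distinguishes them. The helper name is hypothetical.

```python
import os


def describe_device(st_dev):
    """Split a device number into its (major, minor) parts."""
    return os.major(st_dev), os.minor(st_dev)


# Device numbers that stat showed for testfile.txt on different boots.
for reported in (0x20, 0x21):
    major, minor = describe_device(reported)
    print(f"{reported:#x}: major={major} minor={minor}")
```

The minor number depends only on allocation order, which matches the observation that the same filesystem showed up as 32 on one boot and as 33 on another.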

So first step would be:

- Why does Baloo need an invariant for the file?
- Why wouldn't a rename mess things up without an invariant (device number or
filesystem id)? Or, put differently, how would having a device/filesystem-unique
invariant help with a rename? I bet you'd need a file/directory based invariant
for that, i.e. a hash value for each file.

I think also regarding the energy efficiency goal it would be good to revisit
all of this and come up with an approach that avoids clearly needless indexing
work. I bet that indexing files and mails is easily the most energy and
resource consuming aspect of Plasma desktop and KDE applications.

2021-06-26 Thread Martin Steigerwald
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #14 from Martin Steigerwald  ---
Dear tagwerk19, I finally asked the Linux kernel developers about fixed device
numbers:

Assumption on fixed device numbers in Plasma's desktop search Baloo

https://lore.kernel.org/linux-block/1769070.0rzTUBzp5V@ananda/T/#t

I got differing opinions back. For one thing, Qu Wenruo argued that find also
uses device numbers at runtime to see whether it crosses filesystem
boundaries. On the other hand, I do not think that "find" relies on them being
stable across reboots.

Neil Brown clearly said that since kernel 2.4 no userspace component can rely
on device numbers. Luckily he recommended an alternative:

"That is really hard to provide in general.  Possibly the best approach
is to use the statfs() systemcall to get the "f_fsid" field.  This is
64bits.  It is not supported uniformly well by all filesystems, but I
think it is at least not worse than using the device number.  For a lot
of older filesystems it is just an encoding of the device number.

For btrfs, xfs, ext4 it is much much better."

https://lore.kernel.org/linux-block/1769070.0rzTUBzp5V@ananda/T/#m28b8c889c9289ad1ec76cbf040938ea883e3f375

How about doing that? According to Qu Wenruo, unlike the filesystem UUID, which
is the same for all subvolumes, it would also work for BTRFS, because BTRFS
XORs the subvolume id into the filesystem id returned by that system call.
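
For anyone who wants to look at both values side by side, here is a minimal Python sketch that reads the device number from stat() and the filesystem ID from statvfs(). The function names are made up for this illustration, and os.statvfs() exposes f_fsid only in Python 3.7 or later.

```python
import os


def identifiers(path):
    """Return the device number and the filesystem ID reported for path."""
    st = os.stat(path)
    vfs = os.statvfs(path)
    return {"st_dev": st.st_dev, "f_fsid": vfs.f_fsid}


def report(path):
    """Format both identifiers as hex for easy comparison between boots."""
    ids = identifiers(path)
    return f"{path}: st_dev={ids['st_dev']:#x} f_fsid={ids['f_fsid']:#x}"
```

Calling report(os.path.expanduser("~")) on two different boots and comparing the lines would show whether f_fsid stays put on a given setup while st_dev moves.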

While I still may find a work-around I think this approach could solve a lot of
the issues that arise from Baloo relying on stable device numbers. And for
filesystems not supporting it, it would at least not be worse than before. I
bet more KDE Plasma users are using BTRFS, XFS, Ext4 anyway.

Of course, for BSD you would need to look for a different solution or use the
current approach, in case it does not have that functionality. I have no idea
what functionality BSD provides there. But for Linux I think this could be a
viable alternative.

What do you think?

2021-06-26 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #13 from tagwer...@innerjoin.org ---
(In reply to Martin Steigerwald from comment #12)
> Now I got confirmation that the device number can be different.
If I follow the sequence...

You've had baloo initially indexing files with 21h minor device number. Then
your $HOME reappeared with a 20h minor device number, the number of files
indexed doubled and a new round of content indexing started (which may not have
yet finished)

Possibly you'd jumped back to 21h (cannot really say) but you are now back with
20h and the indexing is continuing...

I think I'd be more worried if a new and different device number appeared, then
you'd be in for an impossible job :-/

> For NFS there is a fsid= mount option to specify the filesystem ID. Maybe
> that can help.
I've tried putting fsids in the /etc/fstab for my BTRFS mounts but they seem
not to do anything. Worth a try :-)

2021-06-25 Thread Martin Steigerwald
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #12 from Martin Steigerwald  ---
Now I got confirmation that the device number can be different. This boot:

% LANG=en stat testfile.txt
  File: testfile.txt
  Size: 14  Blocks: 8  IO Block: 4096   regular file
Device: 20h/32d Inode: 2312091 Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/  martin)   Gid: ( 1000/  martin)
Access: 2021-06-24 16:08:54.489824537 +0200
Modify: 2021-06-11 16:27:10.953499381 +0200
Change: 2021-06-11 16:27:10.953499381 +0200
 Birth: 2021-06-11 16:27:10.953499381 +0200

And Baloo appears to be indexing all the files yet again:

% LANG=en balooctl status
Baloo File Indexer is running
Indexer state: Suspended
Total files indexed: 1,191,425
Files waiting for content indexing: 278,307
Files failed to index: 0
Current size of index is 8.12 GiB

For NFS there is a fsid= mount option to specify the filesystem ID. Maybe that
can help.

2021-06-20 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #11 from tagwer...@innerjoin.org ---
(In reply to Martin Steigerwald from comment #10)
> ... I hope you understand that I am not willing to change my setup
> dm-crypt with LUKS, LVM and then BTRFS (single) on top of it, to be able to
> guarantee a stable device number ...
Absolutely...

However baloo depends on having some sort of "invariant" for a file. Depending
on a filename/path would also leave the system vulnerable to the random
renaming of large directory trees or remounting something under a different
mount point.

> ... I do think the device number is not
> supposed to be of relevance for any application and is not guaranteed to be
> stable in Linux, but I can certainly ask Linux kernel developers about their
> take on this ...
Search/indexing is somehow "in the middle" between being an application and
system software. It seems to need to know deeper stuff (maybe things like
Dropbox also need such knowledge)

Yes. If there's any magic way of asking for a vol or subvol mount to be "at" a
given device number, that would sidestep the problem. A forlorn,
optimistic hope perhaps - but who knows?

> Does that mean that Baloo saw two different device numbers (32 and 33)?
> 
> There have been several reboots and it may be that at a certain point the
> device number has been different, I just checked for the device number as I
> actually noticed Baloo was re-indexing files. So maybe the re-indexing was
> triggered by a different device number from a boot in between?
I think so...

I've no practical experience of your stack but I've seen far worse with
openSUSE's BTRFS setup 8-/

> But maybe I misread the above output. Anyway, I hope it helps.
I'll say thank you for persisting. If you find any workarounds, let us know...

2021-06-20 Thread Martin Steigerwald
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #10 from Martin Steigerwald  ---
% baloosearch --id testfile.txt | grep testfile.txt | head -2
Elapsed: 14.6812 msec
23479b0020 /home/martin/testfile.txt
23479b0021 /home/martin/testfile.txt


% LANG=en balooshow -x 23479b0020
23479b0020 32 2312091 /home/martin/testfile.txt
Mtime: 1623421630 2021-06-11T16:27:10
Ctime: 1623421630 2021-06-11T16:27:10

Internal Info
Terms: Mplain Mtext T5 T8 
File Name Terms: Ftestfile Ftxt 
XAttr Terms: 


% LANG=en balooshow -x 23479b0021
23479b0021 33 2312091 /home/martin/testfile.txt
Mtime: 1623421630 2021-06-11T16:27:10
Ctime: 1623421630 2021-06-11T16:27:10
Cached properties:
Line Count: 1

Internal Info
Terms: Mplain Mtext T5 T8 X20-1 hello penguin 
File Name Terms: Ftestfile Ftxt 
XAttr Terms: 
lineCount: 1


Does that mean that Baloo saw two different device numbers (32 and 33)?

There have been several reboots and it may be that at a certain point the
device number has been different, I just checked for the device number as I
actually noticed Baloo was re-indexing files. So maybe the re-indexing was
triggered by a different device number from a boot in between?
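
The two ids above can be taken apart with a short Python sketch. The layout is an assumption inferred only from the output in this thread, not from Baloo's source: the last four hex digits are treated as the device number and the digits before them as the inode. The function name is hypothetical.

```python
def split_baloo_id(doc_id):
    """Split an id string as printed by "baloosearch --id" into (device, inode).

    Assumption, inferred only from the output shown in this thread: the last
    four hex digits are the device number, the digits before them the inode.
    """
    return int(doc_id[-4:], 16), int(doc_id[:-4], 16)


for doc_id in ("23479b0020", "23479b0021"):
    device, inode = split_baloo_id(doc_id)
    print(doc_id, device, inode)
```

Under that assumption both ids decode to inode 2312091, with device numbers 32 and 33. That agrees with the first line of each "balooshow -x" result and suggests one file was stored under two keys.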

I still think relying on the device number creates more problems than it
solves. I hope you understand that I am not willing to change my setup dm-crypt
with LUKS, LVM and then BTRFS (single) on top of it, to be able to guarantee a
stable device number. I do think the device number is not supposed to be of
relevance for any application and is not guaranteed to be stable in Linux, but
I can certainly ask Linux kernel developers about their take on this.

But maybe I misread the above output. Anyway, I hope it helps.

2021-06-20 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #9 from tagwer...@innerjoin.org ---
So, you've purged the database and "baloo_index" counted 551,532 files to index
- then come back a bit later and it says 1,086,500 (and is ploughing slowly
through them).

The "device number" that stat shows for the test file hasn't changed, so there
is no immediate explanation for where baloo found the "doubled" files.

If you take the test file, or maybe a file that "balooctl monitor" shows as
just having been indexed, and search for it:

baloosearch --id ...fileindexedmorethanonce...

Do you get two/several results?

The --id option seems to be new and you can see the inode/device number in the
id. Thanks are due to skierpage for pointing it out in Bug 438527 :-)

You can give the "id" to balooshow -x and get the indexed details, including
the device number/inode of the file as indexed. So something like:

balooshow -x 1000befc01

Maybe one small step further forward?

2021-06-19 Thread Martin Steigerwald
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #8 from Martin Steigerwald  ---
Tagwerk. I still have:

% LANG=en stat testfile.txt
  File: testfile.txt
  Size: 14  Blocks: 8  IO Block: 4096   regular file
Device: 21h/33d Inode: 2312091 Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/  martin)   Gid: ( 1000/  martin)
Access: 2021-06-17 19:32:43.970949695 +0200
Modify: 2021-06-11 16:27:10.953499381 +0200
Change: 2021-06-11 16:27:10.953499381 +0200
 Birth: 2021-06-11 16:27:10.953499381 +0200

Yet Baloo does this:
% LANG=en balooctl status
Baloo File Indexer is running
Indexer state: Indexing file content
Total files indexed: 1,087,500
Files waiting for content indexing: 269,002
Files failed to index: 0
Current size of index is 7.06 GiB

It is currently indexing files in my home directory that it should have picked
up in the initial run and that have not changed in between. It appears to me
that it is indexing all the files again.

2021-06-11 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #7 from tagwer...@innerjoin.org ---
(In reply to Martin Steigerwald from comment #6)
> ... Should Baloo try to index all
> files again I will have another look at the device number ...
I think it would be a reasonable explanation if you find yourself reindexing
everything. It is certainly an issue with openSUSE...

> ... Still I think it
> is broken to assume that the device number does not change ...
OK. Let's say that if baloo can be made proof against this then that's a good
thing :-)

> ... file that causes the file extractor to go bonkers
Maybe also in Bug 438074 "Baloo reindexing files on every start". That seems to
be focussing in on some specific files/filetypes.

> Current size of index is 7.27 GiB
I've seen that the index size and memory use can balloon when deleting entries.
Bug 437754.

> @Stefan: I am grateful for all you did for Baloo. I know you improved it
> quite a bit. So thank you ...
I will say the same.

It was baloo and the tag handling in Dolphin that made me a KDE user.

However when I started using KDE "for real", if I renamed a folder tree in
Dolphin, I needed to log out and back in again to get back to a responsive
system. That was just a couple of years ago.

You don't necessarily notice the steady development and step by step
improvements but I find it remarkable how good baloo is at what it does and how
much more solid it has become over the last years.

2021-06-11 Thread Stefan Brüns
https://bugs.kde.org/show_bug.cgi?id=438434

Stefan Brüns  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|stefan.bruens@rwth-aachen.de|baloo-bugs-n...@kde.org

2021-06-11 Thread Martin Steigerwald
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #6 from Martin Steigerwald  ---
@Stefan: I am grateful for all you did for Baloo. I know you improved it quite
a bit. So thank you. There is nothing personal in here. First off I do not even
know who implemented the dependency on the device number. But also in the case
you did: You are not your code and you are also not your decision to do so. As
you stepped down as a maintainer I am going to work on this with anybody who is
willing to consider my report.

@Tagwerk19. Thank you for your response. I am willing to provide the
information you requested. Here it is:

I created a file and told Baloo to index it with "balooctl index".

% LANG=en balooshow -x testfile.txt
23479b0021 33 2312091 testfile.txt [/home/martin/testfile.txt]
Mtime: 1623421630 2021-06-11T16:27:10
Ctime: 1623421630 2021-06-11T16:27:10
Cached properties:
Line Count: 1

Internal Info
Terms: Mplain Mtext T5 T8 X20-1 hello penguin 
File Name Terms: Ftestfile Ftxt 
XAttr Terms: 
lineCount: 1

% LANG=en stat testfile.txt
  File: testfile.txt
  Size: 14  Blocks: 8  IO Block: 4096   regular file
Device: 21h/33d Inode: 2312091 Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/  martin)   Gid: ( 1000/  martin)
Access: 2021-06-11 16:27:10.953499381 +0200
Modify: 2021-06-11 16:27:10.953499381 +0200
Change: 2021-06-11 16:27:10.953499381 +0200
 Birth: 2021-06-11 16:27:10.953499381 +0200



After reboot I get:

% LANG=en stat testfile.txt
  File: testfile.txt
  Size: 14  Blocks: 8  IO Block: 4096   regular file
Device: 21h/33d Inode: 2312091 Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/  martin)   Gid: ( 1000/  martin)
Access: 2021-06-11 16:29:54.518299009 +0200
Modify: 2021-06-11 16:27:10.953499381 +0200
Change: 2021-06-11 16:27:10.953499381 +0200
 Birth: 2021-06-11 16:27:10.953499381 +0200

% LANG=en balooshow -x testfile.txt
23479b0021 33 2312091 testfile.txt [/home/martin/testfile.txt]
Mtime: 1623421630 2021-06-11T16:27:10
Ctime: 1623421630 2021-06-11T16:27:10
Cached properties:
Line Count: 1

Internal Info
Terms: Mplain Mtext T5 T8 X20-1 hello penguin 
File Name Terms: Ftestfile Ftxt 
XAttr Terms: 
lineCount: 1


Of course that is no guarantee that the device number did not change at the
time the re-indexing of already indexed files happened.

I did another "balooctl purge" and it now is indexing a reasonable number of
files:

% find . -type f -not -path '*/\.*' | wc -l
521488

% LANG=en balooctl status
Baloo File Indexer is running
Indexer state: Indexing file content
Total files indexed: 551,532
Files waiting for content indexing: 490,887
Files failed to index: 0
Current size of index is 367.83 MiB
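
For reference, the find command above counts regular files while skipping anything inside or named as a hidden entry. A rough Python equivalent, with a hypothetical function name, could look like this; it makes no claim about which files Baloo itself considers indexable.

```python
import os


def count_visible_files(top):
    """Count regular files under top, skipping hidden files and directories."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune hidden directories in place so os.walk() does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if name.startswith("."):
                continue
            full = os.path.join(dirpath, name)
            # Like "find -type f": count regular files, but not symlinks to them.
            if os.path.isfile(full) and not os.path.islink(full):
                total += 1
    return total
```

Comparing such a count with "Total files indexed" gives a quick sanity check: a total close to the count looks plausible, while roughly twice the count points at duplicate entries.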

And yes, I have been through quite some issues with Baloo (and Akonadi Search).
There is another one with a file that causes the file extractor to go bonkers.
I excluded the directory it is in.

I will keep that test file around for a while. Should Baloo try to index all
files again I will have another look at the device number. Still I think it is
broken to assume that the device number does not change. There are various
examples where it can change. One would be a desktop PC with two or more
controllers whose drivers compete for sda/sdb/sdc at every boot. I'd assume
that all files in the user's home directory are always on the very same
filesystem by default.

2021-06-10 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #5 from tagwer...@innerjoin.org ---
In the case here, does the info given by "stat" change on a reboot? Is it an
instance of Bug 402154 or is it something new/something else?

I see you've been through all this before, cf Bug 404057, and can see that
there's something that needs to be solved.

2021-06-10 Thread Nate Graham
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #4 from Nate Graham  ---
Mr. tagwerk19, you seem to be knowledgeable about Baloo; would you be
interested in doing some development on it? We seem to be down one maintainer,
so the field is wide open. :)

2021-06-10 Thread Stefan Brüns
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #3 from Stefan Brüns  ---
I no longer work on Baloo, rude behavior by various users had made me stop.

This rude behavior includes treating me like an idiot.

Stop assuming you can make any demands, without giving back.

2021-06-10 Thread Martin Steigerwald
https://bugs.kde.org/show_bug.cgi?id=438434

--- Comment #2 from Martin Steigerwald  ---
I used a BTRFS RAID 1 before, but this time it is not.

"Baloo expects the device number / inode for files to be stable (not change
every reboot)"

If it changes though, for whatever reason, even though I use a single BTRFS
filesystem on the very same LUKS encrypted partition and on LVM, then the
requirement that the device number is stable, is broken.

Remember, we are talking about $HOME. I'd say that in 99% of all desktop use
cases, $HOME is not a wildly different filesystem on every reboot. So please,
pretty please *stop* relying on an internal operating system detail (device
number) to be stable for it.

It is all about usability here. Telling regular users to check whether their
device numbers are stable *just* to make indexing work reliably is not going to
fly regarding usability. I imagine asking my father to check for a device
number… seriously… please stop… relying on OS internals like an inode number or
even a device number to be stable.

This assumption is terminally broken, as has been shown here repeatedly.

Do you know any user of KDE Plasma who expects Baloo to reindex their unchanged
files in $HOME, just because they may have a different $HOME on every reboot? If
Baloo relies on this, this is at the least a bad design choice. I'd go further
than that and I'd say it's terminally broken regarding usability.

Imagine if I find out that the device number would change… what would I do?
Reinstall the system to match the assumptions of Baloo? Not going to happen.

Do whatever you need to do about removable media, but just, assume, pretty
please assume, that $HOME will be the same directory tree on the same laptop
for years to come. And even if I copy it to another laptop… why would Baloo
even care? It is still the very same directory tree. Nothing, I repeat, nothing
of interest for Baloo has changed. Baloo has no business whatsoever to use the
device number for anything related to indexing.

Pretty please consider this input instead of dismissing it. The functionality
is broken because it relies on a broken assumption. Please fix it.

Thank you dearly for your consideration.

2021-06-10 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=438434

tagwer...@innerjoin.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tagwer...@innerjoin.org

--- Comment #1 from tagwer...@innerjoin.org ---
It might be worth looking at:
https://bugs.kde.org/show_bug.cgi?id=402154#c12

Baloo expects the device number / inode for files to be stable (not change
every reboot). With certain filesystems/distributions the device number can
change, with remote filesystems it seems that the inode can also change.

Try the test with "stat" and "balooshow -x" and see what you see.

The 402154 bug was related to openSUSE and multiple BTRFS subvolumes. It could
be that you are caught by the same issue.
