Re: [gentoo-dev] [RFC][NEW] Utility to find orphaned files

2010-04-30 Thread Enrico Weigelt
* Daniel Pielmeier bil...@gentoo.org wrote:

 What about searching the complete file system but using an exclude file where
 you can put directories and files which should not be searched? It is tedious
 to tell every path on the command-line. Also for instance if you specify /lib
 it will also search under /lib/modules and I am sure you do not consider all
 contents there as unneeded.

Hmm, perhaps there's some way to assign these files to some package?
 
 You also need to consider that your tool will return other false positives
 like byte compiled python modules and perl header files. In general everything
 an ebuild does in phases where it adds files to file-system but files are not
 stored to CONTENTS (pkg_{pre,post}inst). At this point the files are needed
 but not recognized by the package manager. If the ebuild does not take care of
 these files when removing (pkg_{pre,post}rm) the package they will remain on
 the file-system and are now unneeded.

Assuming these files are not optional/temporary (i.e. can be regenerated on
the fly), I see a generic design problem here: everything belonging to some
package (excluding content data and configs, of course) should be assigned
to the package.

The big question: how can we achieve this?


cu
-- 
-
 Enrico Weigelt==   metux IT service - http://www.metux.de/
-
 Please visit the OpenSource QM Taskforce:
http://wiki.metux.de/public/OpenSource_QM_Taskforce
 Patches / Fixes for a lot dozens of packages in dozens of versions:
http://patches.metux.de/
-



Re: [gentoo-dev] [RFC][NEW] Utility to find orphaned files

2010-04-25 Thread Daniel Pielmeier
Angelo Arrifano wrote on 25.04.2010 13:18:
 Hello developers developers and developers,
 
 Ever wondered how much crap is left in your X-years old Gentoo box?
 
 I just developed a python utility to efficiently find orphaned files in
 the system. By orphaned files I mean the files that are present on
 system directories and don't belong to any installed package.
 
 The package builds a virtual filesystem (cache) on the RAM using python
 hash tables. Then it uses the cache to find the ownership of files
 inside user-specified dirs.
 
 Building the cache takes less than 10 seconds here in a system with 1366
 installed packages.
 
 This is not intended to be a finished program yet; I'm looking forward
 to your constructive comments.

What about searching the complete file system, but with an exclude file listing
the directories and files that should not be searched? It is tedious to name
every path on the command line. Also, if you specify /lib, for instance, it
will also search under /lib/modules, and I am sure you do not consider
everything there unneeded.
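
A rough sketch of the exclude-file idea in Python follows. The function names and the exclude-file format (one absolute path per line, "#" for comments) are assumptions made for illustration; this is not what the attached scripts do.

```python
import os


def load_excludes(lines):
    """Return normalized paths from exclude-file lines.

    Assumed format: one absolute path per line; blank lines and
    lines starting with '#' are ignored.
    """
    excludes = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            excludes.add(os.path.normpath(line))
    return excludes


def walk_with_excludes(root, excludes):
    """Yield every file under root, skipping excluded files and directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories
        # (for example /lib/modules when root is /lib).
        dirnames[:] = [d for d in dirnames
                       if os.path.join(dirpath, d) not in excludes]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if path not in excludes:
                yield path
```

Pruning the directory list in place means excluded trees are never read at all, which also keeps the scan fast on large directories.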

You also need to consider that your tool will return other false positives, like
byte-compiled Python modules and Perl header files. In general, this covers
everything an ebuild adds to the file system in phases whose files are not
recorded in CONTENTS (pkg_{pre,post}inst). At that point the files are needed but
not recognized by the package manager. If the ebuild does not take care of these
files when removing the package (pkg_{pre,post}rm), they will remain on the
file system and are then unneeded.
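
One way to cut down the byte-compiled false positives, sketched in Python: treat a .pyc/.pyo file as owned when the .py file next to it is owned. The function names and the "owned" set (all paths listed in the installed packages' CONTENTS) are assumptions for illustration, and the rule only covers this one class of generated files.

```python
import os


def is_generated_from_owned(path, owned):
    """True if path is a byte-compiled module whose sibling .py file is owned."""
    base, ext = os.path.splitext(path)
    if ext not in (".pyc", ".pyo"):
        return False
    return base + ".py" in owned


def filter_orphans(candidates, owned):
    """Drop owned files and byte-compiled files derived from owned sources."""
    return [path for path in candidates
            if path not in owned and not is_generated_from_owned(path, owned)]
```

A .pyc file whose source is no longer owned is still reported, which matches the case where a package was removed but its generated files stayed behind.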

I have written something in Perl which I recently tried to reimplement in Python
(it does not have the same functionality as the Perl version yet). I am not a
good Perl or Python programmer, but it fits my needs, especially the Perl
version, as I know a bit more Perl than Python.

I attach both versions and a sample exclude file. Maybe it will be of help.

-- 
Daniel Pielmeier


cruft.tar.bz2
Description: BZip2 compressed data




Re: [gentoo-dev] [RFC][NEW] Utility to find orphaned files

2010-04-25 Thread Yuri Vasilevski
Hello,

On Sun, 25 Apr 2010 13:18:25 +0200
Angelo Arrifano mik...@gentoo.org wrote:

 Hello developers developers and developers,
 
 Ever wondered how much crap is left in your X-years old Gentoo box?
 
 I just developed a python utility to efficiently find orphaned files
 in the system. By orphaned files I mean the files that are present on
 system directories and don't belong to any installed package.
 
 The package builds a virtual filesystem (cache) on the RAM using
 python hash tables. Then it uses the cache to find the ownership of
 files inside user-specified dirs.
 
 Building the cache takes less than 10 seconds here in a system with
 1366 installed packages.
 
 This is not intended to be a finished program yet; I'm looking forward
 to your constructive comments.

There is a tool that does that: qfile from app-portage/portage-utils.
Check the "-o, --orphans" option ("List orphan files").

It's not as straightforward as it could be, as it checks only
files specified as arguments or read from a file.

But you can trivially use it like:
# find /dir/you/want/to/check/for/orphans | qfile -o -f -

Best,
Yuri.



Re: [gentoo-dev] [RFC][NEW] Utility to find orphaned files

2010-04-25 Thread Angelo Arrifano
On 25-04-2010 17:34, Yuri Vasilevski wrote:
 Hello,
 
 On Sun, 25 Apr 2010 13:18:25 +0200
 Angelo Arrifano mik...@gentoo.org wrote:
 
 Hello developers developers and developers,

 Ever wondered how much crap is left in your X-years old Gentoo box?

 I just developed a python utility to efficiently find orphaned files
 in the system. By orphaned files I mean the files that are present on
 system directories and don't belong to any installed package.

 The package builds a virtual filesystem (cache) on the RAM using
 python hash tables. Then it uses the cache to find the ownership of
 files inside user-specified dirs.

 Building the cache takes less than 10 seconds here in a system with
 1366 installed packages.

 This is not intended to be a finished program yet; I'm looking forward
 to your constructive comments.
 
 There is a tool that does that: qfile from app-portage/portage-utils.
 Check the "-o, --orphans" option ("List orphan files").
 
 It's not as straightforward as it could be, as it checks only
 files specified as arguments or read from a file.
 
 But you can trivially use it like:
 # find /dir/you/want/to/check/for/orphans | qfile -o -f -
 
 Best,
 Yuri.
 

Based on the comments so far, I'll try to make my PoC a better tool.
My primary objective is to make this some kind of disk cleanup utility
for Gentoo boxes. I don't expect Gentoo systems to be *that* polluted,
but sometimes we all have to do ugly things to fix broken systems real
fast, if you know what I mean.

Other things came to mind as well, like using the stored hashes to
check the integrity of system files (as in security).
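
A minimal sketch of that integrity check, assuming the usual CONTENTS line format "obj <path> <md5> <mtime>". The function names are made up for illustration, and the check is only as trustworthy as the CONTENTS files themselves.

```python
import hashlib
import os


def parse_obj_entries(contents_lines):
    """Yield (path, md5) for each 'obj' line of a CONTENTS file."""
    for line in contents_lines:
        line = line.rstrip("\n")
        if not line.startswith("obj "):
            continue  # 'dir' and 'sym' entries carry no checksum
        # Split from the right, because the path itself may contain spaces.
        path, md5sum, _mtime = line[4:].rsplit(" ", 2)
        yield path, md5sum


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_modified(contents_lines, root="/"):
    """Return recorded paths that are missing or whose MD5 no longer matches."""
    modified = []
    for path, recorded in parse_obj_entries(contents_lines):
        on_disk = os.path.join(root, path.lstrip("/"))
        try:
            if md5_of(on_disk) != recorded:
                modified.append(path)
        except OSError:
            modified.append(path)  # missing or unreadable
    return modified
```

Configuration files that were legitimately edited will show up as modified too, so the output needs the same kind of false-positive handling as the orphan search.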

My next steps with regard to this utility will be:
* Follow harring's suggestion and use the available PM API.
* Make the application handle symlinks so we start getting more
informative output.
* Store the generated cache on disk and only regenerate it if needed.
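
For the last point, one possible shape is sketched below. The function names are made up, "build" stands for whatever function creates the cache, and using directory mtimes of the installed-package database as the "has anything changed" stamp is an assumption that would need checking against how the package manager actually updates it.

```python
import os
import pickle


def vdb_stamp(vdb_root):
    """Newest mtime among vdb_root and the directories directly below it."""
    stamp = os.stat(vdb_root).st_mtime
    for entry in os.listdir(vdb_root):
        path = os.path.join(vdb_root, entry)
        if os.path.isdir(path):
            stamp = max(stamp, os.stat(path).st_mtime)
    return stamp


def load_or_build(cache_path, vdb_root, build):
    """Return the cache stored at cache_path, rebuilding it if it is stale.

    cache_path must live outside vdb_root, otherwise writing the cache
    would itself change the stamp.
    """
    stamp = vdb_stamp(vdb_root)
    try:
        with open(cache_path, "rb") as handle:
            saved_stamp, cache = pickle.load(handle)
        if saved_stamp == stamp:
            return cache
    except (OSError, EOFError, pickle.UnpicklingError, TypeError, ValueError):
        pass  # no usable cache on disk; fall through and rebuild
    cache = build(vdb_root)
    with open(cache_path, "wb") as handle:
        pickle.dump((stamp, cache), handle)
    return cache
```

An unreadable or corrupt cache file is simply treated as missing, so the worst case is the same 10-second rebuild as today.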

Regards,
- Angelo



Re: [gentoo-dev] [RFC][NEW] Utility to find orphaned files

2010-04-25 Thread Benedikt Böhm
On Sun, Apr 25, 2010 at 1:18 PM, Angelo Arrifano mik...@gentoo.org wrote:
 Hello developers developers and developers,

 Ever wondered how much crap is left in your X-years old Gentoo box?

 I just developed a python utility to efficiently find orphaned files in
 the system. By orphaned files I mean the files that are present on
 system directories and don't belong to any installed package.

 The package builds a virtual filesystem (cache) on the RAM using python
 hash tables. Then it uses the cache to find the ownership of files
 inside user-specified dirs.

 Building the cache takes less than 10 seconds here in a system with 1366
 installed packages.

 This is not intended to be a finished program yet; I'm looking forward
 to your constructive comments.

I refactored findcruft (search the forums) two years ago (see
http://git.xnull.de/cgit/findcruft2/); maybe you can take a look at
it, especially the false-positive handling.

HTH,
Bene