Re: [gentoo-dev] [RFC][NEW] Utility to find orphaned files
* Daniel Pielmeier bil...@gentoo.org wrote:

> What about searching the complete file system but using an exclude file that lists the directories and files which should not be searched? It is tedious to specify every path on the command line. Also, if you specify /lib, for instance, it will also search under /lib/modules, and I am sure you do not consider all the contents there unneeded.

Hmm, perhaps there's some way to assign these files to some package?

> You also need to consider that your tool will return other false positives, like byte-compiled Python modules and Perl header files. In general, this covers everything an ebuild adds to the file system in phases whose files are not recorded in CONTENTS (pkg_{pre,post}inst). At this point the files are needed but not recognized by the package manager. If the ebuild does not take care of these files when the package is removed (pkg_{pre,post}rm), they will remain on the file system and are then unneeded.

Assuming these files are not optional/temporary (i.e. ones that can be regenerated on the fly), I see a generic design problem here: everything belonging to a package (excluding content data and configs, of course) should be assigned to that package.

The big question: how can we achieve this?

cu
--
Enrico Weigelt, metux IT service - http://www.metux.de/
Please visit the OpenSource QM Taskforce: http://wiki.metux.de/public/OpenSource_QM_Taskforce
Patches / fixes for dozens of packages in dozens of versions: http://patches.metux.de/
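One possible answer for the byte-compiled Python case is to attribute a generated file to whatever owns its source. The following is only a minimal sketch of that heuristic, not something proposed in the thread: the function names are hypothetical, `owned` is assumed to be a set of paths the package manager knows about, and the `__pycache__` branch covers the newer PEP 3147 layout in addition to the `foo.pyc` next to `foo.py` layout.

```python
import os


def source_for_bytecode(path):
    """Return the .py path a byte-compiled file was presumably built from, else None."""
    directory, name = os.path.split(path)
    stem, ext = os.path.splitext(name)
    if ext not in (".pyc", ".pyo"):
        return None
    if os.path.basename(directory) == "__pycache__":
        # PEP 3147 layout: pkg/__pycache__/mod.cpython-311.pyc -> pkg/mod.py
        module = stem.split(".", 1)[0]
        return os.path.join(os.path.dirname(directory), module + ".py")
    # Classic layout: pkg/mod.pyc or pkg/mod.pyo -> pkg/mod.py
    return os.path.join(directory, stem + ".py")


def is_derived_from_owned(path, owned):
    """True if `path` is bytecode whose source file is in the `owned` set."""
    source = source_for_bytecode(path)
    return source is not None and source in owned
```

A scanner could call `is_derived_from_owned()` before reporting a file, so that bytecode is reported only when its source is itself unowned. Other generated files (Perl headers, for example) would need their own rules.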
Re: [gentoo-dev] [RFC][NEW] Utility to find orphaned files
Angelo Arrifano wrote on 25.04.2010 13:18:

> Hello developers developers and developers,
>
> Ever wondered how much crap is left in your X-year-old Gentoo box? I just developed a Python utility to efficiently find orphaned files in the system. By orphaned files I mean files that are present in system directories and don't belong to any installed package.
>
> The package builds a virtual filesystem (cache) in RAM using Python hash tables. Then it uses the cache to find the ownership of files inside user-specified dirs. Building the cache takes less than 10 seconds here on a system with 1366 installed packages.
>
> This is not intended to be a finished program yet; I'm looking forward to your constructive comments.

What about searching the complete file system but using an exclude file that lists the directories and files which should not be searched? It is tedious to specify every path on the command line. Also, if you specify /lib, for instance, it will also search under /lib/modules, and I am sure you do not consider all the contents there unneeded.

You also need to consider that your tool will return other false positives, like byte-compiled Python modules and Perl header files. In general, this covers everything an ebuild adds to the file system in phases whose files are not recorded in CONTENTS (pkg_{pre,post}inst). At this point the files are needed but not recognized by the package manager. If the ebuild does not take care of these files when the package is removed (pkg_{pre,post}rm), they will remain on the file system and are then unneeded.

I have written something in Perl, which I recently tried to reimplement in Python (it does not have the same functionality as the Perl version yet). I am not a good Perl or Python programmer, but it fits my needs, especially the Perl version, as I know a bit more Perl than Python. I attach both versions and a sample exclude file. Maybe it will be of help.
--
Daniel Pielmeier

Attachment: cruft.tar.bz2 (BZip2 compressed data)
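The exclude-file idea can be sketched roughly as follows. This is not the attached script: the file format (one path or glob pattern per line, `#` for comments) and all function names are assumptions made for illustration, and `owned` stands for a set of paths recorded by the package manager.

```python
import fnmatch
import os


def load_excludes(exclude_file):
    """Read one pattern per line, skipping blank lines and '#' comments."""
    patterns = []
    with open(exclude_file, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#"):
                patterns.append(line)
    return patterns


def is_excluded(path, patterns):
    """True if path equals an entry, lies below it, or matches it as a glob."""
    for pattern in patterns:
        prefix = pattern.rstrip("/")
        if (path == prefix or path.startswith(prefix + "/")
                or fnmatch.fnmatch(path, pattern)):
            return True
    return False


def find_orphans(roots, owned, patterns):
    """Yield files under `roots` that are neither owned nor excluded."""
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune excluded directories in place so os.walk skips them.
            dirnames[:] = [d for d in dirnames
                           if not is_excluded(os.path.join(dirpath, d), patterns)]
            for name in filenames:
                path = os.path.join(dirpath, name)
                if path not in owned and not is_excluded(path, patterns):
                    yield path
```

With an exclude file containing `/lib/modules`, a scan of `/lib` would skip the kernel module tree entirely. Note that `os.walk` does not follow symlinked directories by default, so this sketch does not address symlink handling.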
Re: [gentoo-dev] [RFC][NEW] Utility to find orphaned files
Hello,

On Sun, 25 Apr 2010 13:18:25 +0200 Angelo Arrifano mik...@gentoo.org wrote:

> Hello developers developers and developers,
>
> Ever wondered how much crap is left in your X-year-old Gentoo box? I just developed a Python utility to efficiently find orphaned files in the system. By orphaned files I mean files that are present in system directories and don't belong to any installed package.
>
> The package builds a virtual filesystem (cache) in RAM using Python hash tables. Then it uses the cache to find the ownership of files inside user-specified dirs. Building the cache takes less than 10 seconds here on a system with 1366 installed packages.
>
> This is not intended to be a finished program yet; I'm looking forward to your constructive comments.

There is a tool that does that: qfile from app-portage/portage-utils. Check the -o/--orphans ("List orphan files") option.

It's not as straightforward as it could be, as it only checks files given as arguments or read from a file. But you can trivially use it like:

    # find /dir/you/want/to/check/for/orphans | qfile -o -f -

Best,
Yuri.
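For comparison with the qfile pipeline, the in-memory lookup described in the quoted announcement can be sketched in a few lines of Python. This is a rough illustration, not the announced utility: it assumes the usual Portage VDB layout (`/var/db/pkg/<category>/<package>/CONTENTS`, with `dir`, `obj` and `sym` lines), and the function names are hypothetical.

```python
import os


def parse_contents_line(line):
    """Return the path recorded on one CONTENTS line, or None if unrecognised."""
    kind, _, rest = line.rstrip("\n").partition(" ")
    if kind == "dir":
        # dir <path>
        return rest or None
    if kind == "obj":
        # obj <path> <md5> <mtime>; the path itself may contain spaces
        parts = rest.rsplit(" ", 2)
        return parts[0] if len(parts) == 3 else None
    if kind == "sym":
        # sym <path> -> <target> <mtime>
        path, sep, _ = rest.partition(" -> ")
        return path if sep else None
    return None


def build_owned_set(vdb_root="/var/db/pkg"):
    """Collect every path listed in any CONTENTS file below vdb_root."""
    owned = set()
    for dirpath, _dirnames, filenames in os.walk(vdb_root):
        if "CONTENTS" in filenames:
            contents = os.path.join(dirpath, "CONTENTS")
            with open(contents, encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    path = parse_contents_line(line)
                    if path:
                        owned.add(path)
    return owned
```

Membership tests against the resulting set (`path in owned`) are hash lookups, which is what makes checking a whole directory tree cheap once the set is built.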
Re: [gentoo-dev] [RFC][NEW] Utility to find orphaned files
On 25-04-2010 17:34, Yuri Vasilevski wrote:

> Hello,
>
> On Sun, 25 Apr 2010 13:18:25 +0200 Angelo Arrifano mik...@gentoo.org wrote:
>
>> Hello developers developers and developers,
>>
>> Ever wondered how much crap is left in your X-year-old Gentoo box? I just developed a Python utility to efficiently find orphaned files in the system. By orphaned files I mean files that are present in system directories and don't belong to any installed package.
>>
>> The package builds a virtual filesystem (cache) in RAM using Python hash tables. Then it uses the cache to find the ownership of files inside user-specified dirs. Building the cache takes less than 10 seconds here on a system with 1366 installed packages.
>>
>> This is not intended to be a finished program yet; I'm looking forward to your constructive comments.
>
> There is a tool that does that: qfile from app-portage/portage-utils. Check the -o/--orphans ("List orphan files") option.
>
> It's not as straightforward as it could be, as it only checks files given as arguments or read from a file. But you can trivially use it like:
>
>     # find /dir/you/want/to/check/for/orphans | qfile -o -f -
>
> Best,
> Yuri.

Based on the comments so far, I'll try to make my PoC a better tool. My primary objective is to make this some kind of disk cleanup utility for Gentoo boxes. I don't expect Gentoo systems to be *that* polluted, but sometimes we all have to do ugly things to fix broken systems real fast, if you know what I mean.

Other things came to mind as well, like using the stored hashes to check the integrity of system files (as in security).

My next steps with this utility will be:
* Follow harring's suggestion and use the available PM API.
* Make the application handle symlinks, so we start getting more informative output.
* Store the generated cache on disk and only regenerate it when needed.

Regards,
- Angelo
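The last step, storing the cache on disk and regenerating it only when needed, might look roughly like the sketch below. It rests on an assumption that is not confirmed in the thread: that installing or removing a package changes the modification time of the package database root or of one of its category directories. The function names, the JSON cache format, and the `build` callback (anything that returns the owned paths for a given database root) are all hypothetical.

```python
import json
import os


def vdb_stamp(vdb_root):
    """Newest mtime (in ns) of the VDB root and its immediate subdirectories."""
    newest = os.stat(vdb_root).st_mtime_ns
    with os.scandir(vdb_root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                mtime = entry.stat(follow_symlinks=False).st_mtime_ns
                newest = max(newest, mtime)
    return newest


def load_or_build(cache_file, vdb_root, build):
    """Return the owned-path set, reusing cache_file while the stamp matches."""
    stamp = vdb_stamp(vdb_root)
    try:
        with open(cache_file, encoding="utf-8") as fh:
            data = json.load(fh)
        if isinstance(data, dict) and data.get("stamp") == stamp:
            return set(data["paths"])
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing or unreadable cache: fall through and rebuild
    owned = set(build(vdb_root))
    with open(cache_file, "w", encoding="utf-8") as fh:
        json.dump({"stamp": stamp, "paths": sorted(owned)}, fh)
    return owned
```

The stamp comparison is deliberately cheap (one `stat` per category directory), so a warm start avoids reading any CONTENTS file. If the mtime assumption does not hold on a given system, a stale cache would go unnoticed, so a manual "force rebuild" option would be a sensible companion.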
Re: [gentoo-dev] [RFC][NEW] Utility to find orphaned files
On Sun, Apr 25, 2010 at 1:18 PM, Angelo Arrifano mik...@gentoo.org wrote:

> Hello developers developers and developers,
>
> Ever wondered how much crap is left in your X-year-old Gentoo box? I just developed a Python utility to efficiently find orphaned files in the system. By orphaned files I mean files that are present in system directories and don't belong to any installed package.
>
> The package builds a virtual filesystem (cache) in RAM using Python hash tables. Then it uses the cache to find the ownership of files inside user-specified dirs. Building the cache takes less than 10 seconds here on a system with 1366 installed packages.
>
> This is not intended to be a finished program yet; I'm looking forward to your constructive comments.

I refactored findcruft (search the forums) two years ago (see http://git.xnull.de/cgit/findcruft2/). Maybe you can take a look at it, especially the false-positive handling.

HTH,
Bene