Bug#500690: [recoll] Problems with missing document handling packages
2008/10/16 Jean-Francois Dockes [EMAIL PROTECTED]:
> Thanks a lot for running these tests and sending the results. It's quite reassuring that initial indexing works as expected.
>
> About later indexing passes, I had another look at how Recoll *really* works (as opposed to how I thought it worked :) ) and in fact, for file types with missing helper applications, indexing is always retried (so that it succeeds as soon as the helper is installed). Trying to execute the filters wastes quite a lot of time. This explains why the times go down after the helper is installed: the files get indexed the first time, then nothing further happens if they stay unchanged.
>
> Recoll 1.11 has been modified to work slightly differently: executing a missing filter is only tried once per indexing pass. The program then remembers the failure and doesn't retry. The files still get indexed at the first indexing pass following helper installation, and there is almost no performance penalty for missing helpers, best of both worlds (hopefully).
>
> Thanks again for prompting me to implement this well-needed change.
>
> Regards,
> J.F. Dockes

Thank you JF,

This is a great example of how flexible and reactive Open Source software can be - thank you for sharing this excellent program with us all.

Best wishes,
Peter

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#500690: [recoll] Problems with missing document handling packages
Peter Salisbury writes:
> 2008/9/30 Jean-Francois Dockes [EMAIL PROTECTED]:
> > If Peter can spare some time to do more testing, I'd be quite interested in the output of the following sequence:
> >
> > - Add loglevel = 4 to ~/.recoll/recoll.conf
> > - Uninstall the 3 helper packages, then:
> >     time recollindex -z 2> /tmp/rcllog-znopack.txt
> >     time recollindex 2> /tmp/rcllog-nopack.txt
> > - Reinstall the 3 packages, then:
> >     time recollindex -z 2> /tmp/rcllog-zpack.txt
> >     time recollindex 2> /tmp/rcllog-pack.txt
>
> Sorry it's taken a while, but here is the output you requested:
>
> [skipped test results]

Thanks a lot for running these tests and sending the results. It's quite reassuring that initial indexing works as expected.

About later indexing passes, I had another look at how Recoll *really* works (as opposed to how I thought it worked :) ) and in fact, for file types with missing helper applications, indexing is always retried (so that it succeeds as soon as the helper is installed). Trying to execute the filters wastes quite a lot of time. This explains why the times go down after the helper is installed: the files get indexed the first time, then nothing further happens if they stay unchanged.

Recoll 1.11 has been modified to work slightly differently: executing a missing filter is only tried once per indexing pass. The program then remembers the failure and doesn't retry. The files still get indexed at the first indexing pass following helper installation, and there is almost no performance penalty for missing helpers: the best of both worlds (hopefully).

Thanks again for prompting me to implement this much-needed change.

Regards,
J.F. Dockes
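The once-per-pass behaviour described for Recoll 1.11 could be sketched roughly as follows. This is only an illustration in Python, not Recoll's actual C++ code: the class name FilterRunner, its methods, and the use of shutil.which to probe for a helper (instead of attempting to execute the filter, as the message describes) are all assumptions made for the example.

```python
import shutil


class FilterRunner:
    """Hypothetical stand-in for the code that launches external filters."""

    def __init__(self, which=shutil.which):
        self._which = which      # injectable, so the lookup can be faked
        self._missing = set()    # helpers already found missing in this pass
        self.lookups = 0         # how many times the system was really probed

    def start_pass(self):
        # Forget earlier failures at the start of each indexing pass, so that
        # a helper installed in the meantime is picked up on the next pass.
        self._missing.clear()

    def can_run(self, helper):
        if helper in self._missing:
            return False         # remembered failure: no retry in this pass
        self.lookups += 1
        if self._which(helper) is None:
            self._missing.add(helper)
            return False
        return True


if __name__ == "__main__":
    runner = FilterRunner(which=lambda name: None)  # pretend nothing is installed
    runner.start_pass()
    for _ in range(1000):
        runner.can_run("pstotext")
    print(runner.lookups)  # one probe for a thousand files
```

Clearing the set at the start of every pass is what keeps the other half of the promise: the cost of a missing helper is paid once per pass, yet files are still picked up on the first pass after the helper is installed.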
Bug#500690: [recoll] Problems with missing document handling packages
2008/9/30 Jean-Francois Dockes [EMAIL PROTECTED]:
> [...]
> If Peter can spare some time to do more testing, I'd be quite interested in the output of the following sequence:
>
> - Add loglevel = 4 to ~/.recoll/recoll.conf
> - Uninstall the 3 helper packages, then:
>     time recollindex -z 2> /tmp/rcllog-znopack.txt
>     time recollindex 2> /tmp/rcllog-nopack.txt
> - Reinstall the 3 packages, then:
>     time recollindex -z 2> /tmp/rcllog-zpack.txt
>     time recollindex 2> /tmp/rcllog-pack.txt
>
> The log files should at least contain file names, but they might also contain data in some error cases. If no confidentiality issues prevent it, and in case the timings of the first phase are indeed longer, I'd be quite interested to have a look at them.
> [...]

Sorry it's taken a while, but here is the output you requested:

$ time recollindex -z 2>rcllog-znopack.txt
real    8m48.449s
user    3m49.958s
sys     2m57.675s

$ time recollindex 2>rcollog-nopack.txt
real    0m45.619s
user    0m23.909s
sys     0m13.069s

re-install:
zlib1g-dev (1:1.2.3.3.dfsg-12)
libid3-3.8.3-dev (3.8.3-7.2)
libimage-exiftool-perl (7.30-1)
pstotext (1.9-4)

$ time recollindex -z 2>rcllog-zpack.txt
real    16m23.720s
user    9m59.989s
sys     3m45.342s

$ time recollindex 2>rcllog-pack.txt
real    0m28.198s
user    0m16.405s
sys     0m4.676s

The initial indexing is quicker without the helpers, as you'd expect, but the re-indexing is slower (about 46 seconds without the helpers against 28 seconds with them).

I can't send you the logs, I'm afraid, as they would be around 100MB, but I had a look in the re-indexing log when the helpers were absent and there are lots of lines like this:

:4:../internfile/internfile.cpp:357:FileInterner::internfile. ipath []
:4:../utils/execmd.cpp:163:ExecCmd::doexec: ((nil)|0x9828eac) /usr/share/recoll/filters/rclimg {/home/peter/.gkrellm2-0/themes/minegue-beta/timer/bg_timer.png}
Can't locate Image/ExifTool.pm in @INC (@INC contains: /etc/perl /usr/local/lib/perl/5.10.0 /usr/local/share/perl/5.10.0 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl .) at /usr/share/recoll/filters/rclimg line 61.
BEGIN failed--compilation aborted at /usr/share/recoll/filters/rclimg line 61.
:2:../internfile/mh_exec.cpp:71:MimeHandlerExec: command status 0x200: /usr/share/recoll/filters/rclimg
:2:../internfile/internfile.cpp:412:FileInterner::internfile: next_document error [/home/peter/.gkrellm2-0/themes/minegue-beta/timer/bg_timer.png]
:2:../internfile/internfile.cpp:494:FileInterner::internfile: conversion ended with no doc
:4:../rcldb/rcldb.cpp:1027:Db::add: docid 17360 updated [/home/peter/.gkrellm2-0/themes/minegue-beta/timer/bg_timer.png , ]
:4:../internfile/internfile.cpp:109:FileInterner::FileInterner: [/home/peter/.gkrellm2-0/themes/minegue-beta/bg_grid.png] mime [(null)] preview 0
:4:../internfile/internfile.cpp:170:FileInterner::FileInterner: image/png [/home/peter/.gkrellm2-0/themes/minegue-beta/bg_grid.png]
:4:../internfile/internfile.cpp:357:FileInterner::internfile. ipath []
:4:../utils/execmd.cpp:163:ExecCmd::doexec: ((nil)|0x97116b4) /usr/share/recoll/filters/rclimg {/home/peter/.gkrellm2-0/themes/minegue-beta/bg_grid.png}
Can't locate Image/ExifTool.pm in @INC (@INC contains: /etc/perl /usr/local/lib/perl/5.10.0 /usr/local/share/perl/5.10.0 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl .) at /usr/share/recoll/filters/rclimg line 61.
BEGIN failed--compilation aborted at /usr/share/recoll/filters/rclimg line 61.
:2:../internfile/mh_exec.cpp:71:MimeHandlerExec: command status 0x200: /usr/share/recoll/filters/rclimg
:2:../internfile/internfile.cpp:412:FileInterner::internfile: next_document error [/home/peter/.gkrellm2-0/themes/minegue-beta/bg_grid.png]
:2:../internfile/internfile.cpp:494:FileInterner::internfile: conversion ended with no doc

HTH,
Peter
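Since a 100MB log is too large to read or send, one way to summarise it is to count how often each filter fails. The following is a minimal sketch, assuming the "MimeHandlerExec: command status ..." lines look like the ones quoted in this message; the function name count_filter_failures and the regular expression are made up for the example and are not part of Recoll.

```python
import re
from collections import Counter

# Matches log entries such as:
#   :2:../internfile/mh_exec.cpp:71:MimeHandlerExec: command status 0x200: /usr/share/recoll/filters/rclimg
FAILED_FILTER = re.compile(
    r"MimeHandlerExec: command status (0x[0-9a-fA-F]+): (\S+)"
)


def count_filter_failures(lines):
    """Return a Counter mapping each failing filter path to its failure count."""
    counts = Counter()
    for line in lines:
        # finditer copes with several log entries run together on one line.
        for match in FAILED_FILTER.finditer(line):
            counts[match.group(2)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_failures.py rcollog-nopack.txt
    with open(sys.argv[1], errors="replace") as log:
        for path, n in count_filter_failures(log).most_common():
            print(n, path)
```

Reading the file line by line keeps memory use flat, so the size of the log should not matter.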
Bug#500690: [recoll] Problems with missing document handling packages
Package: recoll
Version: 1.10.6-1

I installed recoll on a fairly sparse system and it took ages to index every time. It was only when I ran it from a terminal that I realised it was missing some required packages for indexing certain types of file.

Ideally a better message would be given via the UI, and/or it would skip the types of file it can't index rather than take the time to fail at runtime. But perhaps at least these extra packages could be depend/recommend/suggested by recoll. The ones I had to install were:

libimage-exiftool-perl
libid3-3.8.3-dev
pstotext

Really excellent program which found my file in the 'safe place' where I'd lost it!

Thanks,
Peter

--- System information. ---
Architecture: i386
Kernel: Linux 2.6.26-1-686

Debian Release: lenny/sid
  500 unstable        ftp.uk.debian.org

--- Package information. ---
Depends        (Version)     | Installed
============================-+-==================
libc6          (>= 2.7-1)    | 2.7-13
libgcc1        (>= 1:4.1.1)  | 1:4.3.2-1
libice6        (>= 1:1.0.0)  | 2:1.0.4-1
libqt3-mt      (>= 3:3.3.8b) | 3:3.3.8b-5
libsm6                       | 2:1.0.3-2
libstdc++6     (>= 4.2.1)    | 4.3.2-1
libx11-6                     | 2:1.1.4-2
libxapian15                  | 1.0.7-3
libxext6                     | 2:1.0.4-1
zlib1g         (>= 1:1.1.4)  | 1:1.2.3.3.dfsg-12
Bug#500690: [recoll] Problems with missing document handling packages
severity 500690 wishlist
thanks

On Tue, Sep 30, 2008 at 6:45 PM, Peter Salisbury [EMAIL PROTECTED] wrote:
> Package: recoll
> Version: 1.10.6-1

Thanks for reporting the bug, Peter.

> I installed recoll on a fairly sparse system and it took ages to index every time. It was only when I ran it from a terminal that I realised it was missing some required packages for indexing certain types of file.
>
> Ideally a better message would be given via the UI, and/or it would skip the types of file it can't index rather than take the time to fail at runtime. But perhaps at least these extra packages could be depend/recommend/suggested by recoll. The ones I had to install were:
>
> libimage-exiftool-perl
> libid3-3.8.3-dev
> pstotext

I will check and add them to Suggests, as we did with other programs.

> Really excellent program which found my file in the 'safe place' where I'd lost it!

Thanks. You should thank Jean-Francois Dockes (CC'ed) for writing this excellent program :)

--
Cheers,
Kartik Mistry | 0xD1028C8D | IRC: kart_
Homepage: people.debian.org/~kartik
Blog.en: ftbfs.wordpress.com
Blog.gu: kartikm.wordpress.com
Bug#500690: [recoll] Problems with missing document handling packages
Hello,

Kartik Mistry writes:
> On Tue, Sep 30, 2008 at 6:45 PM, Peter Salisbury [EMAIL PROTECTED] wrote:
> > I installed recoll on a fairly sparse system and it took ages to index every time. It was only when I ran it from a terminal that I realised it was missing some required packages for indexing certain types of file.
> >
> > Ideally a better message would be given via the UI, and/or it would skip the types of file it can't index rather than take the time to fail at runtime. But perhaps at least these extra packages could be depend/recommend/suggested by recoll. The ones I had to install were:
> >
> > libimage-exiftool-perl
> > libid3-3.8.3-dev
> > pstotext

I can think of no reason why Recoll indexing should be slower when the helper programs are not installed (so I'm quite probably missing something).

I do agree that missing helpers should somehow be listed in the UI when indexing finishes. This has been on the todo list for ages; the difficulty is for the implementation not to get annoying if the user doesn't want to install them. They are listed at the end of the error/debug log, but nobody looks at this, of course.

Normally, file types which can't be indexed by content (no helper package) are indexed by file name the first time, and then skipped if they don't change. After installing the helper, you need a full reindex (recollindex -z) to get them indexed.

If Peter can spare some time to do more testing, I'd be quite interested in the output of the following sequence:

- Add loglevel = 4 to ~/.recoll/recoll.conf
- Uninstall the 3 helper packages, then:
    time recollindex -z 2> /tmp/rcllog-znopack.txt
    time recollindex 2> /tmp/rcllog-nopack.txt
- Reinstall the 3 packages, then:
    time recollindex -z 2> /tmp/rcllog-zpack.txt
    time recollindex 2> /tmp/rcllog-pack.txt

The log files should at least contain file names, but they might also contain data in some error cases. If no confidentiality issues prevent it, and in case the timings of the first phase are indeed longer, I'd be quite interested to have a look at them.

> > Really excellent program which found my file in the 'safe place' where I'd lost it!

Great, I'm glad that this thing can be of some use from time to time!

Cheers,
J.F. Dockes
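For anyone wanting to repeat the timing sequence requested in this message without retyping the commands, it could be scripted along these lines. This is a sketch under assumptions: the helper names timed_run and run_sequence are invented here, only wall-clock time is measured (the shell's time also reports user and sys), and installing or removing the helper packages is left to be done by hand between runs.

```python
import subprocess
import time


def timed_run(cmd, log_path):
    """Run cmd with stderr sent to log_path; return (exit_code, seconds)."""
    start = time.perf_counter()
    with open(log_path, "wb") as log:
        proc = subprocess.run(cmd, stderr=log)
    return proc.returncode, time.perf_counter() - start


def run_sequence(label, log_dir="/tmp"):
    """Full reindex, then an incremental pass, as in the requested sequence.

    label is "nopack" or "pack", matching the log file names in the message.
    """
    return {
        "full": timed_run(
            ["recollindex", "-z"], f"{log_dir}/rcllog-z{label}.txt"
        ),
        "incremental": timed_run(
            ["recollindex"], f"{log_dir}/rcllog-{label}.txt"
        ),
    }


if __name__ == "__main__":
    # Run once with the helpers removed, then again after reinstalling them.
    for name, (code, seconds) in run_sequence("nopack").items():
        print(f"{name}: exit {code}, {seconds:.1f}s")
```

As in the original commands, loglevel = 4 still has to be set in ~/.recoll/recoll.conf beforehand for the logs to be useful.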