Bug#500690: [recoll] Problems with missing document handling packages

2008-10-17 Thread Peter Salisbury
2008/10/16 Jean-Francois Dockes [EMAIL PROTECTED]

 Thanks a lot for running these tests and sending the results.

 It's quite reassuring that initial indexing works as expected.

 About later indexing passes, I had another look at how Recoll *really*
 works (as opposed to how I thought it worked :) ) and in fact, for file
 types with missing helper applications, indexing is always retried (so that
 it succeeds as soon as the helper is installed). Trying to execute the
 filters wastes quite a lot of time.

 This explains why the times go down after the helper is installed: the
 files get indexed the first time, then nothing further happens if they stay
 unchanged.

 Recoll 1.11 has been modified to work slightly differently: executing a
 missing filter is only tried once per indexing pass. The program then
 remembers the failure and doesn't retry.

 The files still get indexed at the first indexing pass following helper
 installation, and there is almost no performance penalty for missing
 helpers, best of both worlds (hopefully).

 Thanks again for prompting me to implement this well-needed change.

 Regards,
 J.F. Dockes

Thank you JF,

This is a great example of how flexible and reactive Open Source
software can be - thank you for sharing this excellent program with us
all.

Best wishes, Peter



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#500690: [recoll] Problems with missing document handling packages

2008-10-16 Thread Jean-Francois Dockes
Peter Salisbury writes:
  2008/9/30 Jean-Francois Dockes [EMAIL PROTECTED]:
   If Peter can spare some time to do more testing, I'd be quite
   interested in the output of the following sequence:
   -  Add loglevel = 4 to ~/.recoll/recoll.conf
   -  Uninstall the 3 helper packages, then:
   time recollindex -z 2> /tmp/rcllog-znopack.txt
   time recollindex 2> /tmp/rcllog-nopack.txt
   - Reinstall the 3 packages then:
   time recollindex -z 2> /tmp/rcllog-zpack.txt
   time recollindex 2> /tmp/rcllog-pack.txt
  
  
  Sorry it's taken a while, but here is the output you requested:
  [skipped test results]

Thanks a lot for running these tests and sending the results.  

It's quite reassuring that initial indexing works as expected.

About later indexing passes, I had another look at how Recoll *really*
works (as opposed to how I thought it worked :) ) and in fact, for file
types with missing helper applications, indexing is always retried (so that
it succeeds as soon as the helper is installed). Trying to execute the
filters wastes quite a lot of time.

This explains why the times go down after the helper is installed: the
files get indexed the first time, then nothing further happens if they stay
unchanged.

Recoll 1.11 has been modified to work slightly differently: executing a
missing filter is only tried once per indexing pass. The program then
remembers the failure and doesn't retry.

The files still get indexed at the first indexing pass following helper
installation, and there is almost no performance penalty for missing
helpers, best of both worlds (hopefully).
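
To make the "try once per pass, then remember the failure" idea concrete, here is a minimal sketch in Python. It is not Recoll's code (Recoll is written in C++); the class name `FilterCache`, its methods, and the `probe` callback are all made up for illustration. The only behaviour it assumes is the one described above: a failing helper is attempted once per indexing pass, and the memory of failures is dropped when a new pass starts.

```python
class FilterCache:
    """Hypothetical per-pass memory of filters that failed to run."""

    def __init__(self, probe):
        # probe(filter_name) -> bool is an assumed callback that actually
        # tries to execute the filter and reports whether it worked.
        self._probe = probe
        self._failed = set()
        self.attempts = 0

    def start_pass(self):
        # Forget old failures so a helper installed since the last pass
        # is picked up by the next one.
        self._failed.clear()

    def usable(self, filter_name):
        if filter_name in self._failed:
            return False  # already failed during this pass: do not retry
        self.attempts += 1
        if self._probe(filter_name):
            return True
        self._failed.add(filter_name)
        return False
```

With this shape, a directory holding thousands of PNG files costs one failed execution of the image filter per pass rather than one per file.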

Thanks again for prompting me to implement this well-needed change.

Regards,
J.F. Dockes

  $ time recollindex -z 2>rcllog-znopack.txt
  
  real    8m48.449s
  user    3m49.958s
  sys     2m57.675s
  
  $ time recollindex 2>rcollog-nopack.txt
  
  real    0m45.619s
  user    0m23.909s
  sys     0m13.069s
  
  re-install:
  zlib1g-dev (1:1.2.3.3.dfsg-12)
  libid3-3.8.3-dev (3.8.3-7.2)
  libimage-exiftool-perl (7.30-1)
  pstotext (1.9-4)
  
  $ time recollindex -z 2>rcllog-zpack.txt
  
  real    16m23.720s
  user    9m59.989s
  sys     3m45.342s
  
  $ time recollindex 2>rcllog-pack.txt
  
  real    0m28.198s
  user    0m16.405s
  sys     0m4.676s
  
  The initial indexing is quicker without the helpers as you'd expect,
  but the re-indexing is slower.
  
  I can't send you the logs I'm afraid as they would be around 100MB but
  I had a look in the re-indexing log when the helpers were absent and
  there are lots of lines like this:
  
  :4:../internfile/internfile.cpp:357:FileInterner::internfile. ipath []
  :4:../utils/execmd.cpp:163:ExecCmd::doexec: ((nil)|0x9828eac)
  /usr/share/recoll/filters/rclimg
  {/home/peter/.gkrellm2-0/themes/minegue-beta/timer/bg_timer.png}
  Can't locate Image/ExifTool.pm in @INC (@INC contains: /etc/perl
  /usr/local/lib/perl/5.10.0 /usr/local/share/perl/5.10.0 /usr/lib/perl5
  /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10
  /usr/local/lib/site_perl .) at /usr/share/recoll/filters/rclimg line
  61.
  BEGIN failed--compilation aborted at /usr/share/recoll/filters/rclimg line 
  61.
  :2:../internfile/mh_exec.cpp:71:MimeHandlerExec: command status 0x200:
  /usr/share/recoll/filters/rclimg
  :2:../internfile/internfile.cpp:412:FileInterner::internfile:
  next_document error
  [/home/peter/.gkrellm2-0/themes/minegue-beta/timer/bg_timer.png]
  :2:../internfile/internfile.cpp:494:FileInterner::internfile:
  conversion ended with no doc
  :4:../rcldb/rcldb.cpp:1027:Db::add: docid 17360 updated
  [/home/peter/.gkrellm2-0/themes/minegue-beta/timer/bg_timer.png , ]
  :4:../internfile/internfile.cpp:109:FileInterner::FileInterner:
  [/home/peter/.gkrellm2-0/themes/minegue-beta/bg_grid.png] mime
  [(null)] preview 0
  :4:../internfile/internfile.cpp:170:FileInterner::FileInterner:
  image/png [/home/peter/.gkrellm2-0/themes/minegue-beta/bg_grid.png]
  :4:../internfile/internfile.cpp:357:FileInterner::internfile. ipath []
  :4:../utils/execmd.cpp:163:ExecCmd::doexec: ((nil)|0x97116b4)
  /usr/share/recoll/filters/rclimg
  {/home/peter/.gkrellm2-0/themes/minegue-beta/bg_grid.png}
  Can't locate Image/ExifTool.pm in @INC (@INC contains: /etc/perl
  /usr/local/lib/perl/5.10.0 /usr/local/share/perl/5.10.0 /usr/lib/perl5
  /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10
  /usr/local/lib/site_perl .) at /usr/share/recoll/filters/rclimg line
  61.
  BEGIN failed--compilation aborted at /usr/share/recoll/filters/rclimg line 
  61.
  :2:../internfile/mh_exec.cpp:71:MimeHandlerExec: command status 0x200:
  /usr/share/recoll/filters/rclimg
  :2:../internfile/internfile.cpp:412:FileInterner::internfile:
  next_document error
  [/home/peter/.gkrellm2-0/themes/minegue-beta/bg_grid.png]
  :2:../internfile/internfile.cpp:494:FileInterner::internfile:
  conversion ended with no doc
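
Since the full logs are too large to mail, a summary may be enough to show which helpers are failing. The following Python sketch counts the missing Perl modules and the failing filter commands in a log shaped like the excerpt above. The function name `summarise` and both regular expressions are assumptions based only on the lines quoted here, not on any documented Recoll log format; the second pattern allows the filter path to sit either on the same line as the status message or on the next one.

```python
import re
from collections import Counter

# Patterns inferred from the excerpt above; treat them as assumptions.
MISSING_MODULE = re.compile(r"Can't locate (\S+) in @INC")
FAILED_FILTER = re.compile(r"MimeHandlerExec: command status \S+:\s+(\S+)")


def summarise(log_text):
    """Return (missing Perl modules, failing filter commands) as Counters."""
    modules = Counter(MISSING_MODULE.findall(log_text))
    filters = Counter(FAILED_FILTER.findall(log_text))
    return modules, filters
```

Calling `summarise(open(path).read())` on one of the log files and printing `most_common()` for each counter would give a short list of offenders. For a log of around 100MB this reads the whole file into memory, which is usually acceptable but is a deliberate simplification.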
  
  HTH, Peter






Bug#500690: [recoll] Problems with missing document handling packages

2008-10-15 Thread Peter Salisbury
2008/9/30 Jean-Francois Dockes [EMAIL PROTECTED]:
 Hello,

 Kartik Mistry writes:
   On Tue, Sep 30, 2008 at 6:45 PM, Peter Salisbury
   [EMAIL PROTECTED] wrote:
I installed recoll on a fairly sparse system and it took ages to index
every time. It was only when I ran it from a terminal that I realised
it was missing some required packages for indexing certain types of
file. Ideally a better message would be given via the UI, and/or it
would skip the types of file it can't index rather than take the time
to fail at runtime. But perhaps at least these extra packages could be
depend/recommend/suggested by recoll. The ones I had to install were:
   
libimage-exiftool-perl
libid3-3.8.3-dev
pstotext

 I can think of no reason why Recoll indexing should be slower when the
 helper programs are not installed (so I'm quite probably missing
 something).

 I do agree that missing helpers should somehow be listed in the UI when
 indexing finishes, this has been on the todo list for ages, the difficulty
 is for the implementation not to get annoying if the user doesn't want to
 install them. They are listed at the end of the error/debug log, but nobody
 looks at this of course.

 Normally, file types which can't be indexed by content (no helper package)
 are indexed by file name the first time, and then skipped if they don't
 change. After installing the helper, you need a full reindex (recollindex
 -z) to get them indexed.

 If Peter can spare some time to do more testing, I'd be quite interested in
 the output of the following sequence:

 -  Add loglevel = 4 to ~/.recoll/recoll.conf
 -  Uninstall the 3 helper packages, then:

 time recollindex -z 2> /tmp/rcllog-znopack.txt
 time recollindex 2> /tmp/rcllog-nopack.txt

 - Reinstall the 3 packages then:

 time recollindex -z 2> /tmp/rcllog-zpack.txt
 time recollindex 2> /tmp/rcllog-pack.txt

 The log files should at least contain file names, but they might also
 contain data in some error cases. If no confidentiality issues prevent it,
 and in case the timings of the first phase are indeed longer, I'd be quite
 interested to have a look at them.

Really excellent program which found my file in the 'safe place' where
I'd lost it!

 Great, I'm glad that this thing can be of some use from time to time !

 Cheers,
 J.F. Dockes



Sorry it's taken a while, but here is the output you requested:

$ time recollindex -z 2>rcllog-znopack.txt

real    8m48.449s
user    3m49.958s
sys     2m57.675s

$ time recollindex 2>rcollog-nopack.txt

real    0m45.619s
user    0m23.909s
sys     0m13.069s

re-install:
zlib1g-dev (1:1.2.3.3.dfsg-12)
libid3-3.8.3-dev (3.8.3-7.2)
libimage-exiftool-perl (7.30-1)
pstotext (1.9-4)

$ time recollindex -z 2>rcllog-zpack.txt

real    16m23.720s
user    9m59.989s
sys     3m45.342s

$ time recollindex 2>rcllog-pack.txt

real    0m28.198s
user    0m16.405s
sys     0m4.676s

The initial indexing is quicker without the helpers as you'd expect,
but the re-indexing is slower.
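
To put numbers on that, here is a small Python sketch that converts the four wall-clock ("real") figures above to seconds and compares the two incremental runs. The helper name `to_seconds` and the dictionary labels are mine, chosen for illustration; the figures themselves are copied from the timings above.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '8m48.449s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Wall-clock times reported above, keyed by (run type, helpers installed).
real = {
    ("full", False): "8m48.449s",
    ("incremental", False): "0m45.619s",
    ("full", True): "16m23.720s",
    ("incremental", True): "0m28.198s",
}
seconds = {key: to_seconds(value) for key, value in real.items()}

# Cost of the missing helpers on an incremental pass.
extra = seconds[("incremental", False)] - seconds[("incremental", True)]
ratio = seconds[("incremental", False)] / seconds[("incremental", True)]
print(f"incremental pass without helpers: +{extra:.1f}s, {ratio:.2f}x slower")
```

On these figures the incremental pass without the helpers takes about 17 seconds longer, roughly 1.6 times the duration of the same pass with the helpers installed.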

I can't send you the logs I'm afraid as they would be around 100MB but
I had a look in the re-indexing log when the helpers were absent and
there are lots of lines like this:

:4:../internfile/internfile.cpp:357:FileInterner::internfile. ipath []
:4:../utils/execmd.cpp:163:ExecCmd::doexec: ((nil)|0x9828eac)
/usr/share/recoll/filters/rclimg
{/home/peter/.gkrellm2-0/themes/minegue-beta/timer/bg_timer.png}
Can't locate Image/ExifTool.pm in @INC (@INC contains: /etc/perl
/usr/local/lib/perl/5.10.0 /usr/local/share/perl/5.10.0 /usr/lib/perl5
/usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10
/usr/local/lib/site_perl .) at /usr/share/recoll/filters/rclimg line
61.
BEGIN failed--compilation aborted at /usr/share/recoll/filters/rclimg line 61.
:2:../internfile/mh_exec.cpp:71:MimeHandlerExec: command status 0x200:
/usr/share/recoll/filters/rclimg
:2:../internfile/internfile.cpp:412:FileInterner::internfile:
next_document error
[/home/peter/.gkrellm2-0/themes/minegue-beta/timer/bg_timer.png]
:2:../internfile/internfile.cpp:494:FileInterner::internfile:
conversion ended with no doc
:4:../rcldb/rcldb.cpp:1027:Db::add: docid 17360 updated
[/home/peter/.gkrellm2-0/themes/minegue-beta/timer/bg_timer.png , ]
:4:../internfile/internfile.cpp:109:FileInterner::FileInterner:
[/home/peter/.gkrellm2-0/themes/minegue-beta/bg_grid.png] mime
[(null)] preview 0
:4:../internfile/internfile.cpp:170:FileInterner::FileInterner:
image/png [/home/peter/.gkrellm2-0/themes/minegue-beta/bg_grid.png]
:4:../internfile/internfile.cpp:357:FileInterner::internfile. ipath []
:4:../utils/execmd.cpp:163:ExecCmd::doexec: ((nil)|0x97116b4)
/usr/share/recoll/filters/rclimg
{/home/peter/.gkrellm2-0/themes/minegue-beta/bg_grid.png}
Can't locate Image/ExifTool.pm in @INC (@INC contains: /etc/perl
/usr/local/lib/perl/5.10.0 /usr/local/share/perl/5.10.0 /usr/lib/perl5
/usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10
/usr/local/lib/site_perl .) at /usr/share/recoll/filters/rclimg 

Bug#500690: [recoll] Problems with missing document handling packages

2008-09-30 Thread Peter Salisbury
Package: recoll
Version: 1.10.6-1

I installed recoll on a fairly sparse system and it took ages to index
every time. It was only when I ran it from a terminal that I realised
it was missing some required packages for indexing certain types of
file. Ideally a better message would be given via the UI, and/or it
would skip the types of file it can't index rather than take the time
to fail at runtime. But perhaps at least these extra packages could be
depend/recommend/suggested by recoll. The ones I had to install were:

libimage-exiftool-perl
libid3-3.8.3-dev
pstotext

Really excellent program which found my file in the 'safe place' where
I'd lost it!

Thanks, Peter

--- System information. ---
Architecture: i386
Kernel:   Linux 2.6.26-1-686

Debian Release: lenny/sid
  500 unstable        ftp.uk.debian.org

--- Package information. ---
Depends               (Version) | Installed
================================-+-=================
libc6                (>= 2.7-1) | 2.7-13
libgcc1            (>= 1:4.1.1) | 1:4.3.2-1
libice6            (>= 1:1.0.0) | 2:1.0.4-1
libqt3-mt         (>= 3:3.3.8b) | 3:3.3.8b-5
libsm6                          | 2:1.0.3-2
libstdc++6           (>= 4.2.1) | 4.3.2-1
libx11-6                        | 2:1.1.4-2
libxapian15                     | 1.0.7-3
libxext6                        | 2:1.0.4-1
zlib1g             (>= 1:1.1.4) | 1:1.2.3.3.dfsg-12






Bug#500690: [recoll] Problems with missing document handling packages

2008-09-30 Thread Kartik Mistry
severity 500690 wishlist
thanks

On Tue, Sep 30, 2008 at 6:45 PM, Peter Salisbury
[EMAIL PROTECTED] wrote:
 Package: recoll
 Version: 1.10.6-1

Thanks for reporting the bug, Peter.

 I installed recoll on a fairly sparse system and it took ages to index
 every time. It was only when I ran it from a terminal that I realised
 it was missing some required packages for indexing certain types of
 file. Ideally a better message would be given via the UI, and/or it
 would skip the types of file it can't index rather than take the time
 to fail at runtime. But perhaps at least these extra packages could be
 depend/recommend/suggested by recoll. The ones I had to install were:

 libimage-exiftool-perl
 libid3-3.8.3-dev
 pstotext

I will check and add them to Suggests, as we did with other programs.

 Really excellent program which found my file in the 'safe place' where
 I'd lost it!

Thanks. You should thank Jean-Francois Dockes (CC'ed) for writing this
excellent program :)

-- 
 Cheers,
 Kartik Mistry | 0xD1028C8D | IRC: kart_
 Homepage: people.debian.org/~kartik
 Blog.en: ftbfs.wordpress.com
 Blog.gu: kartikm.wordpress.com






Bug#500690: [recoll] Problems with missing document handling packages

2008-09-30 Thread Jean-Francois Dockes
Hello,

Kartik Mistry writes:
  On Tue, Sep 30, 2008 at 6:45 PM, Peter Salisbury
  [EMAIL PROTECTED] wrote:
   I installed recoll on a fairly sparse system and it took ages to index
   every time. It was only when I ran it from a terminal that I realised
   it was missing some required packages for indexing certain types of
   file. Ideally a better message would be given via the UI, and/or it
   would skip the types of file it can't index rather than take the time
   to fail at runtime. But perhaps at least these extra packages could be
   depend/recommend/suggested by recoll. The ones I had to install were:
  
   libimage-exiftool-perl
   libid3-3.8.3-dev
   pstotext

I can think of no reason why Recoll indexing should be slower when the
helper programs are not installed (so I'm quite probably missing
something).

I do agree that missing helpers should somehow be listed in the UI when
indexing finishes, this has been on the todo list for ages, the difficulty
is for the implementation not to get annoying if the user doesn't want to
install them. They are listed at the end of the error/debug log, but nobody
looks at this of course.

Normally, file types which can't be indexed by content (no helper package)
are indexed by file name the first time, and then skipped if they don't
change. After installing the helper, you need a full reindex (recollindex
-z) to get them indexed.

If Peter can spare some time to do more testing, I'd be quite interested in
the output of the following sequence:

-  Add loglevel = 4 to ~/.recoll/recoll.conf
-  Uninstall the 3 helper packages, then:

time recollindex -z 2> /tmp/rcllog-znopack.txt
time recollindex 2> /tmp/rcllog-nopack.txt

- Reinstall the 3 packages then:

time recollindex -z 2> /tmp/rcllog-zpack.txt
time recollindex 2> /tmp/rcllog-pack.txt

The log files should at least contain file names, but they might also
contain data in some error cases. If no confidentiality issues prevent it,
and in case the timings of the first phase are indeed longer, I'd be quite
interested to have a look at them.
 
   Really excellent program which found my file in the 'safe place' where
   I'd lost it!

Great, I'm glad that this thing can be of some use from time to time !

Cheers,
J.F. Dockes


