Header cache patch and searching through multiple mailboxes

2002-01-27 Thread Bruno Postle

[ Dire-Warning: this is a proof-of-concept script; it works for me on my
system.  Other than that I can't say, except that it requires a
development version of mutt and a patch that is labelled by its author
as broken... ]

I have about 50,000 messages in a couple of dozen NFS-mounted Maildirs
(all the old stuff is in mbox archives). Searching all this mail is a
big drag, since mutt can't 'limit' across multiple mailboxes and
grepping through all this junk can take hours.

Michael Elkins' header cache patch is an experiment in speeding up
access to Maildirs by scattering db files everywhere. I have the idea
that eventually these db files can speed up searching as well.

This script can search those 50,000 email headers and populate a
temporary results folder containing 3,000 messages in about 40 seconds.
It's an *extremely* blunt instrument, but I hope it will inspire
somebody else to write something a bit more elegant and precise.

It can be called with a macro, though you have to switch to the
'=search-results' folder to see the results:

macro index \cL "<shell-escape>search-maildir.pl "

It takes a single perl-regex argument, which it matches against
everything, sort-of like these:

[EMAIL PROTECTED]

'(Newbie Question|unsubscribe|@yahoo.com|@hotmail.com|BIG5)'

-- 
Bruno


#!/usr/bin/perl
use strict;
use Fcntl;
use DB_File;
use File::Find;

# This script requires the mutt header caching patch from
# http://www.sigpipe.org:8080/mutt/ NOTE: currently this patch is declared
# broken, so this will probably delete all your files, drink your beer and eat
# your cat.  Bruno Postle [EMAIL PROTECTED]

# Change these and make sure $resultdir exists as a Maildir, NOTE: this script
# will *delete* all existing files in $resultdir
my $mailfolder = "/home/bruno/Mail";
my $resultdir = "$mailfolder/search-results";

my $pattern = $ARGV[0];
my ( %hash, @folders );

# Empty the results folder before repopulating it
unlink glob "$resultdir/cur/*";

# First pass: collect every Maildir that has a header cache
find ( { wanted => \&maildirs }, $mailfolder );

sub maildirs
{
    return unless /hcache\.db/;
    return if ( $File::Find::name =~ /\Q$resultdir\E/ );
    my $maildir = $File::Find::name;
    $maildir =~ s/\/hcache\.db//;
    push @folders, $maildir;
}

# Second pass: open each cache and check every message file against it
foreach ( @folders )
{
    my $x = tie %hash, "DB_File", "$_/hcache.db"
        or die "Cannot open $_/hcache.db: $!\n";
    find ( { wanted => \&messages }, $_ );
    undef $x;
    untie %hash;
}

sub messages
{
    return unless /:/;
    # The cache key is the file name up to the ':' info separator
    my $uid = $File::Find::name;
    $uid =~ s/.*\/([^\/]*):.*/${1}/;
    if ( $hash{$uid} =~ /$pattern/i )
    {
        # Hard-link the match into the results folder, flagged as seen
        link ( $File::Find::name, "$resultdir/cur/$uid:2,S" );
    }
}

1;





Re: Searching in multiple mailboxes

2000-10-25 Thread Jack McKinney

Big Brother tells me that Mark Weinem wrote:
> On Mon, 23 Oct 2000, Benjamin Korvemaker wrote:
>
> > See "grepm" and "grepmail"
>
> But are there no tools for Maildirs?

cd Maildir;
find . -type f | xargs fgrep -l searchstring

--
"Restore your inalienable human rights.   Jack McKinney
 Vote Libertarian.  http://www.lp.org http://www.lorentz.com
 http://www.harrybrowne2000.org   [EMAIL PROTECTED]
  1024D/D68F2C07 4096g/38AEF076



Re: Searching in multiple mailboxes

2000-10-25 Thread Mark Weinem

On Wed, 25 Oct 2000, Suresh Ramasubramanian wrote:

> I believe grepmail does maildirs rather well.

man grepmail:

"[...] Mailboxes must be traditional, UNIX /bin/mail mailbox
format [...]"

Ciao,
Mark




Re: Searching in multiple mailboxes

2000-10-25 Thread Mark Weinem

On Wed, 25 Oct 2000, Jack McKinney wrote:

> cd Maildir;
> find . -type f | xargs fgrep -l searchstring

Wow, what a comfortable search tool ;-)

Ciao,
Mark




Re: Searching in multiple mailboxes

2000-10-25 Thread Rich Lafferty

On Wed, Oct 25, 2000 at 07:46:33PM +0200, Mark Weinem ([EMAIL PROTECTED]) wrote:
> On Wed, 25 Oct 2000, Suresh Ramasubramanian wrote:
>
> > I believe grepmail does maildirs rather well.
>
> man grepmail:
>
> "[...] Mailboxes must be traditional, UNIX /bin/mail mailbox
> format [...]"

Maybe I'm confused as to what maildir-format comprises, but wouldn't
one search maildirs with plain old 'grep'? 
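
Since a Maildir stores each message as its own file, a recursive grep does cover the simple case. A minimal sketch, assuming a grep that supports -r (GNU and BSD grep do); the function name maildir_grep is made up for illustration and is not part of mutt or grep:

```shell
# List the message files under a Maildir whose raw text contains a fixed string.
# maildir_grep is a hypothetical helper name used only for this sketch.
maildir_grep() {
    # $1 = fixed string to look for
    # $2 = a Maildir, or a directory holding several Maildirs
    # -r recurse, -l print only matching file names, -F fixed-string match
    grep -rlF -- "$1" "$2"
}
```

For example, `maildir_grep searchstring ~/Maildir` prints one path per matching message. grep only sees the raw bytes on disk, so text inside base64 or quoted-printable encoded parts will not match.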

  -Rich

-- 
-- Rich Lafferty ---
 Sysadmin/Programmer, Instructional and Information Technology Services
   Concordia University, Montreal, QC (514) 848-7625
- [EMAIL PROTECTED] --



Re: Searching in multiple mailboxes

2000-10-25 Thread Jack McKinney

Big Brother tells me that Mark Weinem wrote:
> On Wed, 25 Oct 2000, Jack McKinney wrote:
>
> > cd Maildir;
> > find . -type f | xargs fgrep -l searchstring
>
> Wow, what a comfortable search tool ;-)

For those who remember reading news this way, I thought you'd appreciate
it.  Sometimes simple solutions are best.  If one is using zsh, one could
try this:

mutt -f <(cat $(find . -type f | xargs fgrep -l searchstring))

This might not work due to the missing 'From ' line, but that can always
be added:

mutt -f <(for i in $(find . -type f | xargs fgrep -l searchstring) ; do ; grep '^From: ' $i | head -1 | sed s/From:/From/ ; cat $i ; echo ; done)

If one is using a lesser shell, something like this might work:

for i in `find . -type f | xargs fgrep -l searchstring` ; do mutt -f $i ; done

None of these are tested, BTW.
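
Another way to get all the matches into a single mutt index is to hard-link them into a scratch Maildir and open that folder. A rough sketch under stated assumptions: the function name maildir_collect and its argument order are invented here, the results directory must sit outside the tree being searched and on the same filesystem (hard links cannot cross filesystems), and find/xargs must support -print0/-0:

```shell
# Hard-link every message under $2 that contains the fixed string $1 into
# the Maildir $3, so the matches can be opened as one folder.
# maildir_collect is a hypothetical name used only for this sketch.
maildir_collect() {
    pattern=$1 src=$2 dest=$3
    mkdir -p "$dest/cur" "$dest/new" "$dest/tmp"
    # Only look at delivered messages (cur/ and new/), not tmp/ or index files.
    # -print0 / -0 keep file names with spaces intact.
    find "$src" -type f \( -path '*/cur/*' -o -path '*/new/*' \) -print0 |
        xargs -0 grep -lF -- "$pattern" /dev/null |
        while IFS= read -r msg; do
            # ln never overwrites, so a name clash is reported, not clobbered.
            ln "$msg" "$dest/cur/$(basename "$msg")" ||
                echo "could not link $msg" >&2
        done
}
```

Something like `maildir_collect searchstring ~/Mail ~/search-results` followed by `mutt -f ~/search-results` would then show the matches together. The extra /dev/null argument only guarantees that grep always receives at least one file name.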

I have been meaning to patch the mailindex package I posted about earlier
to process maildirs (it would be a lot easier to write than the way it is
currently written, which has to parse mailboxes).

--
"Restore your inalienable human rights.   Jack McKinney
 Vote Libertarian.  http://www.lp.org http://www.lorentz.com
 http://www.harrybrowne2000.org   [EMAIL PROTECTED]
  1024D/D68F2C07 4096g/38AEF076



Re: Searching in multiple mailboxes

2000-10-24 Thread Mark Weinem

On Mon, 23 Oct 2000, Benjamin Korvemaker wrote:

> See "grepm" and "grepmail"

But are there no tools for Maildirs?


Ciao,
Mark




Re: Searching in multiple mailboxes

2000-10-24 Thread Suresh Ramasubramanian

Mark Weinem proclaimed on mutt-users that: 

> On Mon, 23 Oct 2000, Benjamin Korvemaker wrote:
>
> > See "grepm" and "grepmail"
>
> But are there no tools for Maildirs?

I believe grepmail does maildirs rather well.

-- 
Suresh Ramasubramanian + Wallopus Malletus Indigenensis
mallet @ cluestick.org + Lumber Cartel of India, tinlcI
I never met a piece of chocolate I didn't like.



Searching in multiple mailboxes

2000-10-23 Thread Wouter Verheijen

There is something that would be nice to have in Mutt:
Searching in multiple (or all) mailboxes.
Imagine this scenario: You are looking for a specified text
in every message you have. It is only possible to search one
mailbox, so this might be handy.


-- 
Wouter Verheijen
[EMAIL PROTECTED]



Re: Searching in multiple mailboxes

2000-10-23 Thread Jack McKinney

Big Brother tells me that Wouter Verheijen wrote:
> There is something that would be nice to have in Mutt:
> Searching in multiple (or all) mailboxes.
> Imagine this scenario: You are looking for a specified text
> in every message you have. It is only possible to search one
> mailbox, so this might be handy.

   Not as easy as one would hope.  This could be VERY slow, depending on
how much mail you have (I currently have 192MB, AFTER compression).

   A better solution is to index your mail.  I wrote a perl/MySQL package
to handle this a while back.  It has a couple of bugs that still need to
be worked out when I get a chance:

 http://www.lorentz.com/mailindex.tar.gz

--
"Restore your inalienable human rights.   Jack McKinney
 Vote Libertarian.  http://www.lp.org http://www.lorentz.com
 http://www.harrybrowne2000.org   [EMAIL PROTECTED]
  1024D/D68F2C07 4096g/38AEF076



Re: Searching in multiple mailboxes

2000-10-23 Thread Benjamin Korvemaker

On Mon, Oct 23, 2000 at 08:58:06PM +0200, Wouter Verheijen wrote:
> There is something that would be nice to have in Mutt:
> Searching in multiple (or all) mailboxes.
> Imagine this scenario: You are looking for a specified text
> in every message you have. It is only possible to search one
> mailbox, so this might be handy.

See "grepm" and "grepmail"

http://privat.schlund.de/b/barsnick/sw/grepm.html
http://grepmail.sourceforge.net/
-- 
Benjamin Korvemaker        Donkeys kill more people
[EMAIL PROTECTED]          annually than plane crashes.
