Hello Roman,
Sunday, July 31, 2005, 4:49:57 PM, you wrote:
RK> wouldn't it be nice if TB had powerful search features like Gmail or
RK> Apple's Spotlight?
It would, yes.
RK> I know that TB's has a search feature, but it's slow since emails
RK> aren't pre-indexed for keywords.
Personally, I find the speed of The Bat!'s search to be more than
satisfactory - it's very fast, considering it's just a "brute force"
search which opens every single mail and scans for the text string.
Pre-indexing is difficult, though. The very first problem that springs
to mind is IMAP - with IMAP accounts, you might not actually have a
local copy of the text of the email to index.
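To make the difference concrete, here is a minimal sketch of the two approaches: a "brute force" scan that opens every message, and a lookup against a pre-built inverted index. The mail store, the function names and the sample messages are all made up for illustration, and none of this is meant to describe how The Bat! works internally.

```python
import re
from collections import defaultdict

# Hypothetical in-memory mail store: message id -> body text.
MAILS = {
    1: "Meeting moved to Monday",
    2: "Search in The Bat! is fast",
    3: "Monday search results attached",
}

def brute_force_search(mails, needle):
    """Scan every message body for the substring, as a non-indexed search does."""
    needle = needle.lower()
    return sorted(mid for mid, body in mails.items() if needle in body.lower())

def build_index(mails):
    """Build an inverted index once: word -> set of message ids containing it."""
    index = defaultdict(set)
    for mid, body in mails.items():
        for word in re.findall(r"\w+", body.lower()):
            index[word].add(mid)
    return index

def indexed_search(index, word):
    """Look the word up directly instead of opening any message."""
    return sorted(index.get(word.lower(), set()))

INDEX = build_index(MAILS)
print(brute_force_search(MAILS, "monday"))  # [1, 3]
print(indexed_search(INDEX, "monday"))      # [1, 3]
```

Note that build_index() needs the body text of every message up front, which is exactly what an IMAP account may not have locally.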
Then there's the question of what sorts of features your indexing and
searching engine is to support. This can get, um, complicated.
As a demonstration, allow me to introduce the much misunderstood
product Lotus Notes. (Yes, yes. Quiet at the back there. Lotus Notes
is not a competitor for The Bat! - The Bat! is better for personal
email, Notes is better for large organisations. Trust me on this.)
Notes has had full-text (index-based) searching for years. Options I
can choose when creating an index include:
* Indexing encrypted fields
* Indexing attachments
* Using found text
* Using file filters
* Case sensitivity
* Indexing break information
Encrypted fields are the first problem for The Bat! - I'd expect it to
offer the option (and a warning that it's ludicrously insecure to do
so), but that means that the indexing engine must work only when you
read the message - or it has to cache the credentials for PGP/GPG or
whatever you encrypt with. That could get messy and draw flak from
some paranoid quarters...
Indexing attachments is harder than it sounds. The two sub-options
there are simple - indexing with found text basically discards any
non-text characters, ignores whitespace and then indexes the found
words. This is fast, but if a word in a Word document has a
soft-hyphen code in the middle of it because it's split across two
lines, you'll not find that instance of the word in that document.
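A rough sketch of why found-text indexing misses that word. It assumes the soft hyphen is stored as the Unicode character U+00AD, which is only a stand-in for whatever code the real file format uses, and it treats any run of ASCII letters as a word. The function name and sample data are hypothetical.

```python
import re

SOFT_HYPHEN = "\u00ad"  # assumption: stand-in for the format's real soft-hyphen code

def found_text_words(raw):
    """Treat every run of ASCII letters as a word; anything else is a separator."""
    return [w.lower() for w in re.findall(r"[A-Za-z]+", raw)]

# Sample "document": some binary junk plus a word split by a soft hyphen.
raw = "The index\x00\x01 covers every docu" + SOFT_HYPHEN + "ment"
words = found_text_words(raw)

print(words)                # ['the', 'index', 'covers', 'every', 'docu', 'ment']
print("document" in words)  # False
```

The word ends up in the index as two fragments, so a search for "document" never matches it.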
To solve that problem, you can index with file filters, which will
understand the format of the document and index just the actual text.
They know to skip soft hyphens, bold/underline markings, and so forth.
Unfortunately, file filters are complicated to write, and just when
you perfect them the vendor of the file format has a tendency to
release a new version of their program which uses a new version of the
file format. :-(
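As an illustration only, here is what a file filter might look like for a made-up toy format in which bold and underline are marked with <b> and <u> tags and a soft hyphen is the character U+00AD. Real filters for real formats are far more involved; every name below is hypothetical.

```python
import re

SOFT_HYPHEN = "\u00ad"  # assumption: the toy format stores soft hyphens as U+00AD

def toy_filter(raw):
    """Hypothetical filter for a made-up format: drop soft hyphens and
    <b>/<u> markers so that only the visible text remains."""
    text = raw.replace(SOFT_HYPHEN, "")
    return re.sub(r"</?[bu]>", "", text)

def index_words(text):
    """Split the filtered text into lower-case words for indexing."""
    return [w.lower() for w in re.findall(r"[A-Za-z]+", text)]

raw = "An <b>impor" + SOFT_HYPHEN + "tant</b> docu" + SOFT_HYPHEN + "<u>ment</u>"
words = index_words(toy_filter(raw))

print(words)  # ['an', 'important', 'document']
```

Because the filter knows which bytes are formatting, split words are rejoined before they reach the index. The price is that the filter breaks as soon as the format changes.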
Case sensitivity is a simple option. It's generally bad, and should be
ignored. ;-)
Indexing break information is a fantastic trick. The indexing engine
spots new lines and new paragraphs. You can then do a search using
keywords like "SENTENCE", "PARAGRAPH" and "NEAR" - the first returns
only results with both words in the same sentence, the second only
those with both words in the same paragraph, and the third is like an
AND search - returning only when both words are in the same document -
except it sorts the results so that messages with the words nearer
each other are higher up the list. It's very funky. :-)
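The idea can be sketched by storing, for every word, which paragraph and sentence it sits in and its position in the text. This is not how Notes implements it; the break detection here (blank lines for paragraphs, ". ! ?" for sentences) is a crude assumption, and all names and sample texts are invented for the example.

```python
import re
from collections import defaultdict

def index_with_breaks(text):
    """Map word -> list of (paragraph, sentence, position) tuples.
    Paragraphs split on blank lines, sentences on . ! ? (a crude assumption)."""
    postings = defaultdict(list)
    position = 0
    sentence_no = 0
    for para_no, para in enumerate(re.split(r"\n\s*\n", text)):
        for sentence in re.split(r"[.!?]+", para):
            words = re.findall(r"\w+", sentence.lower())
            if not words:
                continue
            for word in words:
                postings[word].append((para_no, sentence_no, position))
                position += 1
            sentence_no += 1
    return postings

def _pairs(postings, a, b):
    """All combinations of an occurrence of a with an occurrence of b."""
    return [(pa, pb) for pa in postings.get(a, []) for pb in postings.get(b, [])]

def same_sentence(postings, a, b):
    return any(pa[1] == pb[1] for pa, pb in _pairs(postings, a, b))

def same_paragraph(postings, a, b):
    return any(pa[0] == pb[0] for pa, pb in _pairs(postings, a, b))

def near_distance(postings, a, b):
    """Smallest word distance between a and b, or None if either is missing."""
    gaps = [abs(pa[2] - pb[2]) for pa, pb in _pairs(postings, a, b)]
    return min(gaps) if gaps else None

def near_search(docs, a, b):
    """AND search, sorted so documents with the words closest together come first."""
    hits = []
    for doc_id, text in docs.items():
        gap = near_distance(index_with_breaks(text), a, b)
        if gap is not None:
            hits.append((gap, doc_id))
    return [doc_id for gap, doc_id in sorted(hits)]

DOCS = {
    "a": "Index the mail. Search is slow.\n\nThe search index helps.",
    "b": "The index is built overnight so that the next morning every search is quick.",
    "c": "No relevant words here.",
}

p = index_with_breaks("Index the mail. Search is slow.")
print(same_sentence(p, "index", "search"))   # False
print(same_paragraph(p, "index", "search"))  # True
print(near_search(DOCS, "index", "search"))  # ['a', 'b']
```

The extra break information costs index space, which is presumably why it is an option rather than the default.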
Of course, I'm not suggesting that these features are in any way
mandatory - the index breaks and case sensitivity are strictly
optional. But you can hopefully quickly see that even the remaining
features (encrypted mail, attachments) could be complex to implement,
and a quick trip to Google will probably show up other potential
features that I've not even touched upon yet. Like how to get a decent
ranking of your results, for starters.
And then, to be frank, there's the interface. A search dialogue box is
fine, but what if I just want to search within the folder? I'd rather
see the search interface implemented at the folder level, and provide
searching across everything using a virtual folder which contains all
messages. Searching within the folder just feels nicer than searching
within a separate dialogue box - much more natural.
Search engines typically work because they have a simple, easy to use
interface which hides the power of their features. It's why Google is
popular with both geeks and the technically illiterate - power with
simplicity. But the in-folder search needed to provide it might
require quite a change in the folder handling code of The Bat!. So if
the complications of the index engine don't drive The Bat!'s
developers to insanity, changing The Bat! to do the searches might
well do it! ;-)
I don't mean to sound overly negative, by the way. I love the idea of
a more powerful, index-based search. I want a decent search
capability. I just think that it's going to be the sort of change
which sounds easy, but is actually major enough "under the hood" that
you'd have to put it into a major revision of the product. I'd like to
see index-based search in version 4, though - it would get my upgrade
money for sure!
RK> Or is there any plugin or other third-party software that would do
RK> that with TB's e-mails?
Google Desktop has been mentioned, but I can't vouch for it as Google
Desktop Search refuses to run on my machine due to incompatibilities
with my anti-virus system. I hope you have more luck with it than I
did, though!
--
Best regards,
Philip mailto:[EMAIL PROTECTED]
Using The Bat! v3.51.10 on Windows XP 5.1 Build 2600
Service Pack 2
________________________________________________
Current version is 3.51.10 | 'Using TBUDL' information: http://www.silverstones.com/thebat/TBUDLInfo.html

