Mathieu,
Are you on #tracker on irc.freenode.net?
I went through your patches and I would like to discuss some of the
changes that you have made; that would be much more efficient on IRC.
If not, then I will write an e-mail with a few questions, but I would
prefer IRC.
All the best,
Michal Pryc
Mathieu Dimanche wrote:
> Hi everyone
>
> Using a home-compiled SVN version (rev. 1090) on Ubuntu Gutsy (7.10), I
> wanted to index my Thunderbird emails properly but encountered some
> problems and strange behavior I felt compelled to fix. So here's a patch
> against rev. 1090 with these improvements (in Changelog order):
>
> 1) Thunderbird email non-ASCII characters:
>
> Current behaviour of the TB extension is to create temporary TMS files
> in ~/.xesam/ThunderbirdEmails/ToIndex/ which are indexed
> asynchronously by trackerd. These files are XML-like and contain
> indexable information in CDATA sections.
>
> One problem I encountered concerns the encoding of strings in these CDATA
> sections. The TB extension fetches Author, Recipients and Subject from a
> nsIMsgDBHdr component as they appear in the mail header, i.e. MIME-encoded.
> This means that special characters (French accented letters, the
> copyright symbol, and so on) were left in their encoded form. For
> example, a subject containing an "é", such as "Notification d'état de la
> distribution", was passed to trackerd through the TMS file as
> "=?ISO-8859-1?Q?Notification_d'=E9tat_de_la_distribution?=", which
> made it impossible to index the individual words properly. Worse, some
> characters made trackerd fail to index the TMS file at all.
>
> The same happened with recipient lists when, say, someone's surname
> contained a non-ASCII character, and likewise for the "From:" header.
>
> So, what needed to be done was to force the TB extension to decode
> these problematic strings. Luckily, the nsIMsgDBHdr component has a
> simple way to do it using mime2DecodedXXX members. Quite easy.
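
To illustrate the decoding step, here is a minimal Python sketch. It is not the extension's JavaScript and not part of the patch; it only uses the standard email.header module to turn the MIME encoded-word form of the example subject back into readable text. The function name decode_mime_header is made up for this example.

```python
from email.header import decode_header


def decode_mime_header(raw: str) -> str:
    """Decode RFC 2047 encoded-words in a header value into plain text.

    Hypothetical helper for illustration only.
    """
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Encoded words come back as bytes plus their declared charset.
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # Headers without any encoded word come back as plain str.
            parts.append(chunk)
    return "".join(parts)


raw_subject = "=?ISO-8859-1?Q?Notification_d'=E9tat_de_la_distribution?="
print(decode_mime_header(raw_subject))
# Notification d'état de la distribution
```

Once decoded this way, the subject splits into ordinary words that an indexer can handle.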
>
> So TMS files now contained ISO-8859-1 encoded data. But trackerd
> refused to read these files, because the GNOME functions used to read and
> parse the TMS files expect UTF-8 encoded content. So, OK, let's force
> the extension to encode the whole TMS file in UTF-8. This was done
> through a nsIConverterOutputStream component plugged into the
> nsIFileOutputStream previously used to write the file [1].
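
The encoding mismatch can be reproduced in a few lines of Python. This is only a sketch of the general idea (ISO-8859-1 bytes are usually not valid UTF-8 and must be transcoded); it says nothing about how nsIConverterOutputStream works internally, and the helper names to_utf8 and is_valid_utf8 are invented for the example.

```python
def to_utf8(data: bytes, source_charset: str = "iso-8859-1") -> bytes:
    """Transcode bytes from source_charset to UTF-8 (hypothetical helper)."""
    return data.decode(source_charset).encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


latin1_subject = "Notification d'état".encode("iso-8859-1")
print(is_valid_utf8(latin1_subject))           # False
print(is_valid_utf8(to_utf8(latin1_subject)))  # True
```

A reader that insists on UTF-8 would reject the first byte string and accept the second, which matches the behaviour described above.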
>
> What does the patch change, then?
> * Author, Recipients and Subject are always readable and indexable, even
> when composed with non-ASCII characters
> * TMS files are encoded in UTF-8
>
> For info, I indexed my 36000+ emails (a lot of spam archived for training
> anti-spam software), mainly in French and English, and not a single one
> failed to be indexed; they all show up nicely in t-s-t search results.
>
>
> 2) Email Recipients and CCs string format
>
> Recipients without a name attached were indexed as "[EMAIL PROTECTED]
> [EMAIL PROTECTED]".
> Recipients with a name attached were indexed as "[EMAIL PROTECTED] Name".
>
> I was expecting the usual email contact format, like "Name
> <[EMAIL PROTECTED]>" or "[EMAIL PROTECTED]".
>
> The patch restores this expected behaviour.
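
For reference, the expected contact format is what Python's standard email.utils.formataddr produces. The sketch below is only an illustration of that format; the wrapper name format_recipient and the example.org address are placeholders, not anything taken from the patch.

```python
from email.utils import formataddr


def format_recipient(name: str, address: str) -> str:
    """Format a recipient as 'Name <address>', or just 'address' if no name.

    Hypothetical wrapper for illustration only.
    """
    return formataddr((name, address))


print(format_recipient("Jane Doe", "jane@example.org"))  # Jane Doe <jane@example.org>
print(format_recipient("", "jane@example.org"))          # jane@example.org
```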
>
>
> 3) tracker-search-tool emails not showing recipient(s)
>
> t-s-t only showed Subject, Sender and Date.
>
> The patch makes it show the Recipient too (French label translation
> provided).
>
> TODO: multiple "To:" headers seem to be indexed when appropriate, but
> only the LAST one shows up here.
>
>
> 4) tracker-preferences "Choose a folder" and "Enter a file glob" dialogs
> are not translatable
>
> Well, with the patch, they are (French translations provided).
>
>
> 5) tracker-preferences "Use additional memory for faster indexing"
> translations
>
> The original string had a typo in the word "additional" ("additonal").
> Translators translated it correctly; the typo was then fixed in the
> source but not in the po files, so the translations no longer matched.
> I corrected the typo in all the po files, and the option is now
> translated properly.
>
>
> 6) hits/items transition
>
> As discussed in bug #464516 [2], using item(s) instead of hit(s) is a good
> idea. I modified the French translations to reflect this (élément(s)
> instead of résultat(s)).
>
>
> 7) trackerd --help uses the system's locale
>
> On my system, LC_ALL was empty, so the trackerd usage help was always
> printed in the default English instead of following my
> LC_MESSAGES="fr_FR.UTF-8".
> This is now fixed.
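
As a reminder of the expected precedence, the usual POSIX rule is that a non-empty LC_ALL wins, then LC_MESSAGES, then LANG. The Python sketch below only models that lookup order on a plain dictionary; it does not show how trackerd was fixed, it ignores GNU gettext's LANGUAGE variable, and the function name resolve_messages_locale is invented for the example.

```python
def resolve_messages_locale(env: dict) -> str:
    """Pick the locale for messages from an environment-like mapping.

    Hypothetical helper: an unset or empty variable falls through
    to the next one, and "C" is used when nothing is set.
    """
    for var in ("LC_ALL", "LC_MESSAGES", "LANG"):
        value = env.get(var, "")
        if value:
            return value
    return "C"


print(resolve_messages_locale({"LC_ALL": "", "LC_MESSAGES": "fr_FR.UTF-8"}))
# fr_FR.UTF-8
```

With an empty LC_ALL, the lookup falls through to LC_MESSAGES, which is the result the help output is expected to follow.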
>
>
> 8) bug #467151 : "Language Typo: It's Portuguese not Portugese"
>
> Fixed.
>
>
> 9) bug #504003: "empty line when adding 'Ignored File Patterns'"
>
> Fixed.
>
> In fact, this was strange behaviour. Having "NoIndexFileTypes=;" in
> ~/.config/tracker/tracker.cfg made tracker-preferences show a blank item
> in the Ignore FileTypes list, whereas having "NoIndexFileTypes=" didn't.
> This behaviour comes from the g_key_file_get_string_list function call
> in the _get_string_list.c/_get_string_list() function. I'm pretty sure
> the glib people should be alerted about this, because it's very
> counter-intuitive and nothing in the documentation [3] lets us expect
> this kind of behaviour.
>
> Of course, every time an empty list (ending with a semicolon) was fetched
> from ~/.config/tracker/tracker.cfg, this behaviour appeared. But no more.
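
One way to guard against such blank entries is to drop empty items after reading the list. The Python sketch below only illustrates that kind of filtering on a raw "a;b;" style value; it does not reproduce glib's parser and is an assumption about the approach, not a description of what the patch actually does. The name parse_string_list is made up for the example.

```python
from typing import List


def parse_string_list(value: str, separator: str = ";") -> List[str]:
    """Split a separator-terminated list and drop empty items.

    Hypothetical helper for illustration only.
    """
    return [item for item in value.split(separator) if item.strip()]


print(parse_string_list(";"))           # []
print(parse_string_list(""))            # []
print(parse_string_list("*.o;*.tmp;"))  # ['*.o', '*.tmp']
```

With this filtering, "NoIndexFileTypes=;" and "NoIndexFileTypes=" both yield an empty list, so no blank row would appear.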
>
>
> 10) bug #498041: "Thunderbird indexing option grayed out on Debian
> unstable"
>
> Fixed. Made TB indexing usable.
>
>
> 11) bug #464323: "critical warning : tracker_indexer_get_hits"
>
> Fixed. Something to do with the stopwords.
>
>
>
> I hope I respected the coding style (please review) and that someone
> will commit the patch soon.
> If committed, please assign the fixed bugs to me, I'll close them.
> BTW, I'll comment them with an explanation and link to this mail.
>
>
> Mathieu
>
>
> -------------------------------
> [1] http://developer.mozilla.org/en/docs/Writing_textual_data
> [2] http://bugzilla.gnome.org/show_bug.cgi?id=464516
> [3] http://library.gnome.org/devel/glib/2.14/glib-Key-value-file-parser.html#g-key-file-get-string-list
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> tracker-list mailing list
> [email protected]
> http://mail.gnome.org/mailman/listinfo/tracker-list