Re: [Evolution-hackers] A Camel API to get the filename of the cache, also a proposal to have one format to rule them all

2009-01-07 Thread Srinivasa Ragavan

The Exchange patch looks fine to me.

-Srini.
On Mon, 2009-01-05 at 12:28 +0100, Philip Van Hoof wrote:
> On Mon, 2009-01-05 at 00:42 +0530, Srinivasa Ragavan wrote:
> 
> 
> > On Fri, 2009-01-02 at 13:25 +0100, Philip Van Hoof wrote:
> > > Hi there evos,
> > > 
> > > For an EPlugin that I'm working on I will need a Camel API to get the
> > > filename of the cache.
> > 
> > Sure and the patch seems fine to me, but the Exchange portion of the
> > patch is missing. It should be similar/simple.
> 
> Attached.
> 
> Let me know when it's all reviewed and/if I can commit it.
> 
> 
___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers

Re: [Evolution-hackers] A Camel API to get the filename of the cache, also a proposal to have one format to rule them all

2009-01-06 Thread Philip Van Hoof

On Mon, 2009-01-05 at 09:41 -0500, Jeffrey Stedfast wrote:
> Philip Van Hoof wrote:
> > On Mon, 2009-01-05 at 08:25 -0500, Jeffrey Stedfast wrote:

> >
> > Maildir doesn't store individual MIME parts separately. So Maildir is
> > equally hard to handle for metadata engines as MBox is. Only difference
> > with MBox is that we need to seek() to some location.
> >
> > So Maildir doesn't make it possible for us to let app developers
> > implement indexing plugins easily, like a typical exif extractor.
> >   
> 
> I guess, but they could just link with gmime or camel :p

Which is what Tracker is doing at this moment. But for various reasons
we still end up copying the E-mail's decoded attachments to /tmp, then
scan them with the indexer's plugins, and then unlink() the files.

Suffice to say that this ain't ideal when scanning 10,000 E-mails that
way. Much more efficient for us would be to simply enter evo's caches
and read the MIME parts as normal already decoded files.

I also think such a format would improve some of Evolution's own
features:

o. For example, making a thumbnail of an image could use the platform's
   infrastructure, and have it cached according to the thumbnail-spec.

   Less code

o. Another feature is the "Save as" feature for attachments. Instead of
   having to open a GFile and using CamelStream converted to a
   GOutputStream and decode-streaming it to that stream to save the
   attachment on the filesystem, you just copy the file.

   Less code

o. Inline image viewers: Instead of having to plug the decoded memory of
   the attachment into a blob of memory, you just use any image viewer. 

   Less code

o. Inline attached images for text/html MIME part viewers: right now
   migrating GtkHTML to WebKit or GtkMozEmbed is hard because GtkHTML
   implements a special mechanism that lets it be fed a blob of memory
   as pixmap buffer for images whose src attribute starts with "cid".

   Less code

I'm not even sure if WebKit and GtkMozEmbed support rendering blobs of
memory, although I have been asking the developers of the respective
components about this at nearly every conference where I meet them. They
all promised to at least offer some sort of infrastructure for this.

Lots of promises ;)

After thinking about it very hard, and quite a lot, I didn't find any
good reason to store attachments in Base64 encoding. I only found
reasons why you would want to store it decoded: Less code, same features

The only argument for storing in Base64 encoding could be the feature
"View the source of this E-mail". But you can perform the Base64 encode
as the E-mail becomes visible in the E-mail source viewer, so it's not a
good reason (let's say this introduces 5 lines of camel_stream_* code).

You could say: because we want to use a "standard" for our storage:

 - Maildir can't work on Windows because the author of the spec refuses
   to change the character ':' into '!' for the filenames. Which renders
   his entire specification completely useless. Windows is not
   irrelevant, it's being used a lot. Ignoring it is like carving the
   word stupid on your head with a knife.

   But fine, let him. We are free to ignore his spec, right?

   Maildir also doesn't specify storing MIME parts as separate files.

 - MBox is just broken. You can't put 3Gigs of data in one file, require
   a rewrite of that file each time you want to remove 1kb of data out
   of it and have no index on it (this, at least, is something Maildir
   got right by letting the kernel's FS take care of that: atomic
   renames and DIR is quite good as an index).

   An MBox file is a ticking timebomb waiting to get corrupted.

   MBox also doesn't specify storing MIME parts as separate files.

 - What other formats do we have? Is there one "so called" "standard"
   format that stores MIME parts as individual "decoded" files?

   Because if not then just like the Maildir-guy I'll quickly make a
   website and give it a name. And then let's all start calling it a
   "standard". Problem solved? It's not that Maildir is really that much
   more than that. A website that describes a broken way of storing
   E-mails.

   Well, ok, a few IMAP server guys decided to use that specification to
   shut up people who say that IMAP servers that store in a binary
   format are not compatible with their freedom religions. Of course
   that's an ill-educated point of view, but who cares. Freedom! *sigh*
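
To make the ':' complaint above concrete: Maildir names a message in
cur/ as a unique name, followed by ":2," and a set of flag letters. A
minimal sketch in C of pulling the flags out of such a name, with the
separator as a parameter so that '!' can be tried on Windows. The helper
name maildir_flags is hypothetical and not part of Camel.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of Camel. Returns a pointer to the flag
 * letters of a Maildir file name such as "1231.M1P2.host:2,RS", or NULL
 * if the name carries no "<sep>2," info suffix. `sep` is ':' in the
 * Maildir spec; passing '!' is the Windows-friendly variant discussed
 * above. */
const char *
maildir_flags (const char *filename, char sep)
{
	const char *p;

	assert (filename != NULL);

	/* The info suffix starts at the last separator in the name. */
	p = strrchr (filename, sep);
	if (p == NULL || p[1] != '2' || p[2] != ',')
		return NULL;

	return p + 3;
}
```

With ':' as separator, "1231.M1P2.host:2,RS" yields "RS"; the same name
looked up with '!' yields NULL.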

-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be


Re: [Evolution-hackers] A Camel API to get the filename of the cache, also a proposal to have one format to rule them all

2009-01-06 Thread Jeffrey Stedfast

Philip Van Hoof wrote:
> On Mon, 2009-01-05 at 08:25 -0500, Jeffrey Stedfast wrote:
>
>   
>> migrating away from the IMAP specific data cache would be good.
>> 
>
> Yes. I think IMAP and the local providers are the only ones that are
> still using a specialized datacache.
>
> The IMAP4 one, for example, ain't using a specialized one.
>
>   
>>>> b) migrate away the mbox data cache (the all-in-one file crap)
>>>>
>>> I'm all for it. Once I thought of doing this, but the options were like
>>> Maildir or a format of one mbox file per mail in a distributed folder
>>> [CamelDataCache sort of format, like imap4/GW/Exchange]. But IIRC Fejj,
>>> had some concern like, Local still might be good to be held in a
>>> 'standards' way. I know it hurts us on expunge/mailbox rewrite etc.
>>>   
>>>   
>> what mbox data cache? CamelDataCache would probably be the best cache to
>> use for IMAP.
>> 
>
> Although I would change CamelDataCache to store individual MIME parts as
> separate files instead of files that look like a single-mail MBox file.
>   
it's really just the raw message/rfc822 format, not really mbox -
there's no "From " line for example.

that doesn't need to be part of the cache logic. that can be part of the
key.

> I would also decode the separate MIME parts before storing if the
> original E-mail had them encoded (which is usually the case, and always
> for binary attachments). This to make it more easy for metadata engines
> to index the MIME parts, and to allow such to do this efficiently. 
>
> Perhaps also to reduce disk-space, as encoded consumes more disk-space,
> but that is for me just a nice side-effect.
>
> So my format would create a directory for each E-mail, or prefix each
> MIME part with the uid. Perhaps
>
> INBOX/subfolders/temp/1.  // headers+multipart container
> INBOX/subfolders/temp/1.1 // multipart container
> INBOX/subfolders/temp/1.1.1   // text/plain
> INBOX/subfolders/temp/1.1.2   // text/html
> INBOX/subfolders/temp/1.2.1   // inline JPeg attachment
> INBOX/subfolders/temp/1.BODYSTRUCTURE // Bodystructure of the E-mail
> INBOX/subfolders/temp/1.ENVELOPE  // Top envelope of the E-mail
>   

sure, this can be done with the key tho. instead of using the uid as the
key, use uid.1 or uid.1.2 etc
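
A minimal sketch in C of what such a key could look like, following the
"uid.part" naming from the listing above. The helper name make_part_key
is hypothetical and not part of Camel.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Camel. Builds a cache key such as
 * "1.1.2" from a message UID and a MIME part specifier. A NULL or empty
 * part specifier gives the UID followed by ".", matching the "1." entry
 * in the listing above. The caller frees the result. */
char *
make_part_key (const char *uid, const char *part_spec)
{
	size_t len;
	char *key;

	assert (uid != NULL && *uid != '\0');

	if (part_spec == NULL)
		part_spec = "";

	/* uid + '.' + part specifier + terminating NUL */
	len = strlen (uid) + 1 + strlen (part_spec) + 1;
	key = malloc (len);
	if (key == NULL)
		return NULL;

	snprintf (key, len, "%s.%s", uid, part_spec);
	return key;
}
```

The resulting string would then be handed to the cache wherever the
plain uid is passed as the key today.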

> ps. Perhaps I would store 1.BODYSTRUCTURE in the database instead. I
> would probably store 1.ENVELOPE in the database (like how it is now).
>   
yea, I think it makes sense to store BODYSTRUCTURE in the folder summary.

> I would probably on top of storing BODYSTRUCTURE and ENVELOPE in the
> database also store them in separate files. Even if most filesystems
> will consume 4k or more (sector or block size) for those mini files.
>
> To get the JPeg attachment:
>
> $ cp INBOX/subfolders/temp/1.2.1 ~/mommy.jpeg
>
> $ exif INBOX/subfolders/temp/1.2.1
> EXIF tags in 'INBOX/subfolders/temp/1.2.1' ('Intel' byte order):
> --------------------+------------------------------
> Tag                 |Value
> --------------------+------------------------------
> Image Description   |Mommy with cake at birthday
> Manufacturer        |SONY
> Model               |DSC-T33
> ...
>
> $ tracker-search -s EMails birthday
> Results:
>   email://u...@server/INBOX/temp/1
>   email://u...@server/INBOX/temp/1#2.1
>   ~/mommy.jpeg
>
>
> [CUT]
>
>   
>> this can cause problems if you need to verify signed parts because
>> re-encoding them might not result in the same output.
>> 
>
> Ok, for signatures I guess we can make an exception and keep them
> encoded in their original format then.
>
>   
>>>> For Maildir I recommend wasting diskspace by storing both the original
>>>> Maildir format and in parallel store the attachments separately.
>>>>
>>>> Maildir ain't accessible by current Evolution's UI, by the way.
>>>>
>>>> For MBox I recommend TO STOP USING THIS BROKEN FORMAT. It's insane with
>>>> today's mailboxes that easily grow to 3 gigabytes in size per user.
>>>>
 
>>> I second your thoughts for MBox stuff. 
>>>   
>>>   
>> Eh, I think mbox works fine but I can understand wanting to move to
>> Maildir which is also fine :-)
>> 
>
> Maildir doesn't store individual MIME parts separately. So Maildir is
> equally hard to handle for metadata engines as MBox is. Only difference
> with MBox is that we need to seek() to some location.
>
> So Maildir doesn't make it possible for us to let app developers
> implement indexing plugins easily, like a typical exif extractor.
>   

I guess, but they could just link with gmime or camel :p

Jeff

Re: [Evolution-hackers] A Camel API to get the filename of the cache, also a proposal to have one format to rule them all

2009-01-06 Thread Jeffrey Stedfast

Srinivasa Ragavan wrote:
> Hey Philip,
>
> [Im lagging in my mail-replies, still a lot to go, due to my 3 week
> vacation.]
>
> On Fri, 2009-01-02 at 13:25 +0100, Philip Van Hoof wrote:
>   
>> Hi there evos,
>>
>> For an EPlugin that I'm working on I will need a Camel API to get the
>> filename of the cache.
>> 
>
> Sure and the patch seems fine to me, but the Exchange portion of the
> patch is missing. It should be similar/simple.
>   
>> I will attach a patch that adds this API. The EPlugin that I'm developing is
>> available at Bug# 565091 and more information about it can be found at
>>
>> http://live.gnome.org/Evolution/Metadata.
>>
>>
>> I added a bug for tracking this request:
>>
>> http://bugzilla.gnome.org/show_bug.cgi?id=566279
>>
>> I know that for maildir (cur, tmp, new) and mbox (seek position) it's a
>> little bit controversial to return a filename. For maildir I always use
>> the cur-file one and for mbox I added "/!seek_pos" to the end of the
>> returned filename. 
>>
>> The reason why I need this is that for indexing already cached E-mails,
>> Tracker will MIME parse what we can MIME parse. For example filenames
>> and Exif data of attached images is stolen out of the cached items, to
>> be made searchable.
>>
>> We don't want to require Evolution to eat all the code involved in
>> indexing massive amounts of file formats. Best thing we can do right now
>> is to simply pass the filenames over IPC.
>>
>> We STRONGLY recommend to the Evolution team to:
>>
>> a) migrate away the IMAP specific data cache (see c to store separate parts)
>> 
> I thought we already store parts separately. Is it just about the encoding
> or more than that? I seriously don't have an idea on this. Maybe Fejj,
> Sankar, Matt can add to it.
>   

migrating away from the IMAP specific data cache would be good.

>   
>> b) migrate away the mbox data cache (the all-in-one file crap)
>> 
> I'm all for it. Once I thought of doing this, but the options were like
> Maildir or a format of one mbox file per mail in a distributed folder
> [CamelDataCache sort of format, like imap4/GW/Exchange]. But IIRC Fejj,
> had some concern like, Local still might be good to be held in a
> 'standards' way. I know it hurts us on expunge/mailbox rewrite etc.
>   

what mbox data cache? CamelDataCache would probably be the best cache to
use for IMAP.

>   
>> And to
>>
>> c) invent a better storage format that doesn't store the attachments in
>> server's (usually) Base64 encoding. The one format to rule them all.
>>
>> Instead store the encoded attachments in decoded format (original file
>> format). This will reduce diskspace (encoding increases diskspace usage)
>> and will make it more easy to scan the original file for XMP and Exif
>> information. Don't try to gzip or whatever anything. None of that makes
>> any sense (original files are usually compressed ideally already).
>>
>> For example: devices that want to compress have filesystems that do this
>> for you. Don't be silly trying to do this yourself.
>>
>> By storing the encoded version the only thing you currently gain is that
>> the feature "view E-mail source" doesn't need to recode the attachments.
>>
>> This ain't a much-used feature. It doesn't have to be fast, at all.
>>
>> No it doesn't. Really it doesn't.
>> 
> Is thatz it? I need some other opinions, I don't have much thoughts
> here. Sankar, Matt, Fejj?
>   

this can cause problems if you need to verify signed parts because
re-encoding them might not result in the same output.

>> For Maildir I recommend wasting diskspace by storing both the original
>> Maildir format and in parallel store the attachments separately.
>>
>> Maildir ain't accessible by current Evolution's UI, by the way.
>>
>> For MBox I recommend TO STOP USING THIS BROKEN FORMAT. It's insane with
>> today's mailboxes that easily grow to 3 gigabytes in size per user.
>> 
> I second your thoughts for MBox stuff. 
>   

Eh, I think mbox works fine but I can understand wanting to move to
Maildir which is also fine :-)

>   
>> Once all start using the CamelDataCache API, implementing that new
>> format and implementing converters wont be very hard. 
>>
>> For existing CamelDataCache users it's just one format to convert. For
>> IMAP, mbox, Maildir and mh it's indeed a few extra formats to handle
>> using a conversion. Wont kill you to implement that, and,  I'll help.
>> 
>
> Thatz so nice of you to help us :-)
>
> -Srini
>
>
>   

Jeff

Re: [Evolution-hackers] A Camel API to get the filename of the cache, also a proposal to have one format to rule them all

2009-01-05 Thread Philip Van Hoof

On Mon, 2009-01-05 at 08:25 -0500, Jeffrey Stedfast wrote:

> migrating away from the IMAP specific data cache would be good.

Yes. I think IMAP and the local providers are the only ones that are
still using a specialized datacache.

The IMAP4 one, for example, ain't using a specialized one.

> >> b) migrate away the mbox data cache (the all-in-one file crap)
> >> 
> > I'm all for it. Once I thought of doing this, but the options were like
> > Maildir or a format of one mbox file per mail in a distributed folder
> > [CamelDataCache sort of format, like imap4/GW/Exchange]. But IIRC Fejj,
> > had some concern like, Local still might be good to be held in a
> > 'standards' way. I know it hurts us on expunge/mailbox rewrite etc.
> >   
> 
> what mbox data cache? CamelDataCache would probably be the best cache to
> use for IMAP.

Although I would change CamelDataCache to store individual MIME parts as
separate files instead of files that look like a single-mail MBox file.

I would also decode the separate MIME parts before storing if the
original E-mail had them encoded (which is usually the case, and always
for binary attachments). This to make it more easy for metadata engines
to index the MIME parts, and to allow such to do this efficiently. 

Perhaps also to reduce disk-space, as encoded consumes more disk-space,
but that is for me just a nice side-effect.

So my format would create a directory for each E-mail, or prefix each
MIME part with the uid. Perhaps

INBOX/subfolders/temp/1.  // headers+multipart container
INBOX/subfolders/temp/1.1 // multipart container
INBOX/subfolders/temp/1.1.1   // text/plain
INBOX/subfolders/temp/1.1.2   // text/html
INBOX/subfolders/temp/1.2.1   // inline JPeg attachment
INBOX/subfolders/temp/1.BODYSTRUCTURE // Bodystructure of the E-mail
INBOX/subfolders/temp/1.ENVELOPE  // Top envelope of the E-mail

ps. Perhaps I would store 1.BODYSTRUCTURE in the database instead. I
would probably store 1.ENVELOPE in the database (like how it is now).

I would probably on top of storing BODYSTRUCTURE and ENVELOPE in the
database also store them in separate files. Even if most filesystems
will consume 4k or more (sector or block size) for those mini files.

To get the JPeg attachment:

$ cp INBOX/subfolders/temp/1.2.1 ~/mommy.jpeg

$ exif INBOX/subfolders/temp/1.2.1
EXIF tags in 'INBOX/subfolders/temp/1.2.1' ('Intel' byte order):
--------------------+------------------------------
Tag                 |Value
--------------------+------------------------------
Image Description   |Mommy with cake at birthday
Manufacturer        |SONY
Model               |DSC-T33
...

$ tracker-search -s EMails birthday
Results:
  email://u...@server/INBOX/temp/1
  email://u...@server/INBOX/temp/1#2.1
  ~/mommy.jpeg

[CUT]

> this can cause problems if you need to verify signed parts because
> re-encoding them might not result in the same output.

Ok, for signatures I guess we can make an exception and keep them
encoded in their original format then.

> >> For Maildir I recommend wasting diskspace by storing both the original
> >> Maildir format and in parallel store the attachments separately.
> >>
> >> Maildir ain't accessible by current Evolution's UI, by the way.
> >>
> >> For MBox I recommend TO STOP USING THIS BROKEN FORMAT. It's insane with
> >> today's mailboxes that easily grow to 3 gigabytes in size per user.
> >> 
> > I second your thoughts for MBox stuff. 
> >   
> 
> Eh, I think mbox works fine but I can understand wanting to move to
> Maildir which is also fine :-)

Maildir doesn't store individual MIME parts separately. So Maildir is
equally hard to handle for metadata engines as MBox is. Only difference
with MBox is that we need to seek() to some location.

So Maildir doesn't make it possible for us to let app developers
implement indexing plugins easily, like a typical exif extractor.

We would have to Base64 decode image attachments before extracting exif,
for example, instead of just saying: here's a stream, or here's a FILE*,
go ahead and extract the info you want. (With a stream we could make it
relatively easy to auto-base64 decode, but these extractors are often
still FILE* based, not stream based.)
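
To give an idea of the extra step: a minimal sketch in C of a Base64
decoder that reads from one FILE* and writes the raw bytes to another,
which is roughly what has to sit in front of a FILE* based extractor as
long as the cache holds encoded parts. The names b64_value and
b64_decode_stream are made up for this example; this is not Camel's or
GMime's decoder.

```c
#include <assert.h>
#include <stdio.h>

/* Maps one Base64 character to its 6-bit value, or -1 if the character
 * is not part of the alphabet (line breaks, padding and so on). */
static int
b64_value (int c)
{
	if (c >= 'A' && c <= 'Z') return c - 'A';
	if (c >= 'a' && c <= 'z') return c - 'a' + 26;
	if (c >= '0' && c <= '9') return c - '0' + 52;
	if (c == '+') return 62;
	if (c == '/') return 63;
	return -1;
}

/* Hypothetical helper. Decodes Base64 text from `in` and writes the raw
 * bytes to `out`. Characters outside the alphabet are skipped; decoding
 * stops at the first '=' padding character or at end of input. Returns
 * the number of bytes written, or -1 on a write error. */
long
b64_decode_stream (FILE *in, FILE *out)
{
	unsigned int acc = 0;	/* bit accumulator */
	int bits = 0;		/* number of pending bits in acc */
	int c, v;
	long written = 0;

	assert (in != NULL && out != NULL);

	while ((c = fgetc (in)) != EOF && c != '=') {
		if ((v = b64_value (c)) < 0)
			continue;
		acc = (acc << 6) | (unsigned int) v;
		bits += 6;
		if (bits >= 8) {
			/* A full byte is available: emit the top 8 bits. */
			bits -= 8;
			if (fputc ((int) ((acc >> bits) & 0xFF), out) == EOF)
				return -1;
			written++;
		}
	}

	return written;
}
```

With decoded parts in the cache none of this is needed: the extractor
just opens the file.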

There's IMO not really a good reason to keep the attachments stored in
their encoded version. Except the signatures, perhaps, but we don't
really need those in decoded form anyway. So it would be fine to have an
exception on signatures (to keep them encoded-stored).

Hmm, maybe someday having the fingerprint information about a person
might be useful to verify the identity of an individual before linking
the person with a contact in our RDF triple store.

-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
htt

Re: [Evolution-hackers] A Camel API to get the filename of the cache, also a proposal to have one format to rule them all

2009-01-05 Thread Philip Van Hoof

On Mon, 2009-01-05 at 00:42 +0530, Srinivasa Ragavan wrote:


> On Fri, 2009-01-02 at 13:25 +0100, Philip Van Hoof wrote:
> > Hi there evos,
> > 
> > For an EPlugin that I'm working on I will need a Camel API to get the
> > filename of the cache.
> 
> Sure and the patch seems fine to me, but the Exchange portion of the
> patch is missing. It should be similar/simple.

Attached.

Let me know when it's all reviewed and/if I can commit it.


-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be
Index: camel/camel-exchange-folder.c
===
--- camel/camel-exchange-folder.c	(revision 1849)
+++ camel/camel-exchange-folder.c	(working copy)
@@ -86,6 +86,7 @@
 	  CamelException *ex);
 static void refresh_info (CamelFolder *folder, CamelException *ex);
 static void exchange_sync (CamelFolder *folder, gboolean expunge, CamelException *ex);
+static char* get_filename (CamelFolder *folder, const char *uid, CamelException *ex);
 
 static void
 class_init (CamelFolderClass *camel_folder_class)
@@ -105,6 +106,7 @@
 	camel_folder_class->transfer_messages_to = transfer_messages_to;
 	camel_folder_class->refresh_info = refresh_info;
 	camel_folder_class->sync = exchange_sync;
+	camel_folder_class->get_filename = get_filename;
 }
 
 #define CAMEL_EXCHANGE_SERVER_FLAGS \
@@ -160,6 +162,7 @@
 	return camel_exchange_folder_type;
 }
 
+
 static void
 refresh_info (CamelFolder *folder, CamelException *ex)
 {
@@ -362,6 +365,14 @@
 	}
 }
 
+static char*
+get_filename (CamelFolder *folder, const char *uid, CamelException *ex)
+{
+	CamelExchangeFolder *exch = CAMEL_EXCHANGE_FOLDER (folder);
+
+	return camel_data_cache_get_filename (exch->cache, "cache", uid, NULL);
+}
+
 static GByteArray *
 get_message_data (CamelFolder *folder, const char *uid, CamelException *ex)
 {

Re: [Evolution-hackers] A Camel API to get the filename of the cache, also a proposal to have one format to rule them all

2009-01-04 Thread Srinivasa Ragavan

Hey Philip,

[I'm lagging in my mail-replies, still a lot to go, due to my 3 week
vacation.]

On Fri, 2009-01-02 at 13:25 +0100, Philip Van Hoof wrote:
> Hi there evos,
> 
> For an EPlugin that I'm working on I will need a Camel API to get the
> filename of the cache.

Sure and the patch seems fine to me, but the Exchange portion of the
patch is missing. It should be similar/simple.
> 
> I will attach a patch that adds this API. The EPlugin that I'm developing is
> available at Bug# 565091 and more information about it can be found at
> 
> http://live.gnome.org/Evolution/Metadata.
> 
> 
> I added a bug for tracking this request:
> 
> http://bugzilla.gnome.org/show_bug.cgi?id=566279
> 
> I know that for maildir (cur, tmp, new) and mbox (seek position) it's a
> little bit controversial to return a filename. For maildir I always use
> the cur-file one and for mbox I added "/!seek_pos" to the end of the
> returned filename. 
> 
> The reason why I need this is that for indexing already cached E-mails,
> Tracker will MIME parse what we can MIME parse. For example filenames
> and Exif data of attached images is stolen out of the cached items, to
> be made searchable.
> 
> We don't want to require Evolution to eat all the code involved in
> indexing massive amounts of file formats. Best thing we can do right now
> is to simply pass the filenames over IPC.
> 
> We STRONGLY recommend to the Evolution team to:
> 
> a) migrate away the IMAP specific data cache (see c to store separate parts)
I thought we already store parts separately. Is it just about the encoding
or more than that? I seriously don't have an idea on this. Maybe Fejj,
Sankar, Matt can add to it.

> b) migrate away the mbox data cache (the all-in-one file crap)
I'm all for it. Once I thought of doing this, but the options were like
Maildir or a format of one mbox file per mail in a distributed folder
[CamelDataCache sort of format, like imap4/GW/Exchange]. But IIRC Fejj,
had some concern like, Local still might be good to be held in a
'standards' way. I know it hurts us on expunge/mailbox rewrite etc.

> 
> And to
> 
> c) invent a better storage format that doesn't store the attachments in
> server's (usually) Base64 encoding. The one format to rule them all.
> 
> Instead store the encoded attachments in decoded format (original file
> format). This will reduce diskspace (encoding increases diskspace usage)
> and will make it more easy to scan the original file for XMP and Exif
> information. Don't try to gzip or whatever anything. None of that makes
> any sense (original files are usually compressed ideally already).
> 
> For example: devices that want to compress have filesystems that do this
> for you. Don't be silly trying to do this yourself.
> 
> By storing the encoded version the only thing you currently gain is that
> the feature "view E-mail source" doesn't need to recode the attachments.
> 
> This ain't a much-used feature. It doesn't have to be fast, at all.
> 
> No it doesn't. Really it doesn't.
Is thatz it? I need some other opinions, I don't have much thoughts
here. Sankar, Matt, Fejj?
> 
> For Maildir I recommend wasting diskspace by storing both the original
> Maildir format and in parallel store the attachments separately.
> 
> Maildir ain't accessible by current Evolution's UI, by the way.
> 
> For MBox I recommend TO STOP USING THIS BROKEN FORMAT. It's insane with
> today's mailboxes that easily grow to 3 gigabytes in size per user.
I second your thoughts for MBox stuff. 

> 
> 
> Once all start using the CamelDataCache API, implementing that new
> format and implementing converters wont be very hard. 
> 
> For existing CamelDataCache users it's just one format to convert. For
> IMAP, mbox, Maildir and mh it's indeed a few extra formats to handle
> using a conversion. Wont kill you to implement that, and,  I'll help.

Thatz so nice of you to help us :-)

-Srini


[Evolution-hackers] A Camel API to get the filename of the cache, also a proposal to have one format to rule them all

2009-01-02 Thread Philip Van Hoof

Hi there evos,

For an EPlugin that I'm working on I will need a Camel API to get the
filename of the cache.

I will attach a patch that adds this API. The EPlugin that I'm developing is
available at Bug# 565091 and more information about it can be found at

http://live.gnome.org/Evolution/Metadata.


I added a bug for tracking this request:

http://bugzilla.gnome.org/show_bug.cgi?id=566279

I know that for maildir (cur, tmp, new) and mbox (seek position) it's a
little bit controversial to return a filename. For maildir I always use
the cur-file one and for mbox I added "/!seek_pos" to the end of the
returned filename. 
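
For whoever consumes these filenames on the other side of the IPC: a
minimal sketch in C of splitting the mbox variant back into a path and a
seek position. It assumes the position is appended as a decimal number
after a literal "/!"; the patch is the reference for the exact format.
The helper name split_mbox_filename is hypothetical and not part of
Camel.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Camel. Splits "path/!123" into the
 * mbox path and the seek position, assuming a decimal number follows
 * the last "/!" in the string. Returns 0 on success and -1 if the
 * suffix is missing or malformed. On success *path_out is malloc'd and
 * must be freed by the caller. */
int
split_mbox_filename (const char *name, char **path_out, long *pos_out)
{
	const char *sep = NULL, *p;
	char *end;
	long pos;
	size_t len;

	assert (name != NULL && path_out != NULL && pos_out != NULL);

	/* Find the last occurrence of "/!". */
	for (p = name; (p = strstr (p, "/!")) != NULL; p += 2)
		sep = p;

	/* Require at least one digit right after the marker. */
	if (sep == NULL || sep[2] < '0' || sep[2] > '9')
		return -1;

	errno = 0;
	pos = strtol (sep + 2, &end, 10);
	if (errno != 0 || *end != '\0')
		return -1;

	len = (size_t) (sep - name);
	*path_out = malloc (len + 1);
	if (*path_out == NULL)
		return -1;
	memcpy (*path_out, name, len);
	(*path_out)[len] = '\0';
	*pos_out = pos;

	return 0;
}
```

The caller would then fopen() the path and fseek() to the position
before handing the stream to a MIME parser.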

The reason why I need this is that for indexing already cached E-mails,
Tracker will MIME parse what we can MIME parse. For example filenames
and Exif data of attached images is stolen out of the cached items, to
be made searchable.

We don't want to require Evolution to eat all the code involved in
indexing massive amounts of file formats. Best thing we can do right now
is to simply pass the filenames over IPC.

We STRONGLY recommend to the Evolution team to:

a) migrate away the IMAP specific data cache (see c to store separate parts)
b) migrate away the mbox data cache (the all-in-one file crap)

And to

c) invent a better storage format that doesn't store the attachments in
server's (usually) Base64 encoding. The one format to rule them all.

Instead store the encoded attachments in decoded format (original file
format). This will reduce diskspace (encoding increases diskspace usage)
and will make it more easy to scan the original file for XMP and Exif
information. Don't try to gzip or otherwise compress anything. None of
that makes any sense (original files are usually well compressed already).
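
To put a number on the diskspace claim: a small sketch in C of the size
of a MIME Base64 encoded part, assuming the usual wrapping into lines of
at most 76 characters that each end in CRLF (a cache that stores LF only
saves one byte per line). The function name base64_mime_size is made up
for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of n raw bytes after MIME Base64 encoding. Every 3
 * input bytes become 4 output characters (the last group is padded),
 * and the output is wrapped into lines of at most 76 characters, each
 * terminated by CRLF. */
size_t
base64_mime_size (size_t n)
{
	size_t chars, lines;

	/* Keep the arithmetic below well away from overflow. */
	assert (n <= ((size_t) -1) / 2);

	chars = 4 * ((n + 2) / 3);
	lines = (chars + 75) / 76;

	return chars + 2 * lines;
}
```

A 1,000,000 byte attachment comes out at 1,368,424 bytes this way, so
the encoded copy is somewhat more than a third larger than the original.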

For example: devices that want to compress have filesystems that do this
for you. Don't be silly trying to do this yourself.

By storing the encoded version the only thing you currently gain is that
the feature "view E-mail source" doesn't need to recode the attachments.

This ain't a much-used feature. It doesn't have to be fast, at all.

No it doesn't. Really it doesn't.

For Maildir I recommend wasting diskspace by storing both the original
Maildir format and in parallel store the attachments separately.

Maildir ain't accessible by current Evolution's UI, by the way.

For MBox I recommend TO STOP USING THIS BROKEN FORMAT. It's insane with
today's mailboxes that easily grow to 3 gigabytes in size per user.


Once all start using the CamelDataCache API, implementing that new
format and implementing converters won't be very hard. 

For existing CamelDataCache users it's just one format to convert. For
IMAP, mbox, Maildir and mh it's indeed a few extra formats to handle
using a conversion. Won't kill you to implement that, and I'll help.


I know c) is a controversial proposal. But the current situation really
makes NO sense. Just go look at the ugly ugly code in Camel, think about
it for a second or two, and you too will see that it just ain't making
any sense.

It makes no sense for indexing engines like Beagle or Tracker, nor does
the needless redundancy in implementations make any sense for Evolution
itself. It's rather a maintenance burden and it's making Evolution far
less agile for supporting new capabilities (like getting its cached data
indexed by software that focuses on search capabilities).


-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be
Index: camel/providers/nntp/camel-nntp-folder.c
===
--- camel/providers/nntp/camel-nntp-folder.c	(revision 9848)
+++ camel/providers/nntp/camel-nntp-folder.c	(working copy)
@@ -123,6 +123,25 @@
 return ((CamelFolderClass *) folder_class)->set_message_flags (folder, uid, flags, set);
 }
 
+static char*
+nntp_get_filename (CamelFolder *folder, const char *uid, CamelException *ex)
+{
+	CamelNNTPStore *nntp_store = (CamelNNTPStore *) folder->parent_store;
+	char *article, *msgid;
+
+	article = alloca(strlen(uid)+1);
+	strcpy(article, uid);
+	msgid = strchr (article, ',');
+	if (msgid == NULL) {
+		camel_exception_setv (ex, CAMEL_EXCEPTION_SYSTEM,
+  _("Internal error: UID in invalid format: %s"), uid);
+		return NULL;
+	}
+	*msgid++ = 0;
+
+	return camel_data_cache_get_filename (nntp_store->cache, "cache", msgid, ex);
+}
+
 static CamelStream *
 nntp_folder_download_message (CamelNNTPFolder *nntp_folder, const char *id, const char *msgid, CamelException *ex)
 {
@@ -483,6 +502,7 @@
 	camel_folder_class->count_by_expression = nntp_folder_count_by_expression;
 	camel_folder_class->search_by_uids = nntp_folder_search_by_uids;
 	camel_folder_class->search_free = nntp_folder_search_free;
+	camel_folder_class->get_filename = nntp_get_filename;
 }
 
 CamelType
Index: camel/providers/pop3/camel-pop3-folder.c
===
--- camel/pro
