[Evolution-hackers] Loading really large E-mails on devices with not enough Vm

2008-01-26 Thread Philip Van Hoof

This is what happens if you try to open a truly large E-mail on a device
that has not as much memory available:

Is there something we can do about this? Can we change the MIME parsing
algorithm to be less memory demanding for example?

Note that GArray is not really very sparse with memory once you start
having a really large array. Perhaps we can in stead change this to a
normal pointer array of a fixed size (do we know the size before we
start parsing, so that we can allocate an exact size in stead, perhaps?)


0x40cc4e74   *__GI_raise  (sig=6)  at
../nptl/sysdeps/unix/sysv/linux/raise.c:67
1   0x40cc6450  *__GI_abort () at abort.c:88
2   0x40c32ba8  IA__g_logv (log_domain=0x40c5a954 "GLib",
log_level=1112474896, format=0x40c6129c "%s: failed to allocate %lu bytes",
args1=0x424f0abc) at gmessages.c:502
3   0x40c32be8  IA__g_log (log_domain=0x0, log_level=4973,
format=0x40c6129c "%s: failed to allocate %lu bytes") at gmessages.c:522
4   0x40c320c8  IA__g_realloc (mem=0x0, n_bytes=16777216) at gmem.c:172
5   0x40c097f4  g_array_maybe_expand (array=0x4161dc08, len=4973) at
garray.c:339
6   0x40c09a38  IA__g_array_append_vals (farray=0x4161dc08,
data=0x440e80, len=4071) at garray.c:132
7   0x40c0a384  IA__g_byte_array_append (array=0x4161dc08, data=0x136d
, len=1086722716) at garray.c:653
8   0x402a07d4  camel_mime_part_construct_content_from_parser
(dw=0x293f78, mp=0x50d710) at camel-mime-part-utils.c:71
9   0x402a13e0  construct_from_parser (mime_part=0x293f78, mp=0x50d710)
at camel-mime-part.c:968
10  0x402a148c  camel_mime_part_construct_from_parser
(mime_part=0x293f78, mp=0x50d710) at camel-mime-part.c:996
11  0x402a7ba8  construct_from_parser (multipart=0x2f4e90, mp=0x50d710)
at camel-multipart.c:577
12  0x402a7c8c  camel_multipart_construct_from_parser
(multipart=0x2f4e90, mp=0x50d710) at camel-multipart.c:609
13  0x402a081c  camel_mime_part_construct_content_from_parser
(dw=0x57a028, mp=0x50d710) at camel-mime-part-utils.c:122
14  0x402a13e0  construct_from_parser (mime_part=0x57a028, mp=0x50d710)
at camel-mime-part.c:968
15  0x4029ea24  construct_from_parser (dw=0x0, mp=0x50d710) at
camel-mime-message.c:597
16  0x402a148c  camel_mime_part_construct_from_parser
(mime_part=0x57a028, mp=0x50d710) at camel-mime-part.c:996
17  0x402a14c6  construct_from_stream (dw=0x57a028, s=0x571378) at
camel-mime-part.c:1012
18  0x40296070  camel_data_wrapper_construct_from_stream
(data_wrapper=0x57a028, stream=0x571378) at camel-data-wrapper.c:270
19  0x414e410c  maildir_get_message (folder=0x57bf28, uid=0x294270
"1200956392.3060_5.Nokia-N810-03-11", type=1086722716, param=1086722752,
ex=0x424f0ce8) at camel-maildir-folder.c:276
20  0x4025281e  camel_folder_get_message (folder=0x57bf28, uid=0x294270
"1200956392.3060_5.Nokia-N810-03-11", type=CAMEL_FOLDER_RECEIVE_ANY_OR_FULL,
param=-1, ex=0x414e40b9) at camel-folder.c:1191
21  0x401e3c94 




-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be




___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers


[Evolution-hackers] A different implementation of imap_rescan

2008-01-26 Thread Philip Van Hoof
Hi there,

The imap_rescan function of the default IMAP provider of Evolution will
in case at the front of the IMAP Mailbox items got expunged, more or
less start marking all of the locally cached summary info as invalid.

(note. summary in this context means the envelope information of each
item that we have cached locally in a file called "summary" and that,
when loaded, loads into a integer-based index. The sequence number on
the IMAP server minus one must correspond to that index for each item)

As a result will the entire summary be re-retrieved. This is one of the
many reasons why Evolution's IMAP feels slow for the users: they removed
something and suddenly the next operation takes a very long time. That's
the summary being re-retrieved ... sometimes.

Luckily wont most people delete old items, so it wont occur often for
most people's IMAP operations. Nevertheless it can occur.

I figured it's a bit too-aggressive about this (it = the code). It can
also assume that only a few items got removed, keep how many got
removed, and shift the table's indexes according to the amount of
removed items.

If then we still see a significant mismatch, we can still nonetheless
mark as invalid, remove and perform a re-retrieval of the summary.

I do this in Tinymail's camel-lite code, and now ported it to upstream
camel in the patch that I attached. I have not tested this porting and
note that I do quite a lot of things different in Tinymail (it supports
condstore, which is heavily related to this code).

But for code review this might be interesting to post to you guys, so
here it goes ...



-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be
Index: camel-imap-folder.c
===
--- camel-imap-folder.c	(revision 8395)
+++ camel-imap-folder.c	(working copy)
@@ -664,6 +664,7 @@
 	GArray *removed;
 	gboolean ok;
 	CamelFolderChangeInfo *changes = NULL;
+	gint tr = 0;
 
 	imap_folder->need_rescan = FALSE;
 
@@ -732,16 +733,17 @@
 	 * from the summary.
 	 */
 	removed = g_array_new (FALSE, FALSE, sizeof (int));
-	for (i = 0; i < summary_len && new[i].uid; i++) {
-		info = camel_folder_summary_index (folder->summary, i);
+	for (i = 0; i + tr < summary_len && new[i].uid; i++) {
+		info = camel_folder_summary_index (folder->summary, i + tr);
 		iinfo = (CamelImapMessageInfo *)info;
 
 		if (strcmp (camel_message_info_uid (info), new[i].uid) != 0) {
+			tr++;
 			camel_message_info_free(info);
 			seq = i + 1;
 			g_array_append_val (removed, seq);
 			i--;
-			summary_len--;
+			/* summary_len--; */
 			continue;
 		}
 
@@ -762,6 +764,7 @@
 
 		camel_message_info_free(info);
 		g_free (new[i].uid);
+		new[i].uid = NULL;
 	}
 
 	if (changes) {
@@ -772,6 +775,7 @@
 	seq = i + 1;
 
 	/* Free remaining memory. */
+	i = 0;
 	while (i < summary_len && new[i].uid)
 		g_free (new[i++].uid);
 	g_free (new);
___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers


Re: [Evolution-hackers] Loading really large E-mails on devices with not enough Vm

2008-01-26 Thread Jeffrey Stedfast
On Sat, 2008-01-26 at 13:44 +0100, Philip Van Hoof wrote:
> This is what happens if you try to open a truly large E-mail on a device
> that has not as much memory available:
> 
> Is there something we can do about this? Can we change the MIME parsing
> algorithm to be less memory demanding for example?
> 
> Note that GArray is not really very sparse with memory once you start
> having a really large array. Perhaps we can in stead change this to a
> normal pointer array of a fixed size (do we know the size before we
> start parsing, so that we can allocate an exact size in stead, perhaps?)

eh, why would you change it to a GPtrArray? It doesn't hold pointers, it
holds message part content.

Unfortunately we don't know the size ahead of time.

I suppose you could use a custom byte array allocator so that you can
force it to grow by larger chunks or something, dunno.


The way GMime handles this is by not loading content into RAM, but that
may be harder to do with Camel, especially in the mbox case.


Jeff


___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers


Re: [Evolution-hackers] Loading really large E-mails on devices with not enough Vm

2008-01-26 Thread Jeffrey Stedfast
On Sat, 2008-01-26 at 22:12 -0500, Jeffrey Stedfast wrote:
> On Sat, 2008-01-26 at 13:44 +0100, Philip Van Hoof wrote:
> > This is what happens if you try to open a truly large E-mail on a device
> > that has not as much memory available:
> > 
> > Is there something we can do about this? Can we change the MIME parsing
> > algorithm to be less memory demanding for example?
> > 
> > Note that GArray is not really very sparse with memory once you start
> > having a really large array. Perhaps we can in stead change this to a
> > normal pointer array of a fixed size (do we know the size before we
> > start parsing, so that we can allocate an exact size in stead, perhaps?)
> 
> eh, why would you change it to a GPtrArray? It doesn't hold pointers, it
> holds message part content.
> 
> Unfortunately we don't know the size ahead of time.
> 
> I suppose you could use a custom byte array allocator so that you can
> force it to grow by larger chunks or something, dunno.
>
>
> The way GMime handles this is by not loading content into RAM, but 
> that may be harder to do with Camel, especially in the mbox case.

er, I should probably explain this:

- writing the code should be relatively easy to do, but in the mbox
case, the mbox may end up getting expunged or rewritten for some other
reason which may cause problems, not sure how that would work.

I think in Maildir, as long as the fd remains open, the file won't
actually disappear after an unlink() until the fd gets closed, so that
might work out ok assuming you can spare the fd (which might be the
other problem with Evolution?).

Jeff


___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers


Re: [Evolution-hackers] Loading really large E-mails on devices with not enough Vm

2008-01-26 Thread Jeffrey Stedfast
Something like the attached patch might work, tho it is untested.

If this doesn't work, then I suspect the problem is that the seek
position might get changed out from under the mime parser (assuming it
is using either a CamelStreamFs or an fd).

Note that camel_stream_fs_new_with_fd[_and_bounds]() calls lseek() on
the fd passed in.

>From the dup() man page:

   After  a  successful  return from dup() or dup2(), the old and new file
   descriptors may be used interchangeably.  They refer to the  same  open
   file description (see open(2)) and thus share file offset and file sta‐
   tus flags; for example,  if  the  file  offset  is  modified  by  using
   lseek(2)  on one of the descriptors, the offset is also changed for the
   other.

So my guess is that this will break the parser :(

It might break in the stream case as well, you'd have to follow the code
paths a bit to know for sure. For instance, even if creating the
seekable substream doesn't perform an underlying seek on the original
stream, setting it in a data wrapper might call camel_stream_reset()
which /might/ do an lseek() on the source fs stream.

Not an insurmountable problem to solve, but it does make things a little
more difficult and possibly touchy.

Jeff



On Sat, 2008-01-26 at 22:48 -0500, Jeffrey Stedfast wrote:
> On Sat, 2008-01-26 at 22:12 -0500, Jeffrey Stedfast wrote:
> > On Sat, 2008-01-26 at 13:44 +0100, Philip Van Hoof wrote:
> > > This is what happens if you try to open a truly large E-mail on a device
> > > that has not as much memory available:
> > > 
> > > Is there something we can do about this? Can we change the MIME parsing
> > > algorithm to be less memory demanding for example?
> > > 
> > > Note that GArray is not really very sparse with memory once you start
> > > having a really large array. Perhaps we can in stead change this to a
> > > normal pointer array of a fixed size (do we know the size before we
> > > start parsing, so that we can allocate an exact size in stead, perhaps?)
> > 
> > eh, why would you change it to a GPtrArray? It doesn't hold pointers, it
> > holds message part content.
> > 
> > Unfortunately we don't know the size ahead of time.
> > 
> > I suppose you could use a custom byte array allocator so that you can
> > force it to grow by larger chunks or something, dunno.
> >
> >
> > The way GMime handles this is by not loading content into RAM, but 
> > that may be harder to do with Camel, especially in the mbox case.
> 
> er, I should probably explain this:
> 
> - writing the code should be relatively easy to do, but in the mbox
> case, the mbox may end up getting expunged or rewritten for some other
> reason which may cause problems, not sure how that would work.
> 
> I think in Maildir, as long as the fd remains open, the file won't
> actually disappear after an unlink() until the fd gets closed, so that
> might work out ok assuming you can spare the fd (which might be the
> other problem with Evolution?).
> 
> Jeff
> 
> 
> ___
> Evolution-hackers mailing list
> Evolution-hackers@gnome.org
> http://mail.gnome.org/mailman/listinfo/evolution-hackers
Index: ChangeLog
===
--- ChangeLog	(revision 8425)
+++ ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2008-01-26  Jeffrey Stedfast  <[EMAIL PROTECTED]>
+
+	* camel-mime-part-utils.c (simple_data_wrapper_construct_from_parser):
+	If possible, keep the content on disk.
+
 2008-01-24  Matthew Barnes  <[EMAIL PROTECTED]>
 
 	* camel-object.c (camel_object_cast):
Index: camel-mime-part-utils.c
===
--- camel-mime-part-utils.c	(revision 8425)
+++ camel-mime-part-utils.c	(working copy)
@@ -57,25 +57,47 @@
 static void
 simple_data_wrapper_construct_from_parser (CamelDataWrapper *dw, CamelMimeParser *mp)
 {
+	GByteArray *buffer = NULL;
+	CamelStream *stream;
+	off_t start, end;
+	int fd = -1;
+	size_t len;
 	char *buf;
-	GByteArray *buffer;
-	CamelStream *mem;
-	size_t len;
-
+	
 	d(printf ("simple_data_wrapper_construct_from_parser()\n"));
-
-	/* read in the entire content */
-	buffer = g_byte_array_new ();
+	
+	if (!(stream = camel_mime_parser_stream (mp)))
+		fd = camel_mime_parser_fd (mp);
+	else if (!CAMEL_IS_SEEKABLE_SUBSTREAM (stream))
+		stream = NULL;
+	
+	if ((stream || fd != -1) && (start = camel_mime_parser_tell (mp)) != -1) {
+		/* we can keep content on disk */
+	} else {
+		/* need to load content into memory */
+		buffer = g_byte_array_new ();
+	}
+	
 	while (camel_mime_parser_step (mp, &buf, &len) != CAMEL_MIME_PARSER_STATE_BODY_END) {
-		d(printf("appending o/p data: %d: %.*s\n", len, len, buf));
-		g_byte_array_append (buffer, (guint8 *) buf, len);
+		if (buffer != NULL) {
+			d(printf("appending o/p data: %d: %.*s\n", len, len, buf));
+			g_byte_array_append (buffer, (guint8 *) buf, len);
+		}
 	}
-
-	d(printf("message part kept in memory!\n"));
-
-	m