Re: [Evolution-hackers] Folder summaries with mmap() (version nine -- fixes a leak)

2006-07-17 Thread Philip Van Hoof
On Tue, 2006-07-11 at 13:17 +0200, Philip Van Hoof wrote:
 This is a version (version eight) that actually works (afaics).

This version (version nine) fixes a leak when new messages arrive.

However. The implementation/architecture of Camel requires that new
messages get transformed into CamelMessageInfo instances and added to
the CamelSummaryInfo structure. I think it would be better to simply
a.s.a.p. write it to the summary file and at the end of getting the new
messages, create a new CamelSummaryInfo structure with the new message
info instances ... using the mmap technique.

Currently it's obviously consuming more memory when new messages arrive
(compared to the mmap technique). In Evolution, such a folder never gets
closed. So it will for ever keep using this memory structure (in stead
of the information using the mmap).

Doing this would require appending to the summary file, closing the
summary object instance (which closes the mmap), reopening it when
receiving new messages completed and letting the mmap code parse that
into message-info instances.

Note .. starting from add_message_from_data (in camel-imap-folder.c),
there's a leak.

I think the leak is at camel_folder_summary_info_new_from_message and
more specific the message_info_new_from_header. The leak is ~200 kb for
a folder of ~10,000 headers

http://pvanhoof.be/files/camel_folder_summary_with_mmap_fixes09.diff

There's lots of decoding and copying happening there. So it's not going
to be easy to spot it.


 Therefore I'm re-adding the evolution-patches mailing list. Because the
 mailing lists are slow, I'm also adding some people that might have a
 direct interest in this.
 
 Please review this patch carefully and multiple times. Please do ask me
 questions about it (if needed). Please do test it extensively. Please do
 launch your evolution with valgrind massif and also under gdb.
 
 In case a crash happens in the ..decode_uint32 method , 'up' twice and
 'print *s' so that you know the filename and other interesting infor-
 mation. Also check s-filepos to know the current file position and for
 example check variables like 'count' and 'len' after doing one 'up'.
 
 You can also find a copy here. I will upload new versions to the same
 location (and will increase the version number in the filename).
 
 http://pvanhoof.be/files/camel_folder_summary_with_mmap_fixes08.diff
 

-- 
Philip Van Hoof, software developer at x-tend 
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
work: vanhoof at x-tend dot be 
http://www.pvanhoof.be - http://www.x-tend.be
? .broken-date-parser.c.swp
? camel-mime-tables.c
Index: camel-file-utils.c
===
RCS file: /cvs/gnome/evolution-data-server/camel/camel-file-utils.c,v
retrieving revision 1.17
diff -u -p -r1.17 camel-file-utils.c
--- camel-file-utils.c	2 Jun 2006 00:52:29 -	1.17
+++ camel-file-utils.c	11 Jul 2006 14:34:18 -
@@ -105,6 +105,46 @@ camel_file_util_decode_uint32 (FILE *in,
 }
 
 
+unsigned char*
+camel_file_util_mmap_decode_uint32 (unsigned char *start, guint32 *dest, gboolean is_string)
+{
+	guint32 value = 0, rlen;
+	int v;
+	unsigned char *retval;
+
+	/* until we get the last byte, keep decoding 7 bits at a time */
+
+while ( ((v = *start++)  0x80) == 0 ) {
+value |= v;
+value = 7;
+}
+
+	rlen = value | (v  0x7f);
+	retval = start;
+
+	if (is_string) {
+#ifdef DEVEL
+		if (rlen == 1) {
+			rlen = 0;
+		} else {
+			unsigned int slen = strlen ((char*)retval)+1;
+			if (slen != rlen) {
+printf (Problem in fileformat. String was %d, should be %d for %s %02x at %d\n, rlen, slen, retval, *retval, retval);
+rlen = slen;
+			}
+		}
+#else
+		if (rlen == 1)
+			rlen = 0;
+#endif
+
+	}
+
+
+	*dest = rlen;
+	return retval;
+}
+
 /**
  * camel_file_util_encode_fixed_int32:
  * @out: file to output to
@@ -167,7 +207,7 @@ int			\
 camel_file_util_decode_##type(FILE *in, type *dest)	\
 {			\
 	type save = 0;	\
-	int i = sizeof(type) - 1;			\
+	int i = sizeof(type) -1;			\
 	int v = EOF;	\
 			\
 while (i = 0  (v = fgetc (in)) != EOF) {	\
@@ -181,6 +221,24 @@ camel_file_util_decode_##type(FILE *in, 
 }
 
 
+#define MMAP_DECODE_T(type)\
+unsigned char*		\
+camel_file_util_mmap_decode_##type(unsigned char *start, type *dest)	\
+{			\
+	type save = 0;	\
+	int i = sizeof(type) - 1;			\
+	int v;		\
+			\
+while (i = 0) {\
+		v = *start++;\
+		save |= ((type)v)  (i * 8);		\
+		i--;	\
+	}		\
+	*dest = save;	\
+	return start;	\
+}
+
+
 /**
  * camel_file_util_encode_time_t:
  * @out: file to output to
@@ -202,6 +260,7 @@ CFU_ENCODE_T(time_t)
  * Return value: 0 on success, -1 on error.
  **/
 CFU_DECODE_T(time_t)
+MMAP_DECODE_T(time_t)
 
 /**
  * camel_file_util_encode_off_t:
@@ -225,6 +284,7 @@ CFU_ENCODE_T(off_t)
  * Return value: 0 on success, -1 on failure.
  **/
 CFU_DECODE_T(off_t)

Re: [Evolution-hackers] Folder summaries with mmap() (version nine -- fixes a leak)

2006-07-17 Thread Philip Van Hoof
On Tue, 2006-07-11 at 16:47 +0200, Philip Van Hoof wrote:
 On Tue, 2006-07-11 at 13:17 +0200, Philip Van Hoof wrote:
  This is a version (version eight) that actually works (afaics).
 
 This version (version nine) fixes a leak when new messages arrive.

This version (version ten) fixes a few strdup()'s, free()'s and other
stuff. Carefully look at the comments of the providers.

I had to disable some things that would otherwise corrupt the mmap()ed
memory block.

These things probably need some refactoring thoughts.

However. I'm sending this using an Evolution that is running from
LD_LIBRARY_PATH=/opt/camel/lib. So far Evolution hasn't crashed and it's
still writing summary new files for all my folders ;-).

http://pvanhoof.be/files/camel_folder_summary_with_mmap_fixes10.diff



-- 
Philip Van Hoof, software developer at x-tend 
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
work: vanhoof at x-tend dot be 
http://www.pvanhoof.be - http://www.x-tend.be
? camel-mime-tables.c
Index: camel-file-utils.c
===
RCS file: /cvs/gnome/evolution-data-server/camel/camel-file-utils.c,v
retrieving revision 1.17
diff -u -p -r1.17 camel-file-utils.c
--- camel-file-utils.c	2 Jun 2006 00:52:29 -	1.17
+++ camel-file-utils.c	11 Jul 2006 19:50:11 -
@@ -105,6 +105,46 @@ camel_file_util_decode_uint32 (FILE *in,
 }
 
 
+unsigned char*
+camel_file_util_mmap_decode_uint32 (unsigned char *start, guint32 *dest, gboolean is_string)
+{
+	guint32 value = 0, rlen;
+	int v;
+	unsigned char *retval;
+
+	/* until we get the last byte, keep decoding 7 bits at a time */
+
+while ( ((v = *start++)  0x80) == 0 ) {
+value |= v;
+value = 7;
+}
+
+	rlen = value | (v  0x7f);
+	retval = start;
+
+	if (is_string) {
+#ifdef DEVEL
+		if (rlen == 1) {
+			rlen = 0;
+		} else {
+			unsigned int slen = strlen ((char*)retval)+1;
+			if (slen != rlen) {
+printf (Problem in fileformat. String was %d, should be %d for %s %02x at %d\n, rlen, slen, retval, *retval, retval);
+rlen = slen;
+			}
+		}
+#else
+		if (rlen == 1)
+			rlen = 0;
+#endif
+
+	}
+
+
+	*dest = rlen;
+	return retval;
+}
+
 /**
  * camel_file_util_encode_fixed_int32:
  * @out: file to output to
@@ -167,7 +207,7 @@ int			\
 camel_file_util_decode_##type(FILE *in, type *dest)	\
 {			\
 	type save = 0;	\
-	int i = sizeof(type) - 1;			\
+	int i = sizeof(type) -1;			\
 	int v = EOF;	\
 			\
 while (i = 0  (v = fgetc (in)) != EOF) {	\
@@ -181,6 +221,24 @@ camel_file_util_decode_##type(FILE *in, 
 }
 
 
+#define MMAP_DECODE_T(type)\
+unsigned char*		\
+camel_file_util_mmap_decode_##type(unsigned char *start, type *dest)	\
+{			\
+	type save = 0;	\
+	int i = sizeof(type) - 1;			\
+	int v;		\
+			\
+while (i = 0) {\
+		v = *start++;\
+		save |= ((type)v)  (i * 8);		\
+		i--;	\
+	}		\
+	*dest = save;	\
+	return start;	\
+}
+
+
 /**
  * camel_file_util_encode_time_t:
  * @out: file to output to
@@ -202,6 +260,7 @@ CFU_ENCODE_T(time_t)
  * Return value: 0 on success, -1 on error.
  **/
 CFU_DECODE_T(time_t)
+MMAP_DECODE_T(time_t)
 
 /**
  * camel_file_util_encode_off_t:
@@ -225,6 +284,7 @@ CFU_ENCODE_T(off_t)
  * Return value: 0 on success, -1 on failure.
  **/
 CFU_DECODE_T(off_t)
+MMAP_DECODE_T(off_t)
 
 /**
  * camel_file_util_encode_size_t:
@@ -248,6 +308,7 @@ CFU_ENCODE_T(size_t)
  * Return value: 0 on success, -1 on failure.
  **/
 CFU_DECODE_T(size_t)
+MMAP_DECODE_T(size_t)
 
 
 /**
@@ -269,11 +330,22 @@ camel_file_util_encode_string (FILE *out
 	
 	if ((len = strlen (str))  65536)
 		len = 65536;
+
+	if (len==0) len=-1;
 	
 	if (camel_file_util_encode_uint32 (out, len+1) == -1)
 		return -1;
-	if (len == 0 || fwrite (str, len, 1, out) == 1)
-		return 0;
+
+	if (len != 0)
+	{
+		if (fwrite (str, len, 1, out) == 1)
+		{
+			if (fputc ('\0', out) == -1)
+return -1;
+			return 0;
+		}
+	}
+
 	return -1;
 }
 
@@ -291,7 +363,8 @@ int
 camel_file_util_decode_string (FILE *in, char **str)
 {
 	guint32 len;
-	register char *ret;
+	register char *ret, c;
+	long a;
 
 	if (camel_file_util_decode_uint32 (in, len) == -1) {
 		*str = NULL;
@@ -311,7 +384,14 @@ camel_file_util_decode_string (FILE *in,
 		return -1;
 	}
 
+	a = ftell (in);
+
+	if ((c = fgetc (in)) != '\0')
+		/* If this is the old format, oeps .. we are so sorry */
+		fseek (in, a, SEEK_SET);
+
 	ret[len] = 0;
+
 	*str = ret;
 	return 0;
 }
Index: camel-file-utils.h
===
RCS file: /cvs/gnome/evolution-data-server/camel/camel-file-utils.h,v
retrieving revision 1.11
diff -u -p -r1.11 camel-file-utils.h
--- camel-file-utils.h	10 Jan 2006 07:56:46 -	1.11
+++ camel-file-utils.h	11 Jul 2006 19:50:12 -
@@ -46,6 +46,7 @@ int camel_file_util_encode_fixed_int32 (
 int camel_file_util_decode_fixed_int32 (FILE *in, gint32 *);
 int camel_file_util_encode_uint32 

Re: [Evolution-hackers] Folder summaries with mmap() (version nine -- fixes a leak)

2006-07-17 Thread Philip Van Hoof
On Wed, 2006-07-12 at 09:12 +0530, Harish Krishnaswamy wrote:
 On Tue, 2006-07-11 at 20:33 +0200, Philip Van Hoof wrote:
  On Tue, 2006-07-11 at 23:14 +0530, Harish Krishnaswamy wrote:

  ps. I think we should do a in-depth tech-meeting about this stuff. What
  do you think, Harish? I have a few ideas on reducing memory usage when
  Evolution is online (if Evolution is online, summary information of new
  messages is still stored in memory, and not immediately reloaded using
  mmap) -- this makes Evolution consume (a lot) more memory when its
  online, when using the patch (as much as the current implementation).

 Sure, Philip. 
 We could discuss it on the weekly IRC meeting (Wednesday 1000 UTC -
 #evolution-meet)  - though I do not think it is convenient for fejj, if
 we wishes to join. We can work out a separate meeting and post it to e-h
 later too.
 
 I am more positive about the *real* effectiveness of this meeting unlike
 the gtk-mozembed episode as we are talking with CODE and there are
 concrete achievables at an arm's length. :-). Count me in.

During such a meeting I'm going to propose actual Camel architectural
changes. They will affect mostly internal changes (like how to deal with
the CamelFolderSummary and CamelMessageInfo(Base) type from the
providers).

The external API will probably change but not afaics in mail/ something
that would affect the current Evolution code a lot.

I would mainly do the adding of messages to the CamelFolderSummary very
different. I wouldn't store it using malloc (nor strdup nor pstring_add)
in CamelMessageInfo(Base) instances. I would simply append it to the
summary file and at the end of scanning .. launch my mmap code to parse
the offsets (of also the new items) out of the mmap()ed memory region.

This is very different from the current technique, where all is simply
added and the data itself is stored in strdup, malloc and pstring_add
memory. And sometimes (when?) dumped to the summary file. But where the
CamelFolderSummary instance isn't very often reloaded (which would give
it the chance to use my mmap code).

This is why when being online, Evolution still consumes +220M. Yet when
being offline, Evolution consumes ~30M (I have 112 folders, a lot of
them with more than 10,000 headers -- mailing lists --).

ps. My original Evolution consumes ~380M


-- 
Philip Van Hoof, software developer at x-tend 
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
work: vanhoof at x-tend dot be 
http://www.pvanhoof.be - http://www.x-tend.be
? camel-mime-tables.c
Index: camel-file-utils.c
===
RCS file: /cvs/gnome/evolution-data-server/camel/camel-file-utils.c,v
retrieving revision 1.17
diff -p -u -r1.17 camel-file-utils.c
--- camel-file-utils.c	2 Jun 2006 00:52:29 -	1.17
+++ camel-file-utils.c	12 Jul 2006 00:16:11 -
@@ -105,6 +105,54 @@ camel_file_util_decode_uint32 (FILE *in,
 }
 
 
+unsigned char*
+camel_file_util_mmap_decode_uint32 (unsigned char *start, guint32 *dest, gboolean is_string)
+{
+	guint32 value = 0, rlen;
+	int v;
+#if DEBUG
+	int a=0;
+#endif
+	unsigned char *retval;
+
+	/* until we get the last byte, keep decoding 7 bits at a time */
+
+while ( ((v = *start++)  0x80) == 0 ) {
+value |= v;
+value = 7;
+#if DEBUG
+		a++;
+		if (a  5) 
+			return NULL;
+#endif
+}
+
+	rlen = value | (v  0x7f);
+	retval = start;
+
+	if (is_string) {
+#ifdef DEVEL
+		if (rlen == 1) {
+			rlen = 0;
+		} else {
+			unsigned int slen = strlen ((char*)retval)+1;
+			if (slen != rlen) {
+printf (Problem in fileformat. String was %d, should be %d for %s %02x at %d\n, rlen, slen, retval, *retval, retval);
+rlen = slen;
+			}
+		}
+#else
+		if (rlen == 1)
+			rlen = 0;
+#endif
+
+	}
+
+
+	*dest = rlen;
+	return retval;
+}
+
 /**
  * camel_file_util_encode_fixed_int32:
  * @out: file to output to
@@ -167,7 +215,7 @@ int			\
 camel_file_util_decode_##type(FILE *in, type *dest)	\
 {			\
 	type save = 0;	\
-	int i = sizeof(type) - 1;			\
+	int i = sizeof(type) -1;			\
 	int v = EOF;	\
 			\
 while (i = 0  (v = fgetc (in)) != EOF) {	\
@@ -181,6 +229,24 @@ camel_file_util_decode_##type(FILE *in, 
 }
 
 
+#define MMAP_DECODE_T(type)\
+unsigned char*		\
+camel_file_util_mmap_decode_##type(unsigned char *start, type *dest)	\
+{			\
+	type save = 0;	\
+	int i = sizeof(type) - 1;			\
+	int v;		\
+			\
+while (i = 0) {\
+		v = *start++;\
+		save |= ((type)v)  (i * 8);		\
+		i--;	\
+	}		\
+	*dest = save;	\
+	return start;	\
+}
+
+
 /**
  * camel_file_util_encode_time_t:
  * @out: file to output to
@@ -202,6 +268,7 @@ CFU_ENCODE_T(time_t)
  * Return value: 0 on success, -1 on error.
  **/
 CFU_DECODE_T(time_t)
+MMAP_DECODE_T(time_t)
 
 /**
  * camel_file_util_encode_off_t:
@@ -225,6 +292,7 @@ CFU_ENCODE_T(off_t)
  * Return value: 0 on success, -1 on failure.
  **/
 

Re: [Evolution-hackers] Folder summaries with mmap() (version nine -- fixes a leak)

2006-07-12 Thread Philip Van Hoof
On Tue, 2006-07-11 at 23:14 +0530, Harish Krishnaswamy wrote:

I'm removing the patches mailing list ;-)

 On Tue, 2006-07-11 at 16:47 +0200, Philip Van Hoof wrote:
  Currently it's obviously consuming more memory when new messages arrive
  (compared to the mmap technique). In Evolution, such a folder never gets
  closed. So it will for ever keep using this memory structure (in stead
  of the information using the mmap).

 I expect the patch to improve things as they stand now and will run it
 through some testing to get the hard numbers. 
 Just curious to know if you whetted this change with some heavy-duty
 search folders at your end. They seem to skew the observations thus far.

The hardest test that I did with it was sorting huge folders using the
Subject and From fields. I didn't yet get Evolution stable enough to do
even more aggressive tests like, indeed, making huge vfolders and
searches.

But after seeing the sorting results, I'm almost confident it will
probably be as fast as in-real-memory. 

The big difference with huge searches (over all folders) will be that
all summary files will be hit. So it's basically a does mmap scale to
folder-count amount of mmap's?-question.

The kernel, by default, pages 16 pages ahead of time. You can increase
this using posix_madvise (which isn't platform independent, in case tml
is listening). In random mode, it will not page ahead of time at all.
This will be a nice option for mobile devices with very very very few
memory (but very very very slow sorting and searching).


ps. I think we should do a in-depth tech-meeting about this stuff. What
do you think, Harish? I have a few ideas on reducing memory usage when
Evolution is online (if Evolution is online, summary information of new
messages is still stored in memory, and not immediately reloaded using
mmap) -- this makes Evolution consume (a lot) more memory when its
online, when using the patch (as much as the current implementation).



-- 
Philip Van Hoof, software developer at x-tend 
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
work: vanhoof at x-tend dot be 
http://www.pvanhoof.be - http://www.x-tend.be

___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers