Re: [Evolution-hackers] Loading really large E-mails on devices with not enough Vm

2008-03-04 Thread Philip Van Hoof

On Sat, 2008-01-26 at 23:22 -0500, Jeffrey Stedfast wrote:
 Something like the attached patch might work, tho it is untested.

I had to change 

else if (!CAMEL_IS_SEEKABLE_SUBSTREAM (stream))

into

else if (!CAMEL_IS_SEEKABLE_STREAM (stream))

I don't know why you were testing for substream, as substream provides
no extra functionality that seems to be related here ...

 So my guess is that this will break the parser :(
 
 It might break in the stream case as well, you'd have to follow the code
 paths a bit to know for sure. For instance, even if creating the
 seekable substream doesn't perform an underlying seek on the original
 stream, setting it in a data wrapper might call camel_stream_reset()
 which /might/ do an lseek() on the source fs stream.

The problem with the patch is that it makes each MIME part's data start
at the headers, instead of at the actual content.

I tried determining the start right after the first call to
camel_mime_parser_step, but that just resulted in start == end.
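To make the header-versus-content distinction concrete, here is a small sketch in plain C. The helper is hypothetical (it is not part of Camel or Tinymail) and assumes the raw part is available in a memory buffer: given the offset where a part's headers begin, it returns the offset just past the empty line that ends them, which is where the content that should back the data wrapper actually starts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a Camel API: given the offset at which a MIME
 * part's headers start, return the offset of the first content byte, i.e.
 * the byte after the empty line that terminates the header block.
 * Accepts both "\n" and "\r\n" line endings. Returns (size_t) -1 if the
 * header block is not terminated within the buffer. */
static size_t
content_offset_from_header_offset (const char *buf, size_t len, size_t header_start)
{
	size_t i = header_start;

	assert (header_start <= len);

	while (i < len) {
		/* find the end of the current header line */
		const char *nl = memchr (buf + i, '\n', len - i);
		if (nl == NULL)
			return (size_t) -1;

		size_t next = (size_t) (nl - buf) + 1;
		size_t line_len = next - i; /* includes the '\n' */

		/* an empty line ("\n" or "\r\n") ends the headers */
		if (line_len == 1 || (line_len == 2 && buf[i] == '\r'))
			return next;

		i = next;
	}

	return (size_t) -1;
}
```

Recording this offset, rather than the offset of the headers, is what keeps a part's on-disk content from starting at its headers.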


 Not an insurmountable problem to solve, but it does make things a little
 more difficult and possibly touchy.

 
 
  ___
  Evolution-hackers mailing list
  Evolution-hackers@gnome.org
  http://mail.gnome.org/mailman/listinfo/evolution-hackers
-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be






Re: [Evolution-hackers] Loading really large E-mails on devices with not enough Vm

2008-03-04 Thread Philip Van Hoof

Hey Jeffrey,

I did some experimenting, and after this change it seems to work:

http://tinymail.org/trac/tinymail/changeset/3462

I had to get the value of folder_tell at the exact point where the
parser state is CAMEL_MIME_PARSER_STATE_HEADER.

Then it worked.

I tested this against, for example, a 32 MB test E-mail that is floating
around the Lemonade test servers, various simpler E-mails, and one of
7 MB. Both big E-mails mostly had image attachments.

Tinymail has a mimepart viewer that uses a pixbuf loader, and it
succeeded just fine in loading the images.

I have a message open that is 40246320 bytes in size; this is my VmRSS
for Tinymail's demoui:

VmRSS: 16428 kB

Those 16 MB are probably data in the GdkPixbuf and the summary.
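On Linux, the VmRSS figure is the resident-set line of /proc/<pid>/status. A minimal way for a process to read its own value is sketched below; it is Linux-specific, and the function name is made up for illustration (it is not part of Tinymail).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the resident set size (VmRSS) of the calling process in kB, as
 * reported by Linux in /proc/self/status, or -1 if it cannot be read. */
static long
read_vmrss_kb (void)
{
	FILE *f = fopen ("/proc/self/status", "r");
	char line[256];
	long kb = -1;

	if (f == NULL)
		return -1;

	while (fgets (line, sizeof line, f) != NULL) {
		/* the line looks like "VmRSS:     16428 kB" */
		if (strncmp (line, "VmRSS:", 6) == 0) {
			if (sscanf (line + 6, "%ld", &kb) != 1)
				kb = -1;
			break;
		}
	}

	fclose (f);
	return kb;
}
```

Sampling this before and after opening a large message gives a rough idea of how much of the resident memory the parser itself is responsible for.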

To reproduce: connect to lemonade.andrew.cmu.edu:143 as testuser1 with
password pass1, and pick the largest mail on that server (~40 MB).


On Tue, 2008-03-04 at 18:40 +0100, Philip Van Hoof wrote:
 On Sat, 2008-01-26 at 23:22 -0500, Jeffrey Stedfast wrote:
  Something like the attached patch might work, tho it is untested.
 
 I had to change 
 
   else if (!CAMEL_IS_SEEKABLE_SUBSTREAM (stream))
 
 into
 
   else if (!CAMEL_IS_SEEKABLE_STREAM (stream))
 
 I don't know why you were testing for substream, as substream provides
 no extra functionality that seems to be related here ...
 
  So my guess is that this will break the parser :(
  
  It might break in the stream case as well, you'd have to follow the code
  paths a bit to know for sure. For instance, even if creating the
  seekable substream doesn't perform an underlying seek on the original
  stream, setting it in a data wrapper might call camel_stream_reset()
  which /might/ do an lseek() on the source fs stream.
 
 The problem with the patch is that it makes each MIME part's data start
 at the headers, instead of at the actual content.
 
 I tried determining the start right after the first call to
 camel_mime_parser_step but that just resulted in start == end.
 
 
  Not an insurmountable problem to solve, but it does make things a little
  more difficult and possibly touchy.
 
  
  


Re: [Evolution-hackers] Loading really large E-mails on devices with not enough Vm

2008-01-27 Thread Philip Van Hoof
We'll try this, and if it works for all the mails we wanted to test,
I'll let you know.

Thanks a lot! Adding Modest's project manager to CC.

On Sat, 2008-01-26 at 23:22 -0500, Jeffrey Stedfast wrote:
 Something like the attached patch might work, tho it is untested.
 


Re: [Evolution-hackers] Loading really large E-mails on devices with not enough Vm

2008-01-27 Thread Philip Van Hoof
This is very strange, though. It looks like stream=0x0 but the
mime-parser's stream ain't NULL.

(gdb) print buffer
$1 = (GByteArray *) 0x80e4dc0
(gdb) print stream
$2 = (CamelStream *) 0x0
(gdb) print *mp
$3 = {parent = {klass = 0x80def80, hooks = 0x0, ref_count = 1, flags = 0}, priv = 0x8272770}
(gdb) print *mp->priv
$4 = {state = CAMEL_MIME_PARSER_STATE_BODY, outbuf = 0x827e800 "Content-Transfer-Encoding: quoted-printable", 
  outptr = 0x827e800 "Content-Transfer-Encoding: quoted-printable", outend = 0x827ec00 "", fd = -1, stream = 0x826ab10, ioerrno = 0, 
  realbuf = 0x827ec08 "", 
  inbuf = 0x827ec88 "ش\222�\177��\034\\\004�\225=�L\2365gke�-�\037p\024\233��\023\213~LJP~\225�/���O\002�Vtc\235gǦ�\215�\206\025-\231\*ӱ\232Nz\205\036\n�\223�2U�A\237%Qn",
  inptr = 0x827fc5c "�~y[\017ʶ���\204\037�\213�l�Z�`Qh9\235\f�+�\224\024\\p���\n\226\y�5��\220\n", 
  inend = 0x827fc88 "\n", atleast = 0, seek = 413944216, unstep = 0, midline = 1, scan_from = 0, scan_pre_from = 0, eof = 0, 
  start_of_from = -1, start_of_boundary = 11048, start_of_headers = 11092, header_start = -1, filterid = 1, filters = 0x0, 
  parts = 0x828c210}
(gdb) print *mp->priv->stream
$5 = {parent_object = {klass = 0x80ffdc8, hooks = 0x0, ref_count = 2, flags = 0}, eos = 0}
(gdb) 



#define _PRIVATE(o) (((CamelMimeParser *)(o))->priv)
CamelStream *
camel_mime_parser_stream (CamelMimeParser *parser)
{
	struct _header_scan_state *s = _PRIVATE (parser);

	return s->stream;
}

Maybe it's not a CamelSeekableSubstream? Else would parent_stream not be F?

(gdb) print * (CamelSeekableSubstream *)mp->priv->stream
$7 = {parent_object = {parent_object = {parent_object = {klass = 0x80ffdc8, hooks = 0x0, ref_count = 2, flags = 0}, eos = 0}, 
    position = 413948312, bound_start = 0, bound_end = -1, some_stack = '\0' <repeats 49 times>}, parent_stream = 0x16}
(gdb) 


On Sun, 2008-01-27 at 13:38 +0100, Philip Van Hoof wrote:
 Looks like the GByteArray is still being created.
 

Re: [Evolution-hackers] Loading really large E-mails on devices with not enough Vm

2008-01-27 Thread Philip Van Hoof
Looks like the GByteArray is still being created.

(gdb) break camel-mime-part-utils.c:82
Breakpoint 2 at 0xb6dd541e: file camel-mime-part-utils.c, line 82.
(gdb) delete 1
(gdb) cont
Continuing.

Breakpoint 2, simple_data_wrapper_construct_from_parser (dw=0xb3f02800, mp=0x827bcd0) at camel-mime-part-utils.c:82
82  if (buffer != NULL) {
(gdb) print buffer
$1 = (GByteArray *) 0x80e4dc0
(gdb) 


Breakpoint 1, camel_mime_parser_step (parser=0x827bcd0, databuffer=0xb4882f3c, datalength=0xb4882f40) at camel-mime-parser.c:610
610 struct _header_scan_state *s = _PRIVATE (parser);
(gdb) bt
#0  camel_mime_parser_step (parser=0x827bcd0, databuffer=0xb4882f3c, datalength=0xb4882f40) at camel-mime-parser.c:610
#1  0xb6dd5456 in simple_data_wrapper_construct_from_parser (dw=0xb3f02800, mp=0x827bcd0) at camel-mime-part-utils.c:81
#2  0xb6dd55e9 in camel_mime_part_construct_content_from_parser (dw=0x824e8c8, mp=0x827bcd0) at camel-mime-part-utils.c:127
#3  0xb6dd6ff1 in construct_from_parser (mime_part=0x824e8c8, mp=0x827bcd0) at camel-mime-part.c:968
#4  0xb6dd70af in camel_mime_part_construct_from_parser (mime_part=0x824e8c8, mp=0x827bcd0) at camel-mime-part.c:996
#5  0xb6de1aab in construct_from_parser (multipart=0x8246f80, mp=0x827bcd0) at camel-multipart.c:577
#6  0xb6de1bea in camel_multipart_construct_from_parser (multipart=0x8246f80, mp=0x827bcd0) at camel-multipart.c:609
#7  0xb6dd5681 in camel_mime_part_construct_content_from_parser (dw=0x8254570, mp=0x827bcd0) at camel-mime-part-utils.c:144
#8  0xb6dd6ff1 in construct_from_parser (mime_part=0x8254570, mp=0x827bcd0) at camel-mime-part.c:968
#9  0xb6dd1de4 in construct_from_parser (dw=0x8254570, mp=0x827bcd0) at camel-mime-message.c:597
#10 0xb6dd70af in camel_mime_part_construct_from_parser (mime_part=0x8254570, mp=0x827bcd0) at camel-mime-part.c:996
#11 0xb6dd7122 in construct_from_stream (dw=0x8254570, s=0x826ab10) at camel-mime-part.c:1012
#12 0xb6dc2f63 in camel_data_wrapper_construct_from_stream (data_wrapper=0x8254570, stream=0x826ab10) at camel-data-wrapper.c:270
#13 0xb60fbd97 in maildir_get_message (folder=0x80def28, uid=0x8269dd0 "1192085835.11467_1.evergrey", 


[EMAIL PROTECTED]:~/Current/mailtests/md/spam1/cur$ ls -alh 1192085835.11467_1.evergrey\!2\,SH 
-rw-r--r-- 1 pvanhoof pvanhoof 401M 2008-01-27 13:28 1192085835.11467_1.evergrey!2,SH
[EMAIL PROTECTED]:~/Current/mailtests/md/spam1/cur$ 



On Sat, 2008-01-26 at 23:22 -0500, Jeffrey Stedfast wrote:
 Something like the attached patch might work, tho it is untested.
 
Re: [Evolution-hackers] Loading really large E-mails on devices with not enough Vm

2008-01-27 Thread Jeffrey Stedfast

On Sun, 2008-01-27 at 13:44 +0100, Philip Van Hoof wrote:
 This is very strange, though. It looks like stream=0x0 but the
 mime-parser's stream ain't NULL.

That just means the stream the parser is using is not a subclass of
CamelSeekableSubstream.
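The point can be illustrated with a toy single-inheritance type check. This is a hypothetical sketch, not Camel's own type system, and it assumes (as the Camel headers of that era suggest) that the file-backed stream derives directly from CamelSeekableStream rather than from CamelSeekableSubstream. Under that assumption, a maildir file stream fails an is-a-substream test, so the patch sets stream to NULL and falls back to the in-memory GByteArray, while an is-a-seekable-stream test succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy single-inheritance type system (hypothetical, for illustration only). */
typedef struct TypeInfo {
	const char *name;
	const struct TypeInfo *parent; /* NULL for the root type */
} TypeInfo;

static const TypeInfo stream_type             = { "Stream", NULL };
static const TypeInfo seekable_stream_type    = { "SeekableStream", &stream_type };
static const TypeInfo seekable_substream_type = { "SeekableSubstream", &seekable_stream_type };
/* assumption: the file-backed stream is a direct subclass of SeekableStream */
static const TypeInfo stream_fs_type          = { "StreamFs", &seekable_stream_type };

/* true if `type` is `ancestor` or inherits from it */
static bool
type_is_a (const TypeInfo *type, const TypeInfo *ancestor)
{
	for (; type != NULL; type = type->parent)
		if (type == ancestor)
			return true;
	return false;
}
```

A sibling class never passes an is-a check for another sibling, which is why the substream test rejects the fs stream while the seekable-stream test accepts both.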

Jeff




Re: [Evolution-hackers] Loading really large E-mails on devices with not enough Vm

2008-01-27 Thread Philip Van Hoof

On Sun, 2008-01-27 at 11:27 -0500, Jeffrey Stedfast wrote:
 On Sun, 2008-01-27 at 13:44 +0100, Philip Van Hoof wrote:
  This is very strange, though. It looks like stream=0x0 but the
  mime-parser's stream ain't NULL.
 
 that just means the stream the parser is using is not a subclass of
 CamelSeekableSubstream

The parser was parsing a stream opened by the maildir implementation
of camel_folder_get_message. Since it's a file in a maildir, I guess we
can make the stream for that file inherit from CamelSeekableSubstream, right?




Re: [Evolution-hackers] Loading really large E-mails on devices with not enough Vm

2008-01-26 Thread Jeffrey Stedfast
On Sat, 2008-01-26 at 13:44 +0100, Philip Van Hoof wrote:
 This is what happens if you try to open a truly large E-mail on a device
 that doesn't have as much memory available:
 
 Is there something we can do about this? Can we change the MIME parsing
 algorithm to be less memory-demanding, for example?
 
 Note that GArray is not really sparing with memory once you start
 having a really large array. Perhaps we can instead change this to a
 normal pointer array of a fixed size (do we know the size before we
 start parsing, so that we can allocate an exact size instead, perhaps?)

eh, why would you change it to a GPtrArray? It doesn't hold pointers, it
holds message part content.

Unfortunately we don't know the size ahead of time.

I suppose you could use a custom byte array allocator so that you can
force it to grow by larger chunks or something, dunno.
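A custom byte array that grows by large fixed chunks, along the lines suggested above, might look like the following. It is a hypothetical stand-alone type (not GLib's GByteArray, which applies its own growth policy), and the chunk size is a tunable assumption. Rounding the capacity up to a multiple of a large chunk makes the number of realloc() calls predictable; each realloc() may still copy the whole buffer, so this reduces rather than removes the peak-memory problem.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical byte array whose capacity grows in multiples of `chunk`
 * bytes instead of by doubling. */
typedef struct {
	unsigned char *data;
	size_t len;
	size_t cap;
	size_t chunk; /* growth granularity, e.g. 1 MiB */
} ChunkedBytes;

static void
chunked_bytes_init (ChunkedBytes *b, size_t chunk)
{
	assert (chunk > 0);
	b->data = NULL;
	b->len = 0;
	b->cap = 0;
	b->chunk = chunk;
}

/* Append n bytes; returns 0 on success, -1 if allocation fails
 * (in which case the array is left unchanged). */
static int
chunked_bytes_append (ChunkedBytes *b, const void *src, size_t n)
{
	if (n == 0)
		return 0;

	if (b->len + n > b->cap) {
		/* round the required size up to the next multiple of chunk */
		size_t need = b->len + n;
		size_t new_cap = ((need + b->chunk - 1) / b->chunk) * b->chunk;
		unsigned char *p = realloc (b->data, new_cap);

		if (p == NULL)
			return -1;
		b->data = p;
		b->cap = new_cap;
	}

	memcpy (b->data + b->len, src, n);
	b->len += n;
	return 0;
}

static void
chunked_bytes_free (ChunkedBytes *b)
{
	free (b->data);
	b->data = NULL;
	b->len = b->cap = 0;
}
```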


The way GMime handles this is by not loading content into RAM, but that
may be harder to do with Camel, especially in the mbox case.
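Keeping content on disk, as described for GMime, amounts to remembering where each part's content lives in the file and reading it back only when asked. A minimal sketch, assuming a POSIX system and a hypothetical PartView type (not GMime's or Camel's API), uses pread(), which reads at an explicit offset and does not move the descriptor's file offset, so a parser reading the same fd keeps its position.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical on-disk view of one MIME part's content: bytes [start, end)
 * of the file open on fd. Only the offsets are kept in memory. */
typedef struct {
	int fd;
	off_t start;
	off_t end;
	off_t pos; /* read position relative to start */
} PartView;

static void
part_view_init (PartView *v, int fd, off_t start, off_t end)
{
	assert (start >= 0 && start <= end);
	v->fd = fd;
	v->start = start;
	v->end = end;
	v->pos = 0;
}

/* Read up to n bytes of the part's content into buf. Returns the number
 * of bytes read, 0 at the end of the part, or -1 on error. */
static ssize_t
part_view_read (PartView *v, void *buf, size_t n)
{
	off_t remaining = (v->end - v->start) - v->pos;
	ssize_t r;

	if (remaining <= 0 || n == 0)
		return 0;
	if ((off_t) n > remaining)
		n = (size_t) remaining;

	do {
		/* pread() does not change the fd's shared file offset */
		r = pread (v->fd, buf, n, v->start + v->pos);
	} while (r == -1 && errno == EINTR);

	if (r > 0)
		v->pos += r;
	return r;
}
```

The memory cost per part is a few integers regardless of the part's size; the trade-off, as noted for mbox, is that the offsets are only valid while the underlying file is not rewritten.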


Jeff




Re: [Evolution-hackers] Loading really large E-mails on devices with not enough Vm

2008-01-26 Thread Jeffrey Stedfast
On Sat, 2008-01-26 at 22:12 -0500, Jeffrey Stedfast wrote:
 The way GMime handles this is by not loading content into RAM, but 
 that may be harder to do with Camel, especially in the mbox case.

er, I should probably explain this:

- Writing the code should be relatively easy, but in the mbox case the
mbox may end up getting expunged or rewritten for some other reason,
which may cause problems; I'm not sure how that would work.

I think in Maildir, as long as the fd remains open, the file won't
actually disappear after an unlink() until the fd gets closed, so that
might work out ok assuming you can spare the fd (which might be the
other problem with Evolution?).
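The Maildir point relies on standard POSIX semantics: unlink() removes the directory entry, but the file's data stays readable through any descriptor that is still open, and is released only when the last one is closed. A small sketch of that behavior follows; the helper name and the /tmp path template are illustrative assumptions.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a temporary file containing `text`, unlink it while it is still
 * open, then read it back through the open descriptor. Returns the number
 * of bytes read back (copied into out, NUL-terminated), or -1 on error. */
static ssize_t
read_back_after_unlink (const char *text, char *out, size_t outlen)
{
	char path[] = "/tmp/unlink-demo-XXXXXX";
	size_t len;
	ssize_t n;
	int fd;

	assert (text != NULL && out != NULL);
	len = strlen (text);

	if (outlen == 0 || (fd = mkstemp (path)) == -1)
		return -1;

	if (write (fd, text, len) != (ssize_t) len) {
		close (fd);
		unlink (path);
		return -1;
	}

	/* remove the directory entry while fd is still open... */
	if (unlink (path) == -1 || access (path, F_OK) == 0) {
		close (fd);
		return -1;
	}

	/* ...yet the contents are still readable through fd */
	n = pread (fd, out, outlen - 1, 0);
	close (fd); /* last reference gone: only now is the storage released */
	if (n < 0)
		return -1;

	out[n] = '\0';
	return n;
}
```

The cost is the one noted above: every message kept on disk this way holds a descriptor open for as long as its content may be read.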

Jeff




Re: [Evolution-hackers] Loading really large E-mails on devices with not enough Vm

2008-01-26 Thread Jeffrey Stedfast
Something like the attached patch might work, tho it is untested.

If this doesn't work, then I suspect the problem is that the seek
position might get changed out from under the mime parser (assuming it
is using either a CamelStreamFs or an fd).

Note that camel_stream_fs_new_with_fd[_and_bounds]() calls lseek() on
the fd passed in.

From the dup() man page:

   After  a  successful  return from dup() or dup2(), the old and new file
   descriptors may be used interchangeably.  They refer to the  same  open
   file description (see open(2)) and thus share file offset and file sta‐
   tus flags; for example,  if  the  file  offset  is  modified  by  using
   lseek(2)  on one of the descriptors, the offset is also changed for the
   other.

So my guess is that this will break the parser :(

It might break in the stream case as well, you'd have to follow the code
paths a bit to know for sure. For instance, even if creating the
seekable substream doesn't perform an underlying seek on the original
stream, setting it in a data wrapper might call camel_stream_reset()
which /might/ do an lseek() on the source fs stream.

Not an insurmountable problem to solve, but it does make things a little
more difficult and possibly touchy.

Jeff
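The dup() passage quoted above is easy to check directly. The sketch below (POSIX; the helper name and /tmp path template are illustrative assumptions) dup()s a descriptor, seeks only on the copy, and then asks the original where it is. Because both refer to the same open file description, the original's offset moves too, which is why a second stream created on a dup of the parser's fd could disturb the parser.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 1 if seeking on a dup()ed descriptor also moves the original
 * descriptor's offset, 0 if it does not, or -1 on setup errors. */
static int
dup_shares_offset (void)
{
	char path[] = "/tmp/dup-demo-XXXXXX";
	int fd, copy = -1, shared;

	if ((fd = mkstemp (path)) == -1)
		return -1;
	unlink (path); /* the data stays reachable through fd */

	/* after this write the shared offset is 10 */
	if (write (fd, "0123456789", 10) != 10 || (copy = dup (fd)) == -1) {
		close (fd);
		return -1;
	}

	/* move the offset through the duplicate only... */
	if (lseek (copy, 7, SEEK_SET) != 7)
		shared = -1;
	else /* ...then ask the original where it is */
		shared = lseek (fd, 0, SEEK_CUR) == 7;

	close (copy);
	close (fd);
	return shared;
}
```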



On Sat, 2008-01-26 at 22:48 -0500, Jeffrey Stedfast wrote:
 On Sat, 2008-01-26 at 22:12 -0500, Jeffrey Stedfast wrote:
  The way GMime handles this is by not loading content into RAM, but 
  that may be harder to do with Camel, especially in the mbox case.
 
 er, I should probably explain this:
 
 - writing the code should be relatively easy to do, but in the mbox
 case, the mbox may end up getting expunged or rewritten for some other
 reason which may cause problems, not sure how that would work.
 
 I think in Maildir, as long as the fd remains open, the file won't
 actually disappear after an unlink() until the fd gets closed, so that
 might work out ok assuming you can spare the fd (which might be the
 other problem with Evolution?).
 
 Jeff
 
 
Index: ChangeLog
===
--- ChangeLog	(revision 8425)
+++ ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2008-01-26  Jeffrey Stedfast  [EMAIL PROTECTED]
+
+	* camel-mime-part-utils.c (simple_data_wrapper_construct_from_parser):
+	If possible, keep the content on disk.
+
 2008-01-24  Matthew Barnes  [EMAIL PROTECTED]
 
 	* camel-object.c (camel_object_cast):
Index: camel-mime-part-utils.c
===
--- camel-mime-part-utils.c	(revision 8425)
+++ camel-mime-part-utils.c	(working copy)
@@ -57,25 +57,47 @@
 static void
 simple_data_wrapper_construct_from_parser (CamelDataWrapper *dw, CamelMimeParser *mp)
 {
+	GByteArray *buffer = NULL;
+	CamelStream *stream;
+	off_t start, end;
+	int fd = -1;
+	size_t len;
 	char *buf;
-	GByteArray *buffer;
-	CamelStream *mem;
-	size_t len;
-
+	
 	d(printf ("simple_data_wrapper_construct_from_parser()\n"));
-
-	/* read in the entire content */
-	buffer = g_byte_array_new ();
+	
+	if (!(stream = camel_mime_parser_stream (mp)))
+		fd = camel_mime_parser_fd (mp);
+	else if (!CAMEL_IS_SEEKABLE_SUBSTREAM (stream))
+		stream = NULL;
+	
+	if ((stream || fd != -1) && (start = camel_mime_parser_tell (mp)) != -1) {
+		/* we can keep content on disk */
+	} else {
+		/* need to load content into memory */
+		buffer = g_byte_array_new ();
+	}
+	
 	while (camel_mime_parser_step (mp, &buf, &len) != CAMEL_MIME_PARSER_STATE_BODY_END) {
-		d(printf("appending o/p data: %d: %.*s\n", len, len, buf));
-		g_byte_array_append (buffer, (guint8 *) buf, len);
+		if (buffer != NULL) {
+			d(printf("appending o/p data: %d: %.*s\n", len, len, buf));
+			g_byte_array_append (buffer, (guint8 *) buf, len);
+		}
 	}
-
-	d(printf(message part kept in memory!\n));
-
-	mem = camel_stream_mem_new_with_byte_array (buffer);
-