Bug#395268: hang in Mail::Mbox::MessageParser::Grep

2007-01-10 Thread David Coppit

On Tue, 9 Jan 2007, Joey Hess wrote:


David, it seems that there's a bug in the Grep implementation of the
MessageParser that can lead to a hang. See discussion at
http://bugs.debian.org/395268


Thanks for the heads up.


I share Steinar's confusion about what $self-{'CURRENT_EMAIL_INDEX'}
should be used for and how it relates to $self-{'email_number'}


I renamed CURRENT_EMAIL_INDEX TO CHUNK_INDEX, and I've added the following
documentation:

  # Reading grep data provides us with an array of potential email
  # starting locations. However, due to included emails and attachments,
  # we have to validate these locations as actually being the start of
  # emails. As a result, there may be more chunks in the array than
  # emails. So CHUNK_INDEX = email_number-1.

I've found that when the grep implementation goes into an infinite loop,
it's because the grep data does not match the file, as would be the case
if the file was modified after grep was run. My next release will detect
this case and try to recover.


As a temporary workaround, I've disabled the Grep implementation in the
Debian package.


I'll ping you when the release comes out so that you can test it. (I'm not
sure how to recreate the bug myself.)

Comments on one of the emails are below.

BTW, I see from the link you provided that this is marked as closed. Did
1.4005 fix the bug or not?

David

On Sun, 12 Nov 2006 03:29:04 +0100, Steinar H. Gunderson wrote:


If I had to guess, I'd assume $self-{'email_number'} was somehow a
_logical_ message number, and thus unfit for any sort of indexing.
That's a guess, though.


That's right. CURRENT_EMAIL_INDEX (now CHUNK_INDEX) refers to an entry in
the grep data array that corresponds to some block of text in the file
that begins From  In the case that this block of text may not be the
start of a new email, we will need to continue incrementing CHUNK_INDEX
and reading more chunks.


Part of the reason seems to be that _adjust_cache_data() somehow merges
or deletes messages without adjusting email_number; I'm not really sure
what it is supposed to do.


At the end, after validating the start of the next email, I add up the
chunk entries to get the final, validated entry for the email.

As for not checking the result of read(), that was sloppy programming on
my part. I thought there was no way for the grep data and the file to get
out of sync, but apparently someone has found a way. :) I don't know what
the cause is in this case, but I'll try to detect and avoid/correct it in
the next release.

_
David Coppit   [EMAIL PROTECTED]
The College of William and Maryhttp://coppit.org/

When the president does it that means that it is not illegal.
- Richard Nixon on domestic surveillance, 5/19/1977
Do I have the legal authority to do this? And the answer is, absolutely.
- George W. Bush on domestic surveillance, 12/19/2005


--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#395268: hang in Mail::Mbox::MessageParser::Grep

2007-01-10 Thread Joey Hess
David Coppit wrote:
 I've found that when the grep implementation goes into an infinite loop,
 it's because the grep data does not match the file, as would be the case
 if the file was modified after grep was run. My next release will detect
 this case and try to recover.

I don't think this happens when running the Mail::MboxParser test suite,
which will reproduce the hang.

 BTW, I see from the link you provided that this is marked as closed. Did
 1.4005 fix the bug or not?

No, I fixed it by disabling grep.

-- 
see shy jo


signature.asc
Description: Digital signature


Bug#395268: hang in Mail::Mbox::MessageParser::Grep

2007-01-10 Thread David Coppit

On Tue, 9 Jan 2007, Joey Hess wrote:


David, it seems that there's a bug in the Grep implementation of the
MessageParser that can lead to a hang. See discussion at
http://bugs.debian.org/395268


I've found and fixed the problem. The issue was that Tassilo's test case
assumed that read_next_email would return some false value, when in fact
you are not supposed to call the method if end_of_file is true. i.e. he
did:

  while(my $email = $folder_reader-read_next_email()) {
print $output $$email;
  }

instead of:

  while(!$folder_reader-end_of_file()) {
my $email = $folder_reader-read_next_email();
print $output $$email;
  }

His way seems reasonable, so I added (back in?) support for
it---read_next_email now returns undef on EOF. I'll be releasing 1.5000
very soon.

David

P.S. Please CC me on bug reports as soon as my module is obviously
involved. I probably could have saved several people some debugging
effort. (I've thanked them all in my changelog.)

_
David Coppit   [EMAIL PROTECTED]
The College of William and Maryhttp://coppit.org/

When the president does it that means that it is not illegal.
- Richard Nixon on domestic surveillance, 5/19/1977
Do I have the legal authority to do this? And the answer is, absolutely.
- George W. Bush on domestic surveillance, 12/19/2005


--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]