Bug#483247: Updating grepmail
For what it's worth, I'm in the process of updating grepmail. I don't have ready access to the full CPAN-Testers test matrix, so I can't guarantee that all tests will pass everywhere. But the obvious failures will be fixed, along with a few other bugs, and support for lzip and xz will be added.
Bug#432083: Fixed in grepmail version 5.3034
Thanks for the bug report. I've fixed it and will push out a new release later today. -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#234795: Need more information
Hi there,

I need more information to debug this. Please either confirm the bug and provide more information, or mark this bug as not a bug.

grepmail uses Mail::Mbox::MessageParser, which is designed to use memory proportional to the largest email message in a mailbox. I verified that it does indeed operate this way, using a 54MB mailbox:

    mbox size:        56683943
    max email size:   11182857
    max read buffer:  11184795  <-- Biggest size of M::M::MP's read buffer
    folder_reader:    11186558  <-- Biggest size of the M::M::MP Perl object

Some stats from ps(1):

    Plain text mailbox:
      min real memory:        4976640
      min virtual memory:   618000384
      max real memory:       38674432
      max virtual memory:   651546624

    Gzip compressed:
      min real memory:        5005312
      min virtual memory:   618016768
      max real memory:       38694912
      max virtual memory:   651563008

I also tried a 540MB mailbox, created by concatenating the mailbox 10 times:

    Plain text x10:
      min real memory:        4976640
      min virtual memory:   618000384
      max real memory:       40292352
      max virtual memory:   652021760

    Gzip compressed x10:
      min real memory:        5005312
      min virtual memory:   618016768
      max real memory:       40284160
      max virtual memory:   652038144

The numbers above were basically the same for a 23KB mailbox. Also note that this command:

    perl -e 'system "ps -o rss,vsz $$"'

consumes 1175552 real and 615645184 virtual memory, so the numbers above are not out of the ordinary.

If you could run the attached anonymize_mailbox script on your mailbox, verify that memory usage is still bad, then send the mailbox to me, I can debug this better.

Another idea: perhaps your mailbox is malformed, such that grepmail only sees 1 email in the whole mailbox. You can check this by running:

    grepmail -r . my_big_mailbox

If you want to confirm that you have a very large email in your mailbox, find this line in grepmail:

    my $email = $folder_reader->read_next_email();

and follow it with this line:

    print length($$email) . "\n";

then run something like:

    grepmail nonexistent_pattern my_big_mailbox | sort -n

Regards,
David

David Coppit http://coppit.org/
Bug#254045: -d bug: not a bug?
I believe this is not a bug. I suspect you entered a unicode character that looks like "-" but is not. Getopt::Std fails to get options unless the option dash is exactly "-".

Here's a program that you can use to test it.

    use Getopt::Std;
    use Data::Dumper;

    ($c) = $ARGV[0] =~ /^(.)/;
    print "Character $c is ord(" . ord($c) . ")\n";

    getopt('d',\%new_opts);
    print Dumper \%new_opts;

When I run the program, I get:

    $ perl a -d 'before 6/1/04'
    Character - is ord(45)
    $VAR1 = {
              'd' => 'before 6/1/04'
            };

But when I copy and paste "-" from the website for your bug report I get:

    $ perl a ???d 'before 6/1/04'
    Character ? is ord(226)
    $VAR1 = {};

Please confirm and either provide more information or close the bug as not a bug.

Thanks,
David
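The check described above can also be automated before the options are parsed. The following is a minimal sketch, not part of grepmail or Getopt::Std: the helper name starts_with_ascii_dash is hypothetical, and it assumes only that a real option dash is the ASCII hyphen-minus, code 45, as in the output shown in the message.

```perl
use strict;
use warnings;

# Hypothetical helper (not from grepmail or Getopt::Std): true only if the
# argument starts with the ASCII hyphen-minus, code 45.
sub starts_with_ascii_dash {
    my ($arg) = @_;
    return 0 unless defined($arg) && length($arg);
    return ord(substr($arg, 0, 1)) == 45 ? 1 : 0;
}

# Report arguments whose first byte/character is outside ASCII, which is
# what a pasted look-alike dash would produce.
for my $arg (@ARGV) {
    next if starts_with_ascii_dash($arg);
    next unless $arg =~ /^[^\x00-\x7F]/;
    printf STDERR "Argument starts with code %d, not an ASCII dash (45)\n",
        ord($arg);
}
```

Run with the same arguments as the failing command, this would print a warning for the pasted option and nothing for a correctly typed one.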
Bug#234795: Need more information
Forgot the anonymize_mailbox script.

On Sun, 23 Aug 2009, David Coppit wrote:
> Hi there, I need more information to debug this. [...]

David Coppit http://coppit.org/

#!/usr/bin/perl -w

$VERSION = '1.00';

use strict;
use FileHandle;

#---

my $LINE = 0;
my $FILE_HANDLE = undef;
my $START = 0;
my $END = 0;
my $READ_BUFFER = '';

sub reset_file
{
  my $file_handle = shift;

  $FILE_HANDLE = $file_handle;
  $LINE = 1;
  $START = 0;
  $END = 0;
  $READ_BUFFER = '';
}

#---

# Need this for a lookahead.
my $READ_CHUNK_SIZE = 0;

sub read_email
{
  # Undefined read buffer means we hit eof on the last read.
  return 0 unless defined $READ_BUFFER;

  my $line = $LINE;

  $START = $END;

  # Look for the start of the next email
  LOOK_FOR_NEXT_HEADER:
  while($READ_BUFFER =~ m/^(From\s.*\d:\d+:\d.* \d{4})/mg)
  {
    $END = pos($READ_BUFFER) - length($1);

    # Don't stop on email header for the first email in the buffer
    next if $END == 0;

    # Keep looking if the header we found is part of a "Begin Included
    # Message".
    my $end_of_string = substr($READ_BUFFER, $END-200, 200);
    if ($end_of_string =~ /\n-( Begin Included Message |Original Message)-\n[^\n]*\n*$/i)
    {
      next;
    }

    # Found the next email!
    my $email = substr($READ_BUFFER, $START, $END-$START);
    $LINE += ($email =~ tr/\n//);

    return (1, $email, $line);
  }

  # Didn't find next email in current buffer. Most likely we need to read some
  # more of the mailbox. Shift the current email to the front of the buffer
  # unless we've already done so.
  $READ_BUFFER = substr($READ_BUFFER,$START) unless $START == 0;
  $START = 0;

  # Start looking at the end of the buffer, but back up some in case the edge
  # of the newly read buffer contains the start of a new header. I believe the
  # RFC says header lines can be at most 90 characters long.
  my $search_position = length($READ_BUFFER) - 90;
  $search_position = 0 if $search_position < 0;

  # Can't use sysread because it doesn't work with ungetc
  if ($READ_CHUNK_SIZE == 0)
  {
    local $/ = undef;

    if (eof $FILE_HANDLE)
    {
      my $email = $READ_BUFFER;
      undef $READ_BUFFER;
      return (1, $email, $line);
    }
    else
    {
      $READ_BUFFER = <$FILE_HANDLE>;
      pos($READ_BUFFER) = $search_position;
      goto LOOK_FOR_NEXT_HEADER;
    }
  }
  else
  {
    if (read($FILE_HANDLE, $READ_BUFFER, $READ_CHUNK_SIZE, length($READ_BUFFER)))
    {
      pos($READ_BUFFER) = $search_position;
      goto LOOK_FOR_NEXT_HEADER
Bug#395268: 1.5000
On Sun, 14 Jan 2007, Joey Hess wrote:
> I tried out 1.5000. I'm still seeing apparently the same hang with it
> while building Mail::MboxParser..

Hi Tassilo,

It looks like changes to my module Mail::Mbox::MessageParser are causing your module Mail::MboxParser to hang during make test. I debugged the issue, and it appears to be a problem with the way that you change the file position while testing the newline type in the file, as well as another unnecessary seek in next_message_new().

Attached is a patch that fixes the problem(s).

Regards,
David

P.S. Joey, there are still two warnings issued during the make test step of Tassilo's module. These are coming from my module, and will be fixed in versions 1.5000.

David Coppit [EMAIL PROTECTED]
The College of William and Mary http://coppit.org/

"When the president does it that means that it is not illegal."
- Richard Nixon on domestic surveillance, 5/19/1977

"Do I have the legal authority to do this? And the answer is, absolutely."
- George W. Bush on domestic surveillance, 12/19/2005

--- MboxParser.pm 2005-12-08 05:15:39.0 -0500
+++ /Users/coppit/Desktop/MboxParser.pm 2007-04-23 13:54:45.0 -0400
@@ -519,7 +519,6 @@
     return undef if ref(\$p) eq 'SCALAR' or $p->end_of_file;
-    seek $self->{READER}, $self->{CURR_POS}, SEEK_SET;
     my $nl = $self->{NL};
     my $mailref = $p->read_next_email;
     my ($header, $body) = split /$nl$nl/, $$mailref, 2;
@@ -794,6 +793,7 @@
     my $h = $self->{READER};
     my $newline;
+    my $old_position = tell $h;
     seek $h, 0, SEEK_SET;
     while (sysread $h, (my $c), 1) {
         if (ord($c) == 13) {
@@ -807,6 +807,7 @@
             last;
         }
     }
+    seek($h, $old_position, 0);
     return $newline;
 }
Bug#395268: hang in Mail::Mbox::MessageParser::Grep
On Tue, 9 Jan 2007, Joey Hess wrote:
> David, it seems that there's a bug in the Grep implementation of the
> MessageParser that can lead to a hang. See discussion at
> http://bugs.debian.org/395268

Thanks for the heads up.

> I share Steinar's confusion about what $self->{'CURRENT_EMAIL_INDEX'}
> should be used for and how it relates to $self->{'email_number'}

I renamed CURRENT_EMAIL_INDEX to CHUNK_INDEX, and I've added the following documentation:

    # Reading grep data provides us with an array of potential email
    # starting locations. However, due to included emails and attachments,
    # we have to validate these locations as actually being the start of
    # emails. As a result, there may be more chunks in the array than
    # emails. So CHUNK_INDEX >= email_number-1.

I've found that when the grep implementation goes into an infinite loop, it's because the grep data does not match the file, as would be the case if the file was modified after grep was run. My next release will detect this case and try to recover.

> As a temporary workaround, I've disabled the Grep implementation in the
> Debian package.

I'll ping you when the release comes out so that you can test it. (I'm not sure how to recreate the bug myself.)

Comments on one of the emails are below. BTW, I see from the link you provided that this is marked as closed. Did 1.4005 fix the bug or not?

David

On Sun, 12 Nov 2006 03:29:04 +0100, Steinar H. Gunderson wrote:
> If I had to guess, I'd assume $self->{'email_number'} was somehow a
> _logical_ message number, and thus unfit for any sort of indexing.
> That's a guess, though.

That's right. CURRENT_EMAIL_INDEX (now CHUNK_INDEX) refers to an entry in the grep data array that corresponds to some block of text in the file that begins "From ". In the case that this block of text may not be the start of a new email, we will need to continue incrementing CHUNK_INDEX and reading more chunks.

> Part of the reason seems to be that _adjust_cache_data() somehow merges
> or deletes messages without adjusting email_number; I'm not really sure
> what it is supposed to do.

At the end, after validating the start of the next email, I add up the chunk entries to get the final, validated entry for the email.

As for not checking the result of read(), that was sloppy programming on my part. I thought there was no way for the grep data and the file to get out of sync, but apparently someone has found a way. :) I don't know what the cause is in this case, but I'll try to detect and avoid/correct it in the next release.
Bug#395268: hang in Mail::Mbox::MessageParser::Grep
On Tue, 9 Jan 2007, Joey Hess wrote:
> David, it seems that there's a bug in the Grep implementation of the
> MessageParser that can lead to a hang. See discussion at
> http://bugs.debian.org/395268

I've found and fixed the problem. The issue was that Tassilo's test case assumed that read_next_email would return some false value, when in fact you are not supposed to call the method if end_of_file is true. i.e. he did:

    while(my $email = $folder_reader->read_next_email())
    {
      print $output $$email;
    }

instead of:

    while(!$folder_reader->end_of_file())
    {
      my $email = $folder_reader->read_next_email();
      print $output $$email;
    }

His way seems reasonable, so I added (back in?) support for it---read_next_email now returns undef on EOF. I'll be releasing 1.5000 very soon.

David

P.S. Please CC me on bug reports as soon as my module is obviously involved. I probably could have saved several people some debugging effort. (I've thanked them all in my changelog.)
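To see why the two loop styles become equivalent once read_next_email returns undef at EOF, here is a small self-contained sketch. MockReader is a hypothetical stand-in written for this illustration, not the real Mail::Mbox::MessageParser class; it only mimics the two methods named in the message, under the assumption that each email is returned as a scalar reference.

```perl
use strict;
use warnings;

# MockReader is a hypothetical stand-in for the real folder reader. It only
# mimics end_of_file() and read_next_email() as described in the message.
package MockReader;

sub new {
    my ($class, @emails) = @_;
    return bless { emails => [@emails] }, $class;
}

sub end_of_file {
    my ($self) = @_;
    return @{ $self->{emails} } ? 0 : 1;
}

# Returns a reference to the next email, or undef once nothing is left.
sub read_next_email {
    my ($self) = @_;
    return undef if $self->end_of_file();
    my $email = shift @{ $self->{emails} };
    return \$email;
}

package main;

# Loop style that relies on a false return value at EOF.
sub drain_by_return_value {
    my ($reader) = @_;
    my @seen;
    while (my $email = $reader->read_next_email()) {
        push @seen, $$email;
    }
    return @seen;
}

# Loop style that checks end_of_file() before every read.
sub drain_by_end_of_file {
    my ($reader) = @_;
    my @seen;
    while (!$reader->end_of_file()) {
        my $email = $reader->read_next_email();
        push @seen, $$email;
    }
    return @seen;
}

my @messages = ("From a\nfirst\n", "From b\nsecond\n");
my @one = drain_by_return_value(MockReader->new(@messages));
my @two = drain_by_end_of_file(MockReader->new(@messages));
print scalar(@one), " and ", scalar(@two), " emails read\n";
```

If read_next_email did not return a false value at EOF, the first loop would have no way to terminate, which would match the hang reported in this bug.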
Bug#365151: libmail-mbox-messageparser-perl: message splitting breaks
On Wed, 21 Jun 2006, Volker Kuhlmann wrote:
>> There may be something to look for. I'll forward you an email that
>> describes a problem. I'm hoping someone can send me a sample mailbox.
>
> Your hopes can be upheld ;) Attached a sample mailbox, and debug output.
> It's my spam box (using grepmail is nifty to check for a false positive
> that has gone missing), so don't read the text too closely.
>
> The box contains 5 messages. Search string is @orcon.net.nz, and it
> occurs in msg 4 and 5, but all msgs from 2 are returned as match. (If the
> box was 1000 msgs longer, they would all be returned as well.)
>
> From my reading of the debug output, Mbox/MessageParser fails to
> recognise the ^from in msg 1 as being part of the msg body. I can say
> with certainty that mutt has never failed me in a decade with separating
> mbox msgs. All my emails for the past 4 years have enforced correct
> content-length: headers; I don't care what XYZ or DJB says, it works
> fine. Mbox/MessageParser 1.20 hasn't failed me yet either.

Well, the mailbox is not valid. The reason appears to be that antispam.rc has truncated the mailbox in an invalid way. Namely, the multipart boundaries have been ignored, so that the ending for:

    =_NextPart_000_0008_01C684B1.30F8FE30

is no longer there. From RFC 1341:

    The encapsulation boundary following the last body part is a
    distinguished delimiter that indicates that no further body parts
    will follow. Such a delimiter is identical to the previous
    delimiters, with the addition of two more hyphens at the end of
    the line

In previous versions this ill-formed mailbox was not seen because I was not parsing multi-part emails correctly. In previous versions, if an email was part of the main multi-part email, I would incorrectly break the multi-part email. In this case you *want* me to break the email.

I assume that pine and mutt are doing my previous incorrect behavior. (I just checked with pine, and it breaks the email even if I put the ending boundary marker *after* the next email.)

What I'll try to do is this:

- Look for ending boundary
- If a ^From appears before the ending boundary is found, ignore it and
  consider the email to be a part.
- If the ending boundary is not found, consider the mailbox to be
  ill-formed. Emit a warning, back up, and search for the next ^From .

There's a nasty performance hit for ill-formed mailboxes as the parser searches the rest of the file for the missing boundary, but perhaps that will be an incentive for people to fix their mailboxes. :)

Eduard, as Joey noted, your mailbox has an invalid boundary as well. My solution above should work for your case too.

I'll work on this tonight and email you all when it's fixed.

Regards,
David

David Coppit [EMAIL PROTECTED]
The College of William and Mary http://coppit.org/

Single sanction punishment doesn't work for presidents or cheaters.
http://www.coppit.org/blog/archives/119
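The three-step plan in this message can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the code that went into Mail::Mbox::MessageParser: find_email_end is a hypothetical name, the whole mailbox is assumed to be in memory as one string starting at the current email's "From " line, and the boundary string is assumed to be already known.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mail::Mbox::MessageParser.
# $buffer holds mailbox text starting at the current email's "From " line;
# $boundary is that email's multipart boundary string.
# Returns (offset where the current email ends, 1 if the closing boundary
# was missing and the mailbox should be treated as ill-formed, else 0).
sub find_email_end {
    my ($buffer, $boundary) = @_;

    # Step 1: look for the ending boundary ("--boundary--" on its own line).
    my $closing   = "\n--$boundary--";
    my $close_pos = index($buffer, $closing);

    # Step 2: if it exists, any "From " line before it belongs to a part,
    # so only start looking for the next email after the closing boundary.
    # Step 3: if it is missing, back up and take the first "From " line.
    my $ill_formed  = $close_pos < 0 ? 1 : 0;
    my $search_from = $ill_formed ? 0 : $close_pos + length($closing);

    my $next_from = index($buffer, "\nFrom ", $search_from);
    my $end = $next_from >= 0 ? $next_from + 1 : length($buffer);

    return ($end, $ill_formed);
}

my $mailbox = "From a\n\n--B\nFrom inside a part\n--B--\nFrom b\nbody\n";
my ($end, $ill_formed) = find_email_end($mailbox, 'B');
warn "Closing boundary not found; mailbox looks ill-formed\n" if $ill_formed;
print "First email is $end bytes long\n";
```

The full scan for the missing boundary in the ill-formed case is where the performance hit mentioned above would come from.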
Bug#365151: libmail-mbox-messageparser-perl: message splitting breaks
FYI,

Today I have released Mail::Mbox::MessageParser version 1.4003, which incorporates Eduard Bloch's patch for the "grepmail returning all emails" bug. Many thanks to all involved.

David

David Coppit [EMAIL PROTECTED]
The College of William and Mary http://coppit.org/

"... frothy eloquence neither convinces nor satisfies me."
-- 1899, Willard D. Vandiver (D-MO)
Bug#365151: libmail-mbox-messageparser-perl: message splitting breaks
On Sun, 21 May 2006, Eduard Bloch wrote:
> It was not my patch, kudos to the creator (JoeyH, AFAICS).

Sorry. I was confused by the email he sent. I've updated the attribution in the changelog for the next release.

David