Bug#984581: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
Control: forwarded -1 https://bugzilla.redhat.com/show_bug.cgi?id=1994178

On Sun, 30 May 2021 10:26:21 +0800 Paul Wise wrote:
> On Mon, 5 Apr 2021 06:04:49 + "Surla, Sai Kalyan" wrote:
>
> > Is there any update on the issues.
>
> I finally found time to work on the first issue (header detection)
> where we had a workaround already and created proper patches (attached)
> for the issue and sent them to the upstream maintainer.

I have forwarded the patches to the Fedora bug tracker, hopefully that will mean that the upstream maintainer will accept them now.

I had to fix a bug with the first patch causing a segfault.

I will include the patches in the next upload to Debian unstable.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
Bug#984581: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
Hi Paul,

Thanks for your time on this issue. We will verify the patch that you shared and will let you know the results.

Thank you
Sai Kalyan

From: Paul Wise
Sent: 30 May 2021 07:56 AM
To: 984...@bugs.debian.org; Surla, Sai Kalyan
Subject: Re: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

On Mon, 5 Apr 2021 06:04:49 + "Surla, Sai Kalyan" wrote:

> Is there any update on the issues.

I finally found time to work on the first issue (header detection) where we had a workaround already and created proper patches (attached) for the issue and sent them to the upstream maintainer.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
Bug#984581: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
On Mon, 5 Apr 2021 06:04:49 + "Surla, Sai Kalyan" wrote:

> Is there any update on the issues.

I finally found time to work on the first issue (header detection) where we had a workaround already and created proper patches (attached) for the issue and sent them to the upstream maintainer.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise

From a4aa24ae5675b09385d0c88add48c3ab046e699d Mon Sep 17 00:00:00 2001
From: Paul Wise
Date: Sun, 30 May 2021 10:02:14 +0800
Subject: [PATCH 1/3] Add debugging for header detection

---
 src/readpst.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/readpst.c b/src/readpst.c
index 6d94f15..b5910e9 100644
--- a/src/readpst.c
+++ b/src/readpst.c
@@ -1591,6 +1591,8 @@ void write_normal_email(FILE* f_output, char f_name[], pst_item* item, int mode,
     DEBUG_ENT("write_normal_email");
     pst_convert_utf8_null(item, &item->email->header);
+    DEBUG_INFO(("PST headers\n%s\n", *item->email->header.str));
+    DEBUG_INFO(("Extra MIME headers\n%s\n", *extra_mime_headers));
     headers = valid_headers(item->email->header.str) ? item->email->header.str : valid_headers(*extra_mime_headers) ? *extra_mime_headers : NULL;
-- 
2.30.2

From bade93dcdb435bc7bec50cf4b54481731beea45c Mon Sep 17 00:00:00 2001
From: Paul Wise
Date: Sun, 30 May 2021 09:49:57 +0800
Subject: [PATCH 2/3] Also detect email headers wrapped with space instead of tab

Spaces are commonly used for email header wrapping.

---
 src/readpst.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/readpst.c b/src/readpst.c
index b5910e9..6663771 100644
--- a/src/readpst.c
+++ b/src/readpst.c
@@ -1275,8 +1275,10 @@ int header_match(char *header, char*field) {
     if (strncasecmp(header, field, n) == 0) return 1; // tag:{space}
     if ((field[n-1] == ' ') && (strncasecmp(header, field, n-1) == 0)) {
         char *crlftab = "\r\n\t";
+        char *crlfspc = "\r\n ";
         DEBUG_INFO(("Possible wrapped header = %s\n", header));
         if (strncasecmp(header+n-1, crlftab, 3) == 0) return 1; // tag:{cr}{lf}{tab}
+        if (strncasecmp(header+n-1, crlfspc, 3) == 0) return 1; // tag:{cr}{lf}{space}
     }
     return 0;
 }
-- 
2.30.2

From da5f159caa66db380b793f9062a36888c9b12467 Mon Sep 17 00:00:00 2001
From: Paul Wise
Date: Sun, 30 May 2021 09:51:26 +0800
Subject: [PATCH 3/3] Detect reasonable email headers too

RFC 5322 specifies the syntax of email headers, most header fields
are more restricted though so use a restricted check in case the
headers are bogus parts of the body that happen to match RFC 5322.

Fixes: https://bugs.debian.org/984581
---
 src/readpst.c | 60 +++
 1 file changed, 60 insertions(+)

diff --git a/src/readpst.c b/src/readpst.c
index 6663771..97ba127 100644
--- a/src/readpst.c
+++ b/src/readpst.c
@@ -1283,6 +1283,65 @@ int header_match(char *header, char*field) {
     return 0;
 }
 
+// https://en.wikipedia.org/wiki/Email#Message_header
+// https://www.rfc-editor.org/rfc/rfc5322.html
+// https://www.iana.org/assignments/message-headers/message-headers.xhtml
+int header_is_reasonable(char *header)
+{
+    char *c;
+#define C *c
+
+    // The header must not be NULL
+    if (header) c = header;
+    else return 0;
+
+    // usually the header field name starts with upper-case: A-Z
+    if (C >= 'A' && C <= 'Z') c++;
+    else return 0;
+
+    while(1) {
+        // most header field names use a limited set of characters: - 0-9 A-Z a-z
+        if (
+            (C >= 'A' && C <= 'Z') ||
+            (C >= 'a' && C <= 'z') ||
+            (C >= '0' && C <= '9') ||
+            (C == '-')
+           ) {
+            c++;
+        // the header field name is then terminated with a colon
+        } else if (C == ':') {
+            c++;
+            goto parse_header_field_value;
+        // other characters are an indicator of an invalid header
+        } else {
+            return 0;
+        }
+    }
+
+parse_header_field_value:
+    while(1) {
+        // header field values are printable US-ASCII plus space/tab
+        if (
+            (C >= 33 && C <= 126) ||
+            (C == ' ' || C == '\t')
+           ) {
+            c++;
+        // the header field value is then terminated with CRLF
+        } else if (C == '\r' && *(c+1) == '\n') {
+            c += 2;
+            // the value could continue to the next line though
+            if (C == ' ' || C == '\t') c++;
+            else return 1;
+        // other characters are an indicator of an invalid header
+        } else {
+            return 0;
+        }
+    }
+
+#undef C
+
+}
+
 int valid_headers(char *header)
 {
     // headers are sometimes really bogus - they seem to be fragments of the
@@ -1303,6 +1362,7 @@ int valid_headers(char *header)
         if (header_match(header, "X-ASG-Debug-ID: " )) return 1;
         if (header_match(header, "X-Barracuda-URL: " )) return 1;
         if (header_matc
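For readers who want to experiment with the idea behind patch 3/3 outside of readpst, the restricted check can be sketched as a standalone function. This is a rewrite for illustration only, not the code that was sent upstream: the name `first_field_looks_reasonable` is hypothetical, the macro and `goto` are replaced by plain loops, and like the patch it only inspects the first header field of the block.

```c
#include <assert.h>   /* assert(), for the usage note below */
#include <stddef.h>

/* Hypothetical helper, not part of libpst: returns 1 if the text starts
 * with one plausible header field ("Name: value" ended by CRLF, where the
 * value may be folded onto following lines), and 0 otherwise. */
int first_field_looks_reasonable(const char *s)
{
    if (s == NULL) return 0;

    /* Field name: an upper-case first letter ... */
    if (*s < 'A' || *s > 'Z') return 0;
    s++;

    /* ... followed by letters, digits or '-' up to the colon. */
    while ((*s >= 'A' && *s <= 'Z') || (*s >= 'a' && *s <= 'z') ||
           (*s >= '0' && *s <= '9') || *s == '-')
        s++;
    if (*s != ':') return 0;
    s++;

    /* Field value: printable US-ASCII, space or tab, ended by CRLF. */
    for (;;) {
        unsigned char ch = (unsigned char)*s;
        if ((ch >= 33 && ch <= 126) || ch == ' ' || ch == '\t') {
            s++;
        } else if (ch == '\r' && s[1] == '\n') {
            s += 2;
            /* A leading space or tab means the value continues (folding). */
            if (*s == ' ' || *s == '\t') s++;
            else return 1;
        } else {
            /* Anything else, including an early NUL, is rejected. */
            return 0;
        }
    }
}
```

With this sketch, `first_field_looks_reasonable("X-GM-THRID: 1\r\nTo: a@b\r\n")` returns 1 even though X-GM-THRID is not in any list of known fields, while a body fragment such as `"hello world"` returns 0.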
Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
On Mon, 2021-04-05 at 06:04 +, Surla, Sai Kalyan wrote:

> Is there any update on the issues.

I discussed the issues with upstream.
Upstream doesn't have time to work on the issues.
Upstream confirmed my suggested solutions sound OK.
I haven't yet had time to work on the solutions.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
Hi Paul,

Hope you are doing well. Is there any update on the issues?

Thank you
Sai Kalyan

From: Paul Wise
Sent: 22 March 2021 12:56 PM
To: Surla, Sai Kalyan; 984...@bugs.debian.org
Subject: Re: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

On Mon, 2021-03-22 at 05:41 +, Surla, Sai Kalyan wrote:

> In this case can we still go with the temporary change that you
> suggested as the issue is little different with this PST?

The temporary change will not work for the second PST, since it only works around the header detection issue, but the second PST doesn't have the full MIME headers, only the predefined PST To/CC/BCC fields. There isn't any easy workaround for the issue with the second PST.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
On Mon, 2021-03-22 at 05:41 +, Surla, Sai Kalyan wrote:

> In this case can we still go with the temporary change that you
> suggested as the issue is little different with this PST?

The temporary change will not work for the second PST, since it only works around the header detection issue, but the second PST doesn't have the full MIME headers, only the predefined PST To/CC/BCC fields. There isn't any easy workaround for the issue with the second PST.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
Hi Paul,

In this case, can we still go with the temporary change that you suggested, as the issue is a little different with this PST?

Thank you
Sai Kalyan

From: Paul Wise
Sent: 19 March 2021 08:26 AM
To: Surla, Sai Kalyan; 984...@bugs.debian.org
Subject: Re: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

On Fri, 2021-03-19 at 09:03 +0800, Paul Wise wrote:

> The specs indicate that 0x39fe is indeed the recipient address:

The issue in libpst when there are no MIME headers in the PST file is:

There are some MAPI properties for To/CC/BCC:

https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplayto-canonical-property
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplaycc-canonical-property
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplaybcc-canonical-property

These contain *only* the names and not the addresses.
Outlook fills them automatically from the list of recipients.
Outlook stores the recipients in a separate table to email properties.
libpst stores them in the sentto/cc/bcc fields of the email structure.
libpst has no storage of the recipients table of the PST file.
libpst processes the MAPI types one-by-one rather than in separate tables and only has one action per MAPI type.

So this is not going to be easy to fix. I will discuss this with upstream.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
On Fri, 2021-03-19 at 09:03 +0800, Paul Wise wrote:

> The specs indicate that 0x39fe is indeed the recipient address:

The issue in libpst when there are no MIME headers in the PST file is:

There are some MAPI properties for To/CC/BCC:

https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplayto-canonical-property
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplaycc-canonical-property
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplaybcc-canonical-property

These contain *only* the names and not the addresses.
Outlook fills them automatically from the list of recipients.
Outlook stores the recipients in a separate table to email properties.
libpst stores them in the sentto/cc/bcc fields of the email structure.
libpst has no storage of the recipients table of the PST file.
libpst processes the MAPI types one-by-one rather than in separate tables and only has one action per MAPI type.

So this is not going to be easy to fix. I will discuss this with upstream.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
On Fri, 2021-03-19 at 08:30 +0800, Paul Wise wrote:

> I noticed something in common between the original PST file and the
> new PST file you have sent, they both have an unknown MAPI type
> 0x39fe that contains the email addresses of the recipients. So I will
> try to find out in the PST file specifications what this MAPI type is
> for and then add some code to libpst and readpst to decode it.

The specs indicate that 0x39fe is indeed the recipient address:

https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-pst/141923d5-15ab-4ef1-a524-6dce75aae546
https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-pst/5ee9a00a-858b-47db-95b3-f91518640ea7
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagsmtpaddress-canonical-property

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
On Thu, 2021-03-18 at 17:14 +, Surla, Sai Kalyan wrote:

> Please find a PST file

As far as I can tell from the `readpst -d debug.log` output, this new PST file does not have any MIME headers in it, so it is expected that fixing the valid_headers function will do nothing. I expect if you look at the PST file in Outlook you will see there are no MIME headers.

I noticed something in common between the original PST file and the new PST file you have sent, they both have an unknown MAPI type 0x39fe that contains the email addresses of the recipients. So I will try to find out in the PST file specifications what this MAPI type is for and then add some code to libpst and readpst to decode it.

$ rm -f * ; /usr/bin/readpst -d debug.log ~/stash/samples/pst/bugs.debian.org/984581/forpst.pst ; echo ; grep -A5 'mapi-id: 0x39fe' debug.log
Opening PST file and indexes...
Processing Folder "Deleted Items"
Processing Folder "for pst"
	"Outlook Data File" - 2 items done, 0 items skipped.
	"for pst" - 1 items done, 0 items skipped.

2356166 pst_process libpst.c(2194) #10 - mapi-id: 0x39fe type: 0x1f length: 0x13
2356166 pst_process libpst.c(3172) Unknown type 0x39fe Unicode String Data [size = 0x13]
2356166 pst_process libpst.c(3174)
2356166 00 :64 65 65 70 74 69 73 6b 40 67 6d 61 69 6c 2e 63 :deeptisk@gmail.c
2356166 10 :6f 6d 00 :om.

$ rm -f * ; /usr/bin/readpst -d debug.log ~/stash/samples/pst/bugs.debian.org/984581/u3si.pst ; echo ; grep -A5 'mapi-id: 0x39fe' debug.log
Opening PST file and indexes...
Processing Folder "Deleted Items"
Processing Folder "Sent Items"
	"Outlook Data File" - 2 items done, 0 items skipped.
	"Sent Items" - 1 items done, 0 items skipped.

2356205 pst_process libpst.c(2194) #13 - mapi-id: 0x39fe type: 0x1f length: 0x16
2356205 pst_process libpst.c(3172) Unknown type 0x39fe Unicode String Data [size = 0x16]
2356205 pst_process libpst.c(3174)
2356205 00 :4d 79 55 73 65 72 31 40 65 78 63 68 31 33 66 61 :MyUser1@exch13fa
2356205 10 :73 2e 6c 6f 63 00 :s.loc.
--
2356205 pst_process libpst.c(2194) #13 - mapi-id: 0x39fe type: 0x1f length: 0x1c
2356205 pst_process libpst.c(3172) Unknown type 0x39fe Unicode String Data [size = 0x1c]
2356205 pst_process libpst.c(3174)
2356205 00 :41 64 6d 69 6e 69 73 74 72 61 74 6f 72 40 65 78 :Administrator@ex
2356205 10 :63 68 31 33 66 61 73 2e 6c 6f 63 00 :ch13fas.loc.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
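The hex columns in the debug output above can be checked by hand or with a few lines of C. The sketch below is not libpst code; `hex_bytes_to_text` is a hypothetical helper that turns the space-separated hex bytes of one dump into a string. It assumes one byte per character, which is how the dumps above appear even though the log labels the data as a Unicode string, and it stops at the stored NUL terminator.

```c
#include <assert.h>   /* assert() */
#include <stddef.h>
#include <stdlib.h>   /* strtoul() */

/* Hypothetical helper, not part of libpst: converts space-separated hex
 * bytes (as printed in the readpst debug log) into a NUL-terminated
 * string. Returns the number of characters written, excluding the NUL. */
size_t hex_bytes_to_text(const char *hex, char *out, size_t outlen)
{
    size_t n = 0;

    assert(hex != NULL && out != NULL && outlen > 0);

    while (n + 1 < outlen) {
        char *end;
        unsigned long v = strtoul(hex, &end, 16);

        if (end == hex || v > 0xff) break; /* no further hex byte found   */
        if (v == 0) break;                 /* NUL terminator in the data  */
        out[n++] = (char)v;
        hex = end;                         /* continue after this byte    */
    }
    out[n] = '\0';
    return n;
}
```

Feeding it the nineteen bytes of the first dump (length 0x13) yields the eighteen-character recipient address shown in the right-hand column, which matches the conclusion that mapi-id 0x39fe carries the address rather than the display name.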
Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
Hi Paul,

Thanks for your time on this issue. We will try to provide our in-house PST as soon as possible.

Thank you

From: Paul Wise
Sent: 15 March 2021 09:47 AM
To: Surla, Sai Kalyan; 984...@bugs.debian.org
Subject: Re: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

I did some further investigation of the PST file you sent. I conclude that there are two problems you are experiencing:

The first one is that readpst doesn't consider the headers as valid even though they clearly are valid. Since the header validity detection was added to detect invalid PST files I am going to have to discuss this with the upstream author. Perhaps the header validity detection will have to become more generic or perhaps it will be discarded or perhaps the invalid PST files will be detected in a different way. Fixing this will bring back all the headers, including ARC & To.

The second one is that for your particular PST file, the To field does not contain an email address. Looking at the debug output I see that the "Display Sent-To Address" contains only the name, not the email. This appears to be a problem with the PST file itself, as the 0x0E04 type, which is PR_DISPLAY_TO, aka the "Address Sent-To", does not contain the email address. The email address does appear in the "Contact Address" and "Search Key" though. I am not sure if it is correct to merge the contact address into the to address though.

If you have any more samples of working or broken PST files, I would be happy to have a copy of them to debug further.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
I did some further investigation of the PST file you sent. I conclude that there are two problems you are experiencing:

The first one is that readpst doesn't consider the headers as valid even though they clearly are valid. Since the header validity detection was added to detect invalid PST files I am going to have to discuss this with the upstream author. Perhaps the header validity detection will have to become more generic or perhaps it will be discarded or perhaps the invalid PST files will be detected in a different way. Fixing this will bring back all the headers, including ARC & To.

The second one is that for your particular PST file, the To field does not contain an email address. Looking at the debug output I see that the "Display Sent-To Address" contains only the name, not the email. This appears to be a problem with the PST file itself, as the 0x0E04 type, which is PR_DISPLAY_TO, aka the "Address Sent-To", does not contain the email address. The email address does appear in the "Contact Address" and "Search Key" though. I am not sure if it is correct to merge the contact address into the to address though.

If you have any more samples of working or broken PST files, I would be happy to have a copy of them to debug further.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
On Wed, 2021-03-10 at 09:28 +, Surla, Sai Kalyan wrote:

> Hope you got a chance to look at the issue that we reported.

I am looking at the issue today.

I managed to reproduce the issue that you have reported using the sample PST file that you have provided. I acknowledge that I am seeing both the issues you reported:

 * only a limited set of headers are being extracted
 * email address is missing from the To header - but the From header is correct

The readpst -d option to output debug information was instrumental in reproducing this, it causes all the info in the PST file and the entire sequence of decoding steps to be output to a debug file.

I modified the valid_headers function to also accept the ARC-Seal header but that does not fix the problem. Looking at the debug output I noticed that the X-GM-THRID header is the first header. I then added X-GM-THRID to the valid_headers function and that fixed the problem. I think that messages with a different first header will not work though, you would have to add all of the first headers that could exist to the valid_headers function, which seems like an incorrect thing to do.

If you have any sample PST files that *do* work with the current code, that would allow me to compare the working PST with the broken PST, which would be very helpful in tracking down where the problem is.

Until I can figure out the correct fix, I suggest you work around this bug by adding "return 1;" without quotes as the first line in the valid_headers function. This way you can keep readpst working for your customers while the correct fix is found. I believe that the modern PST files that you have available are all valid files, while the valid_headers function aims to detect broken files, so there should be no risk to the conversion process for your case.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
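The behaviour described in this message, where only the start of the header block is compared against a fixed list, can be reproduced in isolation. The following is a minimal sketch and not the readpst source: `starts_with_known_field` and the `known_fields` table are hypothetical stand-ins that use a subset of the field names from valid_headers, and the sample header text is made up.

```c
#include <assert.h>    /* assert(), for the usage note below */
#include <stddef.h>
#include <string.h>    /* strlen() */
#include <strings.h>   /* strncasecmp() (POSIX) */

/* Illustrative subset of the field names listed in valid_headers(). */
static const char *const known_fields[] = {
    "Content-Type: ", "Date: ", "From: ", "MIME-Version: ",
    "Received: ", "Return-Path: ", "Subject: ", "To: ",
};

/* Hypothetical stand-in for valid_headers(): only the very start of the
 * header block is compared, so later fields are never considered. */
int starts_with_known_field(const char *headers)
{
    size_t i;

    if (headers == NULL) return 0;
    for (i = 0; i < sizeof known_fields / sizeof known_fields[0]; i++) {
        if (strncasecmp(headers, known_fields[i],
                        strlen(known_fields[i])) == 0)
            return 1;
    }
    return 0;
}
```

A block beginning with `X-GM-THRID: ...` is rejected by this check even when a perfectly good `To:` field follows on a later line, which is why adding ARC-Seal alone to the list made no difference and why every possible first field would otherwise have to be listed.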
Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
Hi Paul,

How are you? Hope you got a chance to look at the issue that we reported.

I am reiterating the summary of the problem. There are some transport headers starting with “ARC-Seal: ”. These transport headers also contain the To, CC and BCC addresses with both display names and corresponding email IDs. However, `readpst` discards these transport headers while creating the EML file with MIME content, and in the final MIME content we get only the display names for all the To, CC and BCC addresses. It is possible that you are using the canonical properties to extract the To, CC and BCC addresses from the PST file.

After looking at the readpst.c file (see below) we understood that readpst discards any transport header block that doesn't start with one of the specified strings.

    int valid_headers(char *header)
    {
        // headers are sometimes really bogus - they seem to be fragments of the
        // message body, so we only use them if they seem to be real rfc822 headers.
        // this list is composed of ones that we have seen in real pst files.
        // there are surely others. the problem is - given an arbitrary character
        // string, is it a valid (or even reasonable) set of rfc822 headers?
        if (header) {
            if (header_match(header, "Content-Type: " )) return 1;
            if (header_match(header, "Date: " )) return 1;
            if (header_match(header, "From: " )) return 1;
            if (header_match(header, "MIME-Version: " )) return 1;
            if (header_match(header, "Microsoft Mail Internet Headers")) return 1;
            if (header_match(header, "Received: " )) return 1;
            if (header_match(header, "Return-Path: " )) return 1;
            if (header_match(header, "Subject: " )) return 1;
            if (header_match(header, "To: " )) return 1;
            if (header_match(header, "X-ASG-Debug-ID: " )) return 1;
            if (header_match(header, "X-Barracuda-URL: " )) return 1;
            if (header_match(header, "X-x: " )) return 1;
            if (strlen(header) > 2) {
                DEBUG_INFO(("Ignore bogus headers = %s\n", header));
            }
            return 0;
        }
        else return 0;
    }

As per our understanding, the ARC headers (which help preserve email authentication results and verify the identity of email intermediaries that forward a message on to its final destination) were introduced in 2016, and it looks like readpst does not take them into account.

We would appreciate it if you could clarify:

1. Is our understanding correct?
2. If yes, can we expect a patch from you?
3. If our understanding is not correct, can we expect a patch with proper fixes, or can you let us know where to fix the problem?
4. Are there any other headers like ARC that are not handled?

We look forward to your reply so that we can commit a date to our customers.

Thank you
Sai Kalyan

From: Surla, Sai Kalyan
Sent: 08 March 2021 07:28 PM
To: 'Paul Wise'; '984...@bugs.debian.org' <984...@bugs.debian.org>
Subject: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

Outlook is blocking the PST; please find the zipped PST file.

Thank you
Sai Kalyan

From: Surla, Sai Kalyan
Sent: 08 March 2021 07:27 PM
To: 'Paul Wise' <p...@debian.org>; '984...@bugs.debian.org' <984...@bugs.debian.org>
Subject: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

Sorry, it looks like Outlook blocked this PST.

From: Surla, Sai Kalyan
Sent: 08 March 2021 01:45 PM
To: Paul Wise <p...@debian.org>; 984...@bugs.debian.org
Subject: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

Hi Paul,

Please find the PST, which contains a single email for which we also faced the problem of extracting email addresses under the ‘To:’ header.

Thank you
Sai Kalyan

From: Paul Wise <p...@debian.org>
Sent: 08 March 2021 07:02 AM
To: Surla, Sai Kalyan <saikalyan.su...@arcserve.com>; 984...@bugs.debian.org
Subject: Re: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

Control: found -1 0.6.75-1

On Sun, 2021-03-07 at 17:42 +, Surla, Sai Kalyan wrote:

> Already tried with version 0.6.75-1.

Thanks, marking the bug as found in that version.

> Also compiled the latest code available and tried with it, still the
> same results.

Thanks for testing t
Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
Control: found -1 0.6.75-1

On Sun, 2021-03-07 at 17:42 +, Surla, Sai Kalyan wrote:

> Already tried with version 0.6.75-1.

Thanks, marking the bug as found in that version.

> Also compiled the latest code available and tried with it, still the
> same results.

Thanks for testing this too.

> Please find the changes in the attached file. (readpst.c line no. : 1238)

It is traditional to provide changes in the patch format by using the `diff -u` command or the corresponding commands from the version control system that the upstream project is using. Below is the output from the Mercurial diff for your change.

$ hg diff
diff -r 7200790e46ac src/readpst.c
--- a/src/readpst.c Tue Jun 16 17:18:28 2020 -0700
+++ b/src/readpst.c Mon Mar 08 09:20:50 2021 +0800
@@ -1235,7 +1235,7 @@
 int header_match(char *header, char*field) {
     int n = strlen(field);
-    if (strncasecmp(header, field, n) == 0) return 1; // tag:{space}
+    if (strstr(header,field) != NULL || strncasecmp(header, field, n) == 0) return 1; // tag:{space}
     if ((field[n-1] == ' ') && (strncasecmp(header, field, n-1) == 0)) {
         char *crlftab = "\r\n\t";
         DEBUG_INFO(("Possible wrapped header = %s\n", header));

I am fairly certain that this is not the correct fix for this issue.

> ARC headers are kind of email authentication headers.

Thanks for the info.

> For some security reasons we cannot share the original

Understood.

> if possible we will try to share the inhouse sample pst.

That would be necessary to be able to fix the issue.

> Meanwhile our observation is if the headers start with the following
> headers (...) it is treated as bogus, this email is starting with
> some header which is not one of the listed.

That does look like what the code does indeed, probably the right fix is to scan through all of the headers instead of just the first one.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
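One likely reason the strstr change is not a correct fix is that it accepts the field text anywhere in the string, including inside body fragments, which is exactly what the validity check is meant to reject. The sketch below is for illustration only and is not taken from readpst: `match_at_start` and `match_anywhere` are hypothetical names, and the sample text is invented.

```c
#include <assert.h>    /* assert(), for the usage note below */
#include <string.h>    /* strlen(), strstr() */
#include <strings.h>   /* strncasecmp() (POSIX) */

/* Prefix comparison, as in the original header_match() test. */
int match_at_start(const char *header, const char *field)
{
    return strncasecmp(header, field, strlen(field)) == 0;
}

/* The condition from the proposed change: the field may appear anywhere. */
int match_anywhere(const char *header, const char *field)
{
    return strstr(header, field) != NULL || match_at_start(header, field);
}
```

For a body fragment such as `"please forward this.\r\nReply To: the list\r\n"`, `match_at_start(text, "To: ")` is 0 but `match_anywhere(text, "To: ")` is 1, so the bogus-header detection would effectively be switched off for any text that happens to contain a listed field name.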
Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
Hi Paul,

We already tried with version 0.6.75-1. We also compiled the latest code available and tried with it, still with the same results.

Please find the changes in the attached file. (readpst.c line no. : 1238)

ARC headers are a kind of email authentication header. Authenticated Received Chain (ARC) creates a mechanism for individual Internet Mail Handlers to add their authentication assessment to a message's ordered set of handling results. For more details please refer to the following RFC: https://tools.ietf.org/html/rfc8617.

For security reasons we cannot share the original PST. We will discuss this internally and let you know; if possible, we will share an in-house sample PST. We will let you know about the PST in the next couple of days.

Meanwhile, our observation is that the headers are accepted only if they start with one of the listed headers (Date, From, To, Content-Type, MIME-Version, Microsoft Mail Internet Headers, Received, Subject and some others); otherwise they are treated as bogus. This email starts with a header that is not one of those listed.

Thank you
Sai Kalyan

From: Paul Wise
Sent: 06 March 2021 08:12 AM
To: Surla, Sai Kalyan; 984...@bugs.debian.org
Subject: Re: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

Control: tags -1 + moreinfo

On Fri, 2021-03-05 at 23:06 +0530, sai kalyan wrote:

> Version: 0.6.71-0.1

Could you test version 0.6.75-1 from Debian bullseye?

> Tags: patch

Could you attach your patch to the bug report?

> for some mails where the transport headers contain ARC headers

Could you provide some information about what ARC headers are?

> the email addresses are not extracted from the PST and only usernames
> are available in the MIME content of emails that are extracted.

Please supply an example PST file that this problem occurs with.
-- 
bye,
pabs

https://wiki.debian.org/PaulWise

/***
 * readpst.c
 * Part of the LibPST project
 * Written by David Smith
 *            dav...@earthcorp.com
 */

#include "define.h"
#include "lzfu.h"
#include "msg.h"

#define OUTPUT_TEMPLATE "%s.%s"
#define OUTPUT_KMAIL_DIR_TEMPLATE ".%s.directory"
#define KMAIL_INDEX "../.%s.index"
#define SEP_MAIL_FILE_TEMPLATE "%i%s"

// max size of the c_time char*. It will store the date of the email
#define C_TIME_SIZE 500

struct file_ll {
    char *name[PST_TYPE_MAX];
    char *dname;
    FILE * output[PST_TYPE_MAX];
    int32_t stored_count;
    int32_t item_count;
    int32_t skip_count;
};

int grim_reaper();
pid_t try_fork(char* folder);
void process(pst_item *outeritem, pst_desc_tree *d_ptr);
void write_email_body(FILE *f, char *body);
void removeCR(char *c);
void usage();
void version();
void mk_kmail_dir(char* fname);
int close_kmail_dir();
void mk_recurse_dir(char* dir);
int close_recurse_dir();
void mk_separate_dir(char *dir);
int close_separate_dir();
void mk_separate_file(struct file_ll *f, int32_t t, char *extension, int openit);
void close_separate_file(struct file_ll *f);
char* my_stristr(char *haystack, char *needle);
void check_filename(char *fname);
int acceptable_ext(pst_item_attach* attach);
void write_separate_attachment(char f_name[], pst_item_attach* attach, int attach_num, pst_file* pst);
void write_embedded_message(FILE* f_output, pst_item_attach* attach, char *boundary, pst_file* pf, int save_rtf, char** extra_mime_headers);
void write_inline_attachment(FILE* f_output, pst_item_attach* attach, char *boundary, pst_file* pst);
int valid_headers(char *header);
void header_has_field(char *header, char *field, int *flag);
void header_get_subfield(char *field, const char *subfield, char *body_subfield, size_t size_subfield);
char* header_get_field(char *header, char *field);
char* header_end_field(char *field);
void header_strip_field(char *header, char *field);
int test_base64(char *body, size_t len);
void find_html_charset(char *html, char *charset, size_t charsetlen);
void find_rfc822_headers(char** extra_mime_headers);
void write_body_part(FILE* f_output, pst_string *body, char *mime, char *charset, char *boundary, pst_file* pst);
void write_schedule_part_data(FILE* f_output, pst_item* item, const char* sender, const char* method);
void write_schedule_part(FILE* f_output, pst_item* item, const char* sender, const char* boundary);
void write_normal_email(FILE* f_output, char f_name[], pst_item* item, int mode, int mode_MH, pst_file* pst, int save_rtf, int embedding, char** extra_mime_headers);
void write_vcard(FILE* f_output, pst_item *item, pst_item_contact* contact, char comment[]);
int write_extra_categories(FILE* f_output, pst_item* item);
void write_journal(FILE* f_output, pst_item* item);
void write_app
Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
Control: tags -1 + moreinfo

On Fri, 2021-03-05 at 23:06 +0530, sai kalyan wrote:

> Version: 0.6.71-0.1

Could you test version 0.6.75-1 from Debian bullseye?

> Tags: patch

Could you attach your patch to the bug report?

> for some mails where the transport headers contain ARC headers

Could you provide some information about what ARC headers are?

> the email addresses are not extracted from the PST and only usernames
> are available in the MIME content of emails that are extracted.

Please supply an example PST file that this problem occurs with.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
Package: pst-utils
Version: 0.6.71-0.1
Severity: important
Tags: patch

Hi,

We have been using the tool to extract emails from PST files. However, we have recently observed that for some mails where the transport headers contain ARC headers, the email addresses are not extracted from the PST and only usernames are available in the MIME content of the extracted emails.

After enabling debug logs we found that all the internet headers are being ignored as bogus headers, including the To:, From: ... headers in which the email addresses are available.

As the tool is open source we tried to debug it. After debugging we identified that the headers are ignored (as bogus headers) and that the tool uses the extracted metadata to construct the MIME content for the email, in which the email addresses are missing.

We would like to point at two parts where the issue could possibly originate:

1) Parsing the mail from the PST - the structure variable does not contain the addresses for these emails.
2) Ignoring the headers as bogus headers because of an incorrect comparison.

We were not able to look into the parsing part, but we made some changes to verify the behavior of the bogus header identification; they are probably not appropriate changes.

Sample Data:

Below is the sample MIME content that is extracted for an email from the PST by the readpst utility:

From: user_1
To: user_2
CC: user_3

where user_1, user_2 and user_3 are just usernames without email addresses.

We would like to hear back as soon as possible.

Thank you
Sai Kalyan

-- System Information:
Debian Release: 10.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.0-8-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages pst-utils depends on:
ii  libc6         2.28-10
ii  libgcc1       1:8.3.0-6
ii  libgd3        2.2.5-5.2
ii  libglib2.0-0  2.58.3-2+deb10u2
ii  libgsf-1-114  1.14.45-1
ii  libpst4       0.6.71-0.1
ii  libstdc++6    8.3.0-6

pst-utils recommends no packages.

pst-utils suggests no packages.