Bug#984581: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

2021-08-16 Thread Paul Wise
Control: forwarded -1 https://bugzilla.redhat.com/show_bug.cgi?id=1994178

On Sun, 30 May 2021 10:26:21 +0800 Paul Wise wrote:
> On Mon, 5 Apr 2021 06:04:49 + "Surla, Sai Kalyan" wrote:
> 
> > Is there any update on the issues.
> 
> I finally found time to work on the first issue (header detection)
> where we had a workaround already and created proper patches (attached)
> for the issue and sent them to the upstream maintainer.

I have forwarded the patches to the Fedora bug tracker; hopefully that
means the upstream maintainer will accept them now.

I had to fix a bug with the first patch causing a segfault.

I will include the patches in the next upload to Debian unstable.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise


signature.asc
Description: This is a digitally signed message part


Bug#984581: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

2021-05-30 Thread Surla, Sai Kalyan
Hi Paul,

Thanks for your time on this issue.
We will verify the patch that you shared and will let you know the results.

Thank you
Sai Kalyan


From: Paul Wise 
Sent: 30 May 2021 07:56 AM
To: 984...@bugs.debian.org; Surla, Sai Kalyan 
Subject: Re: RE: Bug#984581: pst-utils: Fails to extract email addresses for 
emails having ARC headers from PST file

On Mon, 5 Apr 2021 06:04:49 + "Surla, Sai Kalyan" wrote:

> Is there any update on the issues.

I finally found time to work on the first issue (header detection)
where we had a workaround already and created proper patches (attached)
for the issue and sent them to the upstream maintainer.

--
bye,
pabs

https://wiki.debian.org/PaulWise


Bug#984581: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

2021-05-29 Thread Paul Wise
On Mon, 5 Apr 2021 06:04:49 + "Surla, Sai Kalyan" wrote:

> Is there any update on the issues.

I finally found time to work on the first issue (header detection)
where we had a workaround already and created proper patches (attached)
for the issue and sent them to the upstream maintainer.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
From a4aa24ae5675b09385d0c88add48c3ab046e699d Mon Sep 17 00:00:00 2001
From: Paul Wise 
Date: Sun, 30 May 2021 10:02:14 +0800
Subject: [PATCH 1/3] Add debugging for header detection

---
 src/readpst.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/readpst.c b/src/readpst.c
index 6d94f15..b5910e9 100644
--- a/src/readpst.c
+++ b/src/readpst.c
@@ -1591,6 +1591,8 @@ void write_normal_email(FILE* f_output, char f_name[], pst_item* item, int mode,
     DEBUG_ENT("write_normal_email");
 
     pst_convert_utf8_null(item, &item->email->header);
+    DEBUG_INFO(("PST headers\n%s\n", item->email->header.str));
+    DEBUG_INFO(("Extra MIME headers\n%s\n", *extra_mime_headers));
     headers = valid_headers(item->email->header.str) ? item->email->header.str :
               valid_headers(*extra_mime_headers)     ? *extra_mime_headers :
               NULL;
-- 
2.30.2

From bade93dcdb435bc7bec50cf4b54481731beea45c Mon Sep 17 00:00:00 2001
From: Paul Wise 
Date: Sun, 30 May 2021 09:49:57 +0800
Subject: [PATCH 2/3] Also detect email headers wrapped with space instead of
 tab

Spaces are commonly used for email header wrapping.
---
 src/readpst.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/readpst.c b/src/readpst.c
index b5910e9..6663771 100644
--- a/src/readpst.c
+++ b/src/readpst.c
@@ -1275,8 +1275,10 @@ int  header_match(char *header, char*field) {
     if (strncasecmp(header, field, n) == 0) return 1;   // tag:{space}
     if ((field[n-1] == ' ') && (strncasecmp(header, field, n-1) == 0)) {
         char *crlftab = "\r\n\t";
+        char *crlfspc = "\r\n ";
         DEBUG_INFO(("Possible wrapped header = %s\n", header));
         if (strncasecmp(header+n-1, crlftab, 3) == 0) return 1; // tag:{cr}{lf}{tab}
+        if (strncasecmp(header+n-1, crlfspc, 3) == 0) return 1; // tag:{cr}{lf}{space}
     }
     return 0;
 }
-- 
2.30.2
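
The effect of this patch can be tried outside readpst with a minimal
standalone sketch like the one below. The function name
`header_match_folded` is made up for illustration, the DEBUG_INFO logging
is omitted, and the folding checks use plain strncmp since CR, LF, tab and
space have no case.

```c
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical standalone version of the patched check, not the libpst code.
 * Returns 1 if `header` starts with `field` (e.g. "To: "), or with the field
 * name followed by a folded line: CRLF and then a tab or a space. */
int header_match_folded(const char *header, const char *field)
{
    size_t n = strlen(field);
    if (n == 0) return 0;
    if (strncasecmp(header, field, n) == 0) return 1;        /* tag:{space} */
    if (field[n - 1] == ' ' && strncasecmp(header, field, n - 1) == 0) {
        const char *rest = header + n - 1;  /* text right after "tag:" */
        if (strncmp(rest, "\r\n\t", 3) == 0) return 1;       /* tag:{cr}{lf}{tab} */
        if (strncmp(rest, "\r\n ", 3) == 0) return 1;        /* tag:{cr}{lf}{space} */
    }
    return 0;
}
```

With this, `To:` followed by CRLF and either a tab or a space is accepted,
while `To:a@b.c` with no separator at all is still rejected.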

From da5f159caa66db380b793f9062a36888c9b12467 Mon Sep 17 00:00:00 2001
From: Paul Wise 
Date: Sun, 30 May 2021 09:51:26 +0800
Subject: [PATCH 3/3] Detect reasonable email headers too

RFC 5322 specifies the syntax of email headers, most header fields are more
restricted though so use a restricted check in case the headers are bogus
parts of the body that happen to match RFC 5322.

Fixes: https://bugs.debian.org/984581
---
 src/readpst.c | 60 +++
 1 file changed, 60 insertions(+)

diff --git a/src/readpst.c b/src/readpst.c
index 6663771..97ba127 100644
--- a/src/readpst.c
+++ b/src/readpst.c
@@ -1283,6 +1283,65 @@ int  header_match(char *header, char*field) {
     return 0;
 }
 
+// https://en.wikipedia.org/wiki/Email#Message_header
+// https://www.rfc-editor.org/rfc/rfc5322.html
+// https://www.iana.org/assignments/message-headers/message-headers.xhtml
+int  header_is_reasonable(char *header)
+{
+    char *c;
+    #define C *c
+
+    // The header must not be NULL
+    if (header) c = header;
+    else return 0;
+
+    // usually the header field name starts with upper-case: A-Z
+    if (C >= 'A' && C <= 'Z') c++;
+    else return 0;
+
+    while(1) {
+        // most header field names use a limited set of characters: - 0-9 A-Z a-z
+        if (
+            (C >= 'A' && C <= 'Z') ||
+            (C >= 'a' && C <= 'z') ||
+            (C >= '0' && C <= '9') ||
+            (C == '-')
+           ) {
+            c++;
+        // the header field name is then terminated with a colon
+        } else if (C == ':') {
+            c++;
+            goto parse_header_field_value;
+        // other characters are an indicator of an invalid header
+        } else {
+            return 0;
+        }
+    }
+
+    parse_header_field_value:
+    while(1) {
+        // header field values are printable US-ASCII plus space/tab
+        if (
+            (C >= 33 && C <= 126) ||
+            (C == ' ' || C == '\t')
+           ) {
+            c++;
+        // the header field value is then terminated with CRLF
+        } else if (C == '\r' && *(c+1) == '\n') {
+            c += 2;
+            // the value could continue to the next line though
+            if (C == ' ' || C == '\t') c++;
+            else return 1;
+        // other characters are an indicator of an invalid header
+        } else {
+            return 0;
+        }
+    }
+
+    #undef C
+
+}
+
 int  valid_headers(char *header)
 {
 // headers are sometimes really bogus - they seem to be fragments of the
@@ -1303,6 +1362,7 @@ int  valid_headers(char *header)
 if (header_match(header, "X-ASG-Debug-ID: "   )) return 1;
 if (header_match(header, "X-Barracuda-URL: "  )) return 1;
 if (header_matc
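
As a reading aid, the check that header_is_reasonable performs can be
restated without the macro and the goto. This is a sketch under the
assumption that only the first header field needs to be validated, which is
how the patch above reads; `first_header_is_reasonable` is a hypothetical
name and not part of libpst.

```c
#include <stddef.h>

/* Hypothetical restatement of the check in the patch, not the libpst code.
 * Returns 1 if the text starts with one plausible header field: a name made
 * of A-Z a-z 0-9 '-' that starts with an upper-case letter, then a colon,
 * then printable US-ASCII (plus space/tab) ending in CRLF. Folded
 * continuation lines (CRLF followed by space or tab) are accepted. */
int first_header_is_reasonable(const char *header)
{
    const char *c = header;
    if (c == NULL) return 0;

    /* field name: the first character must be upper-case */
    if (!(*c >= 'A' && *c <= 'Z')) return 0;
    c++;
    while ((*c >= 'A' && *c <= 'Z') || (*c >= 'a' && *c <= 'z') ||
           (*c >= '0' && *c <= '9') || *c == '-')
        c++;
    if (*c != ':') return 0;
    c++;

    /* field value */
    for (;;) {
        if ((*c >= 33 && *c <= 126) || *c == ' ' || *c == '\t') {
            c++;
        } else if (c[0] == '\r' && c[1] == '\n') {
            c += 2;
            if (*c == ' ' || *c == '\t') c++;  /* folded line: keep going */
            else return 1;                     /* end of the first field */
        } else {
            return 0;  /* NUL, bare LF, 8-bit data or control characters */
        }
    }
}
```

Under this check a block starting with `X-GM-THRID: ...` or `ARC-Seal: ...`
passes without either name being listed, while text that starts with a
lower-case word or that has no CRLF is rejected.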

Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

2021-04-04 Thread Paul Wise
On Mon, 2021-04-05 at 06:04 +, Surla, Sai Kalyan wrote:

> Is there any update on the issues.

I discussed the issues with upstream.

Upstream doesn't have time to work on the issues.

Upstream confirmed my suggested solutions sound OK.

I haven't yet had time to work on the solutions.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise




Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

2021-04-04 Thread Surla, Sai Kalyan
Hi Paul,

I hope you are doing well.
Is there any update on the issues?

Thank you
Sai Kalyan

From: Paul Wise 
Sent: 22 March 2021 12:56 PM
To: Surla, Sai Kalyan ; 984...@bugs.debian.org
Subject: Re: Bug#984581: pst-utils: Fails to extract email addresses for emails 
having ARC headers from PST file

On Mon, 2021-03-22 at 05:41 +, Surla, Sai Kalyan wrote:

> In this case can we still go with the temporary change that you
> suggested as the issue is little different with this PST?

The temporary change will not work for the second PST, since it only
works around the header detection issue, but the second PST doesn't
have the full MIME headers, only the predefined PST To/CC/BCC fields.

There isn't any easy workaround for the issue with the second PST.

--
bye,
pabs

https://wiki.debian.org/PaulWise


Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

2021-03-22 Thread Paul Wise
On Mon, 2021-03-22 at 05:41 +, Surla, Sai Kalyan wrote:

> In this case can we still go with the temporary change that you
> suggested as the issue is little different with this PST?

The temporary change will not work for the second PST, since it only
works around the header detection issue, but the second PST doesn't
have the full MIME headers, only the predefined PST To/CC/BCC fields.

There isn't any easy workaround for the issue with the second PST.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise




Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

2021-03-21 Thread Surla, Sai Kalyan
Hi Paul,

In this case, can we still go with the temporary change that you suggested, 
given that the issue is a little different with this PST?

Thank you
Sai Kalyan

From: Paul Wise 
Sent: 19 March 2021 08:26 AM
To: Surla, Sai Kalyan ; 984...@bugs.debian.org
Subject: Re: Bug#984581: pst-utils: Fails to extract email addresses for emails 
having ARC headers from PST file

On Fri, 2021-03-19 at 09:03 +0800, Paul Wise wrote:

> The specs indicate that 0x39fe is indeed the recipient address:

The issue in libpst when there are no MIME headers in the PST file is:

There are some MAPI properties for To/CC/BCC:

https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplayto-canonical-property
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplaycc-canonical-property
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplaybcc-canonical-property

These contain *only* the names and not the addresses.

Outlook fills them automatically from the list of recipients.

Outlook stores the recipients in a separate table to email properties.

libpst stores them in the sentto/cc/bcc fields of the email structure.

libpst has no storage of the recipients table of the PST file.

libpst processes the MAPI types one-by-one rather than in separate
tables and only has one action per MAPI type.

So this is not going to be easy to fix.

I will discuss this with upstream.

--
bye,
pabs

https://wiki.debian.org/PaulWise


Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

2021-03-18 Thread Paul Wise
On Fri, 2021-03-19 at 09:03 +0800, Paul Wise wrote:

> The specs indicate that 0x39fe is indeed the recipient address:

The issue in libpst when there are no MIME headers in the PST file is:

There are some MAPI properties for To/CC/BCC:

https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplayto-canonical-property
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplaycc-canonical-property
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplaybcc-canonical-property

These contain *only* the names and not the addresses.

Outlook fills them automatically from the list of recipients.

Outlook stores the recipients in a separate table to email properties.

libpst stores them in the sentto/cc/bcc fields of the email structure.

libpst has no storage of the recipients table of the PST file.

libpst processes the MAPI types one-by-one rather than in separate
tables and only has one action per MAPI type.

So this is not going to be easy to fix.

I will discuss this with upstream.
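
To make the missing piece concrete, here is a rough sketch of what
per-recipient storage could look like. Everything in it is hypothetical:
libpst has no `struct recipient` or `format_recipients`, and the sketch does
not escape quotes in display names. It only illustrates that a header with
addresses needs the display name and the SMTP address (the 0x39fe property)
kept together for each recipient.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical types: libpst has no such structure. */
enum recipient_kind { RCPT_TO, RCPT_CC, RCPT_BCC };

struct recipient {
    enum recipient_kind kind;
    const char *display_name;  /* what the display To/CC/BCC properties carry; not NULL */
    const char *smtp_address;  /* what property 0x39fe carries; may be NULL */
};

/* Writes a comma-separated address list for one kind of recipient into `out`.
 * Returns the number of recipients written, or -1 if `out` is too small. */
int format_recipients(const struct recipient *r, size_t count,
                      enum recipient_kind kind, char *out, size_t outlen)
{
    size_t used = 0;
    int written = 0;
    if (outlen == 0) return -1;
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        int n;
        if (r[i].kind != kind) continue;
        if (r[i].smtp_address)
            n = snprintf(out + used, outlen - used, "%s\"%s\" <%s>",
                         written ? ", " : "", r[i].display_name,
                         r[i].smtp_address);
        else
            n = snprintf(out + used, outlen - used, "%s\"%s\"",
                         written ? ", " : "", r[i].display_name);
        if (n < 0 || (size_t)n >= outlen - used) return -1;  /* truncated */
        used += (size_t)n;
        written++;
    }
    return written;
}
```

With only the display properties available, every entry would take the
second branch and the header would contain names without addresses, which is
the symptom reported in this bug.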

-- 
bye,
pabs

https://wiki.debian.org/PaulWise




Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

2021-03-18 Thread Paul Wise
On Fri, 2021-03-19 at 08:30 +0800, Paul Wise wrote:

> I noticed something in common between the original PST file and the
> new PST file you have sent, they both have an unknown MAPI type
> 0x39fe that contains the email addresses of the recipients. So I will
> try to find out in the PST file specifications what this MAPI type is
> for and then add some code to libpst and readpst to decode it.

The specs indicate that 0x39fe is indeed the recipient address:

https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-pst/141923d5-15ab-4ef1-a524-6dce75aae546
https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-pst/5ee9a00a-858b-47db-95b3-f91518640ea7
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagsmtpaddress-canonical-property

-- 
bye,
pabs

https://wiki.debian.org/PaulWise




Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

2021-03-18 Thread Paul Wise
On Thu, 2021-03-18 at 17:14 +, Surla, Sai Kalyan wrote:

> Please find a PST file

As far as I can tell from the `readpst -d debug.log` output, this new
PST file does not have any MIME headers in it, so it is expected that
fixing the valid_headers function will do nothing. I expect if you look
at the PST file in Outlook you will see there are no MIME headers.

I noticed something in common between the original PST file and the new
PST file you have sent, they both have an unknown MAPI type 0x39fe that
contains the email addresses of the recipients. So I will try to find
out in the PST file specifications what this MAPI type is for and then
add some code to libpst and readpst to decode it.

$ rm -f * ; /usr/bin/readpst -d debug.log ~/stash/samples/pst/bugs.debian.org/984581/forpst.pst ; echo ; grep -A5 'mapi-id: 0x39fe' debug.log
Opening PST file and indexes...
Processing Folder "Deleted Items"
Processing Folder "for pst"
"Outlook Data File" - 2 items done, 0 items skipped.
"for pst" - 1 items done, 0 items skipped.

2356166 pst_process libpst.c(2194) #10 - mapi-id: 0x39fe type: 0x1f length: 0x13
2356166 pst_process libpst.c(3172) Unknown type 0x39fe Unicode String Data [size = 0x13]
2356166 pst_process libpst.c(3174) 
2356166 00  :64 65 65 70 74 69 73 6b 40 67 6d 61 69 6c 2e 63 :deeptisk@gmail.c
2356166 10  :6f 6d 00                                        :om.

$ rm -f * ; /usr/bin/readpst -d debug.log ~/stash/samples/pst/bugs.debian.org/984581/u3si.pst ; echo ; grep -A5 'mapi-id: 0x39fe' debug.log
Opening PST file and indexes...
Processing Folder "Deleted Items"
Processing Folder "Sent Items"
"Outlook Data File" - 2 items done, 0 items skipped.
"Sent Items" - 1 items done, 0 items skipped.

2356205 pst_process libpst.c(2194) #13 - mapi-id: 0x39fe type: 0x1f length: 0x16
2356205 pst_process libpst.c(3172) Unknown type 0x39fe Unicode String Data [size = 0x16]
2356205 pst_process libpst.c(3174) 
2356205 00  :4d 79 55 73 65 72 31 40 65 78 63 68 31 33 66 61 :MyUser1@exch13fa
2356205 10  :73 2e 6c 6f 63 00                               :s.loc.

--
2356205 pst_process libpst.c(2194) #13 - mapi-id: 0x39fe type: 0x1f length: 0x1c
2356205 pst_process libpst.c(3172) Unknown type 0x39fe Unicode String Data [size = 0x1c]
2356205 pst_process libpst.c(3174) 
2356205 00  :41 64 6d 69 6e 69 73 74 72 61 74 6f 72 40 65 78 :Administrator@ex
2356205 10  :63 68 31 33 66 61 73 2e 6c 6f 63 00             :ch13fas.loc.
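
The hex columns in these dumps can be checked by hand or with a few lines of
C. The sketch below is only an illustration and not part of readpst;
`hex_bytes_to_text` is a made-up helper that assumes the bytes are plain
ASCII, as they are in the dumps above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Converts space-separated hex byte pairs ("64 65 65 ...") into a C string.
 * Stops at a 00 byte, at the end of the input, or when `out` is full.
 * Returns the number of characters stored, not counting the terminator. */
size_t hex_bytes_to_text(const char *hex, char *out, size_t outlen)
{
    size_t n = 0;
    if (outlen == 0) return 0;
    while (n + 1 < outlen) {
        char *end;
        unsigned long byte = strtoul(hex, &end, 16);
        /* stop on: no digits left, the terminating 00, or a non-byte value */
        if (end == hex || byte == 0 || byte > 0xff) break;
        out[n++] = (char)byte;
        hex = end;
    }
    out[n] = '\0';
    return n;
}
```

Feeding it the bytes of the first dump,
`64 65 65 70 74 69 73 6b 40 67 6d 61 69 6c 2e 63 6f 6d 00`, yields the
18-character string `deeptisk@gmail.com`, which together with the trailing
00 byte matches the reported length of 0x13.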

-- 
bye,
pabs

https://wiki.debian.org/PaulWise




Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

2021-03-15 Thread Surla, Sai Kalyan
Hi Paul,

Thanks for your time on this issue.

We will try to provide our inhouse PST as soon as possible.

Thank you

From: Paul Wise 
Sent: 15 March 2021 09:47 AM
To: Surla, Sai Kalyan ; 984...@bugs.debian.org
Subject: Re: Bug#984581: pst-utils: Fails to extract email addresses for emails 
having ARC headers from PST file

I did some further investigation of the PST file you sent.

I conclude that there are two problems you are experiencing:

The first one is that readpst doesn't consider the headers as valid
even though they clearly are valid. Since the header validity detection
was added to detect invalid PST files I am going to have to discuss
this with the upstream author. Perhaps the header validity detection
will have to become more generic or perhaps it will be discarded or
perhaps the invalid PST files will be detected in a different way.
Fixing this will bring back all the headers, including ARC & To.

The second one is that for your particular PST file, the To field does
not contain an email address. Looking at the debug output I see that
the "Display Sent-To Address" contains only the name, not the email.
This appears to be a problem with the PST file itself, as the 0x0E04
type, which is PR_DISPLAY_TO, aka the "Address Sent-To", does not
contain the email address. The email address does appear in the
"Contact Address" and "Search Key" though. I am not sure if it is
correct to merge the contact address into the to address though.

If you have any more samples of working or broken PST files, I would be
happy to have a copy of them to debug further.

--
bye,
pabs

https://wiki.debian.org/PaulWise


Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

2021-03-14 Thread Paul Wise
I did some further investigation of the PST file you sent.

I conclude that there are two problems you are experiencing:

The first one is that readpst doesn't consider the headers as valid
even though they clearly are valid. Since the header validity detection
was added to detect invalid PST files I am going to have to discuss
this with the upstream author. Perhaps the header validity detection
will have to become more generic or perhaps it will be discarded or
perhaps the invalid PST files will be detected in a different way.
Fixing this will bring back all the headers, including ARC & To.

The second one is that for your particular PST file, the To field does
not contain an email address. Looking at the debug output I see that
the "Display Sent-To Address" contains only the name, not the email.
This appears to be a problem with the PST file itself, as the 0x0E04
type, which is PR_DISPLAY_TO, aka the "Address Sent-To", does not
contain the email address. The email address does appear in the
"Contact Address" and "Search Key" though. I am not sure if it is
correct to merge the contact address into the to address though.

If you have any more samples of working or broken PST files, I would be
happy to have a copy of them to debug further.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise




Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

2021-03-14 Thread Paul Wise
On Wed, 2021-03-10 at 09:28 +, Surla, Sai Kalyan wrote:

> Hope you got a chance to at the issue that we reported.

I am looking at the issue today.

I managed to reproduce the issue that you have reported using the
sample PST file that you have provided.

I acknowledge that I am seeing both the issues you reported:

 * only a limited set of headers are being extracted
 * email address is missing from the To header
    - but the From header is correct

The readpst -d option to output debug information was instrumental in
reproducing this; it causes all the info in the PST file and the entire
sequence of decoding steps to be output to a debug file.

I modified the valid_headers function to also accept the ARC-Seal
header but that does not fix the problem. Looking at the debug output I
noticed that the X-GM-THRID header is the first header. I then added a
X-GM-THRID to the valid_headers function and that fixed the problem. I
think that messages with a different first header will not work though,
you would have to add all of the first headers that could exist to the
valid_headers function, which seems like an incorrect thing to do.

If you have any sample PST files that *do* work with the current code,
that would allow me to compare the working PST with the broken PST,
which would be very helpful in tracking down where the problem is.

Until I can figure out the correct fix, I suggest you work around this
bug by adding "return 1;" without quotes as the first line in the
valid_headers function. This way you can keep readpst working for your
customers while the correct fix is found. I believe that the modern PST
files that you have available are all valid files, while the
valid_headers function aims to detect broken files, so there should be
no risk to the conversion process for your case.
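
The behaviour described above can be reproduced in isolation. The sketch
below is a simplified stand-in for valid_headers, not the readpst code:
`starts_with_known_header` is a hypothetical name, the list is shortened,
and wrapped headers are ignored.

```c
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Simplified stand-in for the readpst check: only the start of the header
 * block is compared against a short list of known field names. */
int starts_with_known_header(const char *headers)
{
    static const char *known[] = {
        "Content-Type: ", "Date: ", "From: ", "MIME-Version: ",
        "Received: ", "Return-Path: ", "Subject: ", "To: "
    };
    if (headers == NULL) return 0;
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strncasecmp(headers, known[i], strlen(known[i])) == 0) return 1;
    }
    return 0;  /* the first field is not in the list: treated as bogus */
}
```

A header block that begins with `Received: ` is accepted, but one that
begins with `X-GM-THRID: ` is rejected even though `ARC-Seal: ` and `To: `
lines follow later, which is why accepting ARC-Seal alone did not help.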

-- 
bye,
pabs

https://wiki.debian.org/PaulWise




Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

2021-03-10 Thread Surla, Sai Kalyan
Hi Paul,

How are you? I hope you got a chance to look at the issue that we reported. To 
reiterate, here is a summary of the problem.
Some transport headers start with “ARC-Seal: ”. These transport headers also 
contain the To, CC and BCC addresses, with both display names and the 
corresponding email addresses. However, `readpst` discards these transport 
headers when it creates the EML file with the MIME content, so the final MIME 
content has only the display names for all the To, CC and BCC addresses. 
Possibly readpst uses the canonical properties to extract the To, CC and BCC 
addresses from the PST file.


After looking at the readpst.c file (see below), we understood that readpst 
discards any transport headers that do not start with one of the listed texts.



int  valid_headers(char *header)
{
    // headers are sometimes really bogus - they seem to be fragments of the
    // message body, so we only use them if they seem to be real rfc822 headers.
    // this list is composed of ones that we have seen in real pst files.
    // there are surely others. the problem is - given an arbitrary character
    // string, is it a valid (or even reasonable) set of rfc822 headers?
    if (header) {
        if (header_match(header, "Content-Type: " )) return 1;
        if (header_match(header, "Date: " )) return 1;
        if (header_match(header, "From: " )) return 1;
        if (header_match(header, "MIME-Version: " )) return 1;
        if (header_match(header, "Microsoft Mail Internet Headers")) return 1;
        if (header_match(header, "Received: " )) return 1;
        if (header_match(header, "Return-Path: "  )) return 1;
        if (header_match(header, "Subject: "  )) return 1;
        if (header_match(header, "To: "   )) return 1;
        if (header_match(header, "X-ASG-Debug-ID: "   )) return 1;
        if (header_match(header, "X-Barracuda-URL: "  )) return 1;
        if (header_match(header, "X-x: "  )) return 1;
        if (strlen(header) > 2) {
            DEBUG_INFO(("Ignore bogus headers = %s\n", header));
        }
        return 0;
    }
    else return 0;
}

As we understand it, ARC headers (which help preserve email authentication 
results and verify the identity of email intermediaries that forward a message 
on to its final destination) were introduced in 2016, and it looks like readpst 
does not handle them.

We would appreciate it if you could clarify:

  1.  Is our understanding correct?
  2.  If yes, can we expect a patch from you?
  3.  If our understanding is not correct, can we expect a patch with proper 
fixes, or can you let us know where to fix the problem?
  4.  Are there any other headers, like ARC, that are not handled?



We look forward to your reply so that we can commit to a date with our customers.

Thank you
Sai Kalyan


From: Surla, Sai Kalyan
Sent: 08 March 2021 07:28 PM
To: 'Paul Wise' ; '984...@bugs.debian.org' 
<984...@bugs.debian.org>
Subject: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails 
having ARC headers from PST file

Outlook is blocking the PST, so please find the zipped PST file attached.

Thank you
Sai Kalyan

From: Surla, Sai Kalyan
Sent: 08 March 2021 07:27 PM
To: 'Paul Wise' <p...@debian.org>; '984...@bugs.debian.org' <984...@bugs.debian.org>
Subject: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails 
having ARC headers from PST file

Sorry, it looks like Outlook blocked this PST.

From: Surla, Sai Kalyan
Sent: 08 March 2021 01:45 PM
To: Paul Wise <p...@debian.org>; 984...@bugs.debian.org
Subject: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails 
having ARC headers from PST file

Hi Paul,

Please find attached a PST containing a single email, with which we also faced 
the problem of extracting the email addresses under the ‘To:’ header.

Thank you
Sai Kalyan

From: Paul Wise <p...@debian.org>
Sent: 08 March 2021 07:02 AM
To: Surla, Sai Kalyan <saikalyan.su...@arcserve.com>; 984...@bugs.debian.org
Subject: Re: Bug#984581: pst-utils: Fails to extract email addresses for emails 
having ARC headers from PST file

Control: found -1 0.6.75-1

On Sun, 2021-03-07 at 17:42 +, Surla, Sai Kalyan wrote:

> Already tried with version 0.6.75-1.

Thanks, marking the bug as found in that version.

> Also compiled the latest code available and tried with it, still the
> same results.

Thanks for testing t

Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

2021-03-07 Thread Paul Wise
Control: found -1 0.6.75-1

On Sun, 2021-03-07 at 17:42 +, Surla, Sai Kalyan wrote:

> Already tried with version 0.6.75-1.

Thanks, marking the bug as found in that version.

> Also compiled the latest code available and tried with it, still the
> same results.

Thanks for testing this too.

> Please find the changes in the attached file. (readpst.c line no. : 1238)

It is traditional to provide changes in the patch format by using the
`diff -u` command or the corresponding commands from the version
control system that the upstream project is using.

Below is the output from the Mercurial diff for your change.

   $ hg diff
   diff -r 7200790e46ac src/readpst.c
   --- a/src/readpst.c Tue Jun 16 17:18:28 2020 -0700
   +++ b/src/readpst.c Mon Mar 08 09:20:50 2021 +0800
   @@ -1235,7 +1235,7 @@

    int  header_match(char *header, char*field) {
        int n = strlen(field);
   -    if (strncasecmp(header, field, n) == 0) return 1;   // tag:{space}
   +    if (strstr(header,field) != NULL || strncasecmp(header, field, n) == 0) return 1;   // tag:{space}
        if ((field[n-1] == ' ') && (strncasecmp(header, field, n-1) == 0)) {
            char *crlftab = "\r\n\t";
            DEBUG_INFO(("Possible wrapped header = %s\n", header));


I am fairly certain that this is not the correct fix for this issue.
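
One likely reason, stated here as an assumption rather than something
spelled out in this message: strstr matches the field name anywhere in the
text, so a fragment of a message body that merely contains `From: ` would be
accepted as headers. The two helper names below are hypothetical and exist
only to contrast the two tests.

```c
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* The original test: the field must appear at the very start of the text. */
int match_at_start(const char *text, const char *field)
{
    return strncasecmp(text, field, strlen(field)) == 0;
}

/* The proposed test: the field may appear anywhere in the text. */
int match_anywhere(const char *text, const char *field)
{
    return strstr(text, field) != NULL || match_at_start(text, field);
}
```

For a body fragment such as `see you then.\r\nFrom: the whole team\r\n` the
first test rejects the text while the second accepts it, which would defeat
the purpose of the bogus-header check.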

> ARC headers are kind of email authentication headers.

Thanks for the info.

> For some security reasons we cannot share the original

Understood.

> if possible we will try to share the inhouse sample pst.

That would be necessary to be able to fix the issue.

> Meanwhile our observation is if the headers start with the following
> headers (...) it is treated as bogus, this email is starting with
> some header which is not one of the listed.

That does indeed look like what the code does. The right fix is probably
to scan through all of the headers instead of just the first one.
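
One way to read that suggestion is sketched below. It is an assumption
about what scanning all of the headers could look like, not code from
readpst, and `any_header_matches` is a hypothetical helper. It walks the
header block line by line and stops at the blank line that ends the headers.

```c
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Returns 1 if any line of `headers` starts with `field` (case-insensitive).
 * Scanning stops at the blank line that ends the header block. */
int any_header_matches(const char *headers, const char *field)
{
    size_t n = strlen(field);
    const char *line = headers;
    while (line != NULL && *line != '\0' && *line != '\r' && *line != '\n') {
        if (strncasecmp(line, field, n) == 0) return 1;
        line = strchr(line, '\n');  /* find the end of this line */
        if (line != NULL) line++;   /* step to the start of the next one */
    }
    return 0;
}
```

With this, a block that starts with `X-GM-THRID: ` still matches `To: ` on a
later line, while a `To: ` that only appears after the blank line, in the
body, does not count.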

-- 
bye,
pabs

https://wiki.debian.org/PaulWise




Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

2021-03-07 Thread Surla, Sai Kalyan
Hi Paul,

We already tried version 0.6.75-1. We also compiled the latest available code 
and tried that, with the same results.

Please find the changes in the attached file. (readpst.c line no. : 1238)

ARC headers are a kind of email authentication header. Authenticated Received 
Chain (ARC) creates a mechanism for individual Internet Mail Handlers to add 
their authentication assessment to a message's ordered set of handling results. 
For more details, please refer to RFC 8617: 
https://tools.ietf.org/html/rfc8617.

For security reasons we cannot share the original PST. We will discuss this 
internally and, if possible, share an in-house sample PST. We will let you know 
about the PST in the next couple of days.
Meanwhile, our observation is that unless the headers start with one of the 
following headers (Date, From, To, Content-Type, MIME-Version, Microsoft Mail 
Internet Headers, Received, Subject and some other headers), they are treated 
as bogus; this email starts with a header that is not one of those listed.

Thank you
Sai Kalyan


From: Paul Wise 
Sent: 06 March 2021 08:12 AM
To: Surla, Sai Kalyan ; 984...@bugs.debian.org
Subject: Re: Bug#984581: pst-utils: Fails to extract email addresses for emails 
having ARC headers from PST file

Control: tags -1 + moreinfo

On Fri, 2021-03-05 at 23:06 +0530, sai kalyan wrote:

> Version: 0.6.71-0.1

Could you test version 0.6.75-1 from Debian bullseye?

> Tags: patch

Could you attach your patch to the bug report?

> for some mails where the transport headers contain ARC headers

Could you provide some information about what ARC headers are?

> the email addresses are not extracted from the PST and only usernames
> are available in the MIME content of emails that are extracted.

Please supply an example PST file that this problem occurs with.

--
bye,
pabs

https://wiki.debian.org/PaulWise
/***
 * readpst.c
 * Part of the LibPST project
 * Written by David Smith
 *dav...@earthcorp.com
 */

#include "define.h"
#include "lzfu.h"
#include "msg.h"

#define OUTPUT_TEMPLATE "%s.%s"
#define OUTPUT_KMAIL_DIR_TEMPLATE ".%s.directory"
#define KMAIL_INDEX "../.%s.index"
#define SEP_MAIL_FILE_TEMPLATE "%i%s"

// max size of the c_time char*. It will store the date of the email
#define C_TIME_SIZE 500

struct file_ll {
    char *name[PST_TYPE_MAX];
    char *dname;
    FILE * output[PST_TYPE_MAX];
    int32_t stored_count;
    int32_t item_count;
    int32_t skip_count;
};

int   grim_reaper();
pid_t try_fork(char* folder);
void  process(pst_item *outeritem, pst_desc_tree *d_ptr);
void  write_email_body(FILE *f, char *body);
void  removeCR(char *c);
void  usage();
void  version();
void  mk_kmail_dir(char* fname);
int   close_kmail_dir();
void  mk_recurse_dir(char* dir);
int   close_recurse_dir();
void  mk_separate_dir(char *dir);
int   close_separate_dir();
void  mk_separate_file(struct file_ll *f, int32_t t, char *extension, int openit);
void  close_separate_file(struct file_ll *f);
char* my_stristr(char *haystack, char *needle);
void  check_filename(char *fname);
int   acceptable_ext(pst_item_attach* attach);
void  write_separate_attachment(char f_name[], pst_item_attach* attach, int attach_num, pst_file* pst);
void  write_embedded_message(FILE* f_output, pst_item_attach* attach, char *boundary, pst_file* pf, int save_rtf, char** extra_mime_headers);
void  write_inline_attachment(FILE* f_output, pst_item_attach* attach, char *boundary, pst_file* pst);
int   valid_headers(char *header);
void  header_has_field(char *header, char *field, int *flag);
void  header_get_subfield(char *field, const char *subfield, char *body_subfield, size_t size_subfield);
char* header_get_field(char *header, char *field);
char* header_end_field(char *field);
void  header_strip_field(char *header, char *field);
int   test_base64(char *body, size_t len);
void  find_html_charset(char *html, char *charset, size_t charsetlen);
void  find_rfc822_headers(char** extra_mime_headers);
void  write_body_part(FILE* f_output, pst_string *body, char *mime, char *charset, char *boundary, pst_file* pst);
void  write_schedule_part_data(FILE* f_output, pst_item* item, const char* sender, const char* method);
void  write_schedule_part(FILE* f_output, pst_item* item, const char* sender, const char* boundary);
void  write_normal_email(FILE* f_output, char f_name[], pst_item* item, int mode, int mode_MH, pst_file* pst, int save_rtf, int embedding, char** extra_mime_headers);
void  write_vcard(FILE* f_output, pst_item *item, pst_item_contact* contact, char comment[]);
int   write_extra_categories(FILE* f_output, pst_item* item);
void  write_journal(FILE* f_output, pst_item* item);
void  write_app

Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

2021-03-05 Thread Paul Wise
Control: tags -1 + moreinfo

On Fri, 2021-03-05 at 23:06 +0530, sai kalyan wrote:

> Version: 0.6.71-0.1

Could you test version 0.6.75-1 from Debian bullseye?

> Tags: patch

Could you attach your patch to the bug report?

> for some mails where the transport headers contain ARC headers

Could you provide some information about what ARC headers are?

> the email addresses are not extracted from the PST and only usernames
> are available in the MIME content of emails that are extracted.

Please supply an example PST file that this problem occurs with.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise




Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

2021-03-05 Thread sai kalyan
Package: pst-utils
Version: 0.6.71-0.1
Severity: important
Tags: patch

Hi,

We have been using the tool to extract emails from PST files. We recently
observed that, for some mails whose transport headers contain ARC headers,
the email addresses are not extracted from the PST and only usernames are
available in the MIME content of the extracted emails.
After enabling debug logs, we found that all of the internet headers are being
ignored as bogus headers, including the To:, From: ... headers in which the
email addresses are visible.

As the tool is open source, we debugged it and found that the headers are
ignored (as bogus headers) and that the tool instead uses the extracted
metadata, in which the email addresses are missing, to construct the MIME
content for the email.

We would like to point out two places where the issue could be happening:
1) Parsing the mail from the PST, as the structure variable does not contain
the addresses for these emails.
2) Ignoring the headers as bogus headers because of an incorrect comparison.


We were not able to look into the parsing part, but we made some changes to
the identification of bogus headers to verify the behavior; they are probably
not the appropriate changes.
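
To illustrate item 2, here is a minimal sketch of how a prefix comparison
against a fixed list of field names would reject a header block that begins
with an ARC header, next to a more generic check. This is not the readpst
code: the function names, the list of field names and the idea that the
block starts with an ARC-Seal: line are all assumptions made for
illustration.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical allowlist check: accept the block only if it starts with
 * one of a few known field names. The list below is an assumption, not
 * the list used by readpst. */
int starts_with_known_field(const char *header)
{
    static const char *const known[] = {
        "Received: ", "Return-Path: ", "From: ", "To: ",
        "Subject: ", "Date: ", "MIME-Version: ", "Content-Type: "
    };
    size_t i;

    if (!header) return 0;
    for (i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
        if (strncasecmp(header, known[i], strlen(known[i])) == 0) return 1;
    }
    return 0;   /* e.g. a block starting with "ARC-Seal: " ends up here */
}

/* Generic alternative: accept the block if it begins with something shaped
 * like an RFC 5322 field name, that is one or more printable ASCII
 * characters (33..126) other than ':', followed by ':'. */
int starts_with_field_name(const char *header)
{
    const unsigned char *start = (const unsigned char *)header;
    const unsigned char *p = start;

    if (!p) return 0;
    while (*p > 32 && *p < 127 && *p != ':') p++;
    return p != start && *p == ':';
}
```

With these two helpers, a block beginning with "ARC-Seal: i=1; ..." is
rejected by the first check and accepted by the second, while plain body
text such as "just some text" is rejected by both.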

Sample Data:
Below is sample MIME content that the readpst utility extracted from the
PST for one email:

From: user_1
To: user_2
CC: user_3

where user_1, user_2 and user_3 are just usernames without email addresses.
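
A rough way to check extracted output for this symptom is to look up a field
in the header block and test whether its value contains an '@'. The sketch
below is only an illustration: both function names are hypothetical, the
example.com address in the usage note is made up, folded continuation lines
are not handled, and the '@' test is a crude heuristic rather than a real
address parser.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Return a pointer to the value of the first header line whose field name
 * equals `name` (case-insensitive, e.g. "From"), or NULL if none is found.
 * Only the start of each line is examined. */
const char *find_field_value(const char *headers, const char *name)
{
    const char *line = headers;
    size_t n;

    if (!headers || !name) return NULL;
    n = strlen(name);
    while (line && *line) {
        if (strncasecmp(line, name, n) == 0 && line[n] == ':') {
            const char *value = line + n + 1;
            while (*value == ' ' || *value == '\t') value++;
            return value;
        }
        line = strchr(line, '\n');   /* move to the next line, if any */
        if (line) line++;
    }
    return NULL;
}

/* Return 1 if the first line of `value` contains '@', otherwise 0. */
int value_has_address(const char *value)
{
    if (!value) return 0;
    for (; *value && *value != '\r' && *value != '\n'; value++) {
        if (*value == '@') return 1;
    }
    return 0;
}
```

For a header block containing "From: User One <user_1@example.com>" and
"To: user_2", value_has_address(find_field_value(block, "From")) returns 1
and the same call for "To" returns 0, which matches the symptom shown above.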

We would like to hear back as soon as possible.

Thank you
Sai Kalyan



-- System Information:
Debian Release: 10.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.0-8-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8),
LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages pst-utils depends on:
ii  libc6         2.28-10
ii  libgcc1       1:8.3.0-6
ii  libgd3        2.2.5-5.2
ii  libglib2.0-0  2.58.3-2+deb10u2
ii  libgsf-1-114  1.14.45-1
ii  libpst4       0.6.71-0.1
ii  libstdc++6    8.3.0-6

pst-utils recommends no packages.

pst-utils suggests no packages.