Re: SIS and tracing the origin of an attachment

2022-03-16 Thread doug

I'll give that a try. With access to the encryption key, doveadm dump 
should handle it just fine. I had hoped there was a method using search 
and index files to minimize overhead.


To summarize what I think I have learned on this journey: the link to 
the hash file exists only within the contents of the email body, and not 
in a form that doveadm search can find. Hence raw scanning of the email 
contents is required.


Many thanks for everyone's help.

--
Doug




Re: SIS and tracing the origin of an attachment

2022-03-16 Thread Patrick Cernko

Hi all,

You can use "doveadm dump" to decompress the files before grepping them; 
I'm not sure about encryption:


find path/to/userhomes/mdbox/storage -name 'm.*' | \
  while IFS= read -r f; do
    # Keep only the three metadata fields, then look for the SIS link name.
    doveadm dump "$f" | \
      grep -E '^msg\.(ext-ref|orig-mailbox|guid)' | \
      grep -B2 xx/yy/hash-guid || continue
    echo "Match in $f"
  done

The dump also contains several other fields you might want to display.
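
As a rough sketch of how those fields could be turned into a short report, the helper below reads the dump text on stdin and prints the message GUID and original mailbox for each message whose ext-ref mentions the link name. The function name sis_match_dump is made up for illustration. It assumes each field sits on its own line starting with msg.guid, msg.orig-mailbox or msg.ext-ref, followed by some separator and the value, and that the ext-ref line comes after the other two (which the grep -B2 above also relies on).

```sh
# Hypothetical helper, not part of Dovecot.
# Reads "doveadm dump" text for one m.* file on stdin and prints
# "<guid><TAB><orig-mailbox>" for every message whose ext-ref contains $1.
# Assumption: lines look like "msg.<key><separator><value>" and the
# ext-ref line follows the guid and orig-mailbox lines of the same message.
sis_match_dump() {
  awk -v needle="$1" '
    # Strip the "msg.<key>" label plus any "=", ":" or blanks after it.
    function value(line) { sub(/^msg\.[a-z-]+[ =:]*/, "", line); return line }
    /^msg\.guid/         { guid = value($0); mailbox = "" }
    /^msg\.orig-mailbox/ { mailbox = value($0) }
    /^msg\.ext-ref/      { if (index($0, needle)) print guid "\t" mailbox }
  '
}

# Example, with the same placeholder as in the loop above:
#   doveadm dump "$f" | sis_match_dump xx/yy/hash-guid
```

The GUID printed this way is the message GUID rather than the suffix of the link name, so it should be usable with a doveadm search on guid.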

Best,
--
Patrick Cernko  +49 681 9325 5815
Joint Administration: Information Services and Technology
Max-Planck-Institute fuer Informatik & Softwaresysteme




Re: SIS and tracing the origin of an attachment

2022-03-15 Thread doug




Thank you for the suggestion, Oscar. My mdbox files are encrypted and 
compressed, so unfortunately grepping them directly will not work.





Re: SIS and tracing the origin of an attachment

2022-03-15 Thread Oscar del Rio

The very few times I've needed to trace a SIS attachment to a mailbox, I 
just grepped the "storage" folders for the file hash:


find username/storage -type f -exec grep 9ffa4b246589f8039d123ea909f1520e791bd880 {} +
username/storage/m.46588:X908 2409141 B72 9f/fa/9ffa4b246589f8039d123ea909f1520e791bd880-c9ee303687e13062cf740012bfe47a40
username/storage/m.46589:X1918 2409141 B72 9f/fa/9ffa4b246589f8039d123ea909f1520e791bd880-080ce71390e1306299730012bfe47a40


username/storage/m.46588:
BSent
X908 2409141 B72 9f/fa/9ffa4b246589f8039d123ea909f1520e791bd880-c9ee303687e13062cf740012bfe47a40


username/storage/m.46589:
BINBOX
X1918 2409141 B72 9f/fa/9ffa4b246589f8039d123ea909f1520e791bd880-080ce71390e1306299730012bfe47a40


-> Attachment in username's INBOX and Sent folders.
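
The step from the grep hit to the mailbox name can be scripted along these lines. This is only a sketch: sis_mailboxes is a made-up name, and it assumes the storage files are neither compressed nor encrypted, that the "B<mailbox>" line comes before the "X..." line of the same message as in the output above, and that awk copes with whatever binary data the files contain. A message without a "B" line would pick up a stale or bogus name, so treat the result as a hint to verify.

```sh
# Hypothetical helper, not part of Dovecot.
# Usage: sis_mailboxes <hash> <storage-file>...
# Prints "<file><TAB><mailbox>" for every "X..." line that mentions the hash.
sis_mailboxes() {
  hash=$1
  shift
  awk -v hash="$hash" '
    FNR == 1                { mailbox = "" }              # new file: reset state
    /^B/                    { mailbox = substr($0, 2) }   # remember last "B" line
    /^X/ && index($0, hash) { print FILENAME "\t" mailbox }
  ' "$@"
}

# Example:
#   sis_mailboxes 9ffa4b246589f8039d123ea909f1520e791bd880 username/storage/m.*
```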






Re: SIS and tracing the origin of an attachment

2022-03-15 Thread doug

I keep experimenting with this, and I still haven't found a reliable way 
to track an attachment back to its original message so I can either 
notify the user or delete the message with doveadm. Is this not 
possible? I'm using mdbox, if that matters. I see a similar thread going 
right now about virus scanning and deleting messages, but that is maildir 
and, I suspect, not using SIS for attachments.


--
Doug


SIS and tracing the origin of an attachment

2022-03-08 Thread doug

Hi All,

I'm trying to trace an attachment within an SIS subdirectory to the 
email message(s) that link to it. I say messages because I'm also using 
dovecot dedup. My understanding is that the linked file name is the hash 
value of the attachment's contents concatenated with the GUID of the 
email message. I have had marginal success with a message I created myself.


Example: I generated an email with two attachments. Here are the links 
in my attachment directory.

./26/c5/26c5c540d41779d83d2f5388041d05c67d720d9a-73eca8051acd27627231f2bc99a3
./65/cd/65cd73112a489ef07f17ed5740aa60358e2dd3fb-74eca8051acd27627231f2bc99a3
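
For scripting against names like these, a minimal sketch of taking a link name apart with plain shell parameter expansion follows. The helper names sis_hash, sis_suffix and sis_relpath are hypothetical, and the layout (content hash, a dash, a GUID-like suffix, two directory levels named after the first two pairs of hex digits of the hash) is inferred only from the listing above.

```sh
# Hypothetical helpers, not part of Dovecot, for names of the form
# <hash>-<suffix> as seen in the attachment directory listing.

# Print the content hash: everything before the first "-" of the file name.
sis_hash() {
  name=${1##*/}                 # drop any leading directories
  printf '%s\n' "${name%%-*}"
}

# Print the GUID-like suffix: everything after the first "-".
sis_suffix() {
  name=${1##*/}
  printf '%s\n' "${name#*-}"
}

# Rebuild the relative path, assuming two directory levels named after
# the first and second pair of hex digits of the hash.
sis_relpath() {
  hash=$(sis_hash "$1")
  rest=${hash#??}               # hash without its first two characters
  d1=${hash%"$rest"}            # first two characters
  d2=${rest%"${rest#??}"}       # next two characters
  printf '%s/%s/%s\n' "$d1" "$d2" "${1##*/}"
}
```

The hash on its own is what identifies the attachment contents; the suffix is the part that differs per link.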

In my Sent folder the actual GUID of the message is 
75eca8051acd27627231f2bc99a3. So the GUID of the attachment is 
based on the GUID of the message, but is not identical. The second hex 
digit seems to be decremented by an offset tied to the attachment index, 
relative to the GUID of the message. At least in my one example.


# doveadm dump /mailstore/doug/mail/mailboxes/Sent/dbox-Mails/dovecot.index | grep guid | tail -1

    - guid: 75eca8051acd27627231f2bc99a3

With that actual GUID I can find the message with a search:
# doveadm search -u doug mailbox Sent guid 75eca8051acd27627231f2bc99a3
doug e5711f1cf2c9294f7109059b96e4 53526

Now let's try to track down another email when only the HASH-GUID value 
is known. Here is one randomly picked.


./00/a2/00a2d5de3e41053d59bd10084826bbe094aa1c59-57857b09d1a327627e26f2bc99a3

# doveadm search -A mailbox '*' guid 57857b09d1a327627e26f2bc99a3
# doveadm search -A mailbox '*' guid 58857b09d1a327627e26f2bc99a3
# doveadm search -A mailbox '*' guid 59857b09d1a327627e26f2bc99a3

I repeated this, incrementing and decrementing from 5085... through 
5f85..., and never located the message.


This seems like it should be trivial, but I've been struggling with it 
for days. The GUID isn't random, so there must be a way to track the 
attachment back. What am I missing?


And for those wondering why: our virus scanner flagged a number of 
attachments, some with several links, and I want to ask the users to 
delete the offending messages so I can purge them from the server. If I 
can find the emails, I can give them the mail folder, date/time, and 
subject of the message.
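
To see how many link names a flagged attachment has, something like the following could be used. It is a sketch that assumes the SIS links are hard links on a single filesystem and that find supports -samefile (GNU and BSD find do; it is not in POSIX). The function name sis_links is made up.

```sh
# Hypothetical helper, not part of Dovecot.
# Usage: sis_links <attachment-dir> <flagged-file>
# Lists every path under <attachment-dir> that shares an inode with
# <flagged-file>, including the flagged file itself.
sis_links() {
  find "$1" -samefile "$2" -print
}
```

Each name in the output carries its own suffix after the dash, so the list gives one link name per reference to the flagged attachment.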


--
Doug