On 29-01-16 13:18, Maarten Bout wrote:
> I want to do a DLP scan where, for example:
> 
> The header contains keyword1 AND the subject (or body) contains keyword2
> 
> 
> Can you give an estimate of the performance drop and the extra memory use when 
> scanning through all the parts?

The memory requirements are not the biggest problem. However, if the
extracted text is very large (several MB), the regular expression
engine may choke on an expression that requires it to keep track of all
scanned characters (for example FOO.*BAR).

The text extraction is split up into overlapping parts to allow scanning
of very large attachments. In future releases we will add support for
scanning through Word, PDF, etc. and compressed files. In those cases
the size of the extracted text can be extremely large, especially when
scanning through compressed files.
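
To illustrate why the parts overlap, here is a minimal Java sketch. It
is not the actual CipherMail code: the class OverlapScanSketch, the
methods split and matchesAnyChunk, and the chunk and overlap sizes are
all made up for this example. It shows that a keyword which straddles a
chunk boundary is missed without overlap and found with overlap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; names and sizes are assumptions, not CipherMail code.
class OverlapScanSketch {

    // Splits text into windows of at most chunkSize characters. Each window
    // starts (chunkSize - overlap) characters after the previous one.
    static List<String> split(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("need 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }

    // Returns true if the pattern is found inside any single window.
    static boolean matchesAnyChunk(Pattern pattern, String text,
                                   int chunkSize, int overlap) {
        for (String chunk : split(text, chunkSize, overlap)) {
            if (pattern.matcher(chunk).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Pattern keyword = Pattern.compile("secret");
        String text = "xxxxxsecretxxxxx"; // "secret" spans positions 5..10

        // Without overlap the keyword is cut in two at position 8.
        System.out.println(matchesAnyChunk(keyword, text, 8, 0)); // false
        // With an overlap of 6 one window covers the whole keyword.
        System.out.println(matchesAnyChunk(keyword, text, 8, 6)); // true
    }
}
```

Note that in this sketch a match longer than the overlap can still be
missed, which is one reason why expressions spanning a lot of text are
a poor fit for part by part scanning.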

> And what would it take to change the code to do so?

The text extraction is done in the following Java class:

mitm.application.djigzo.james.mailets.AbstractRegExpPolicyChecker

See method #serviceMail

Instead of scanning piece by piece, this could be changed to first
collect all text into the textNormalizer and then do the scanning
(policyChecker.update(context)). The changes should be relatively easy.
I will see whether I can make this optional if I have the time.
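
The difference between the two approaches can be sketched in plain
Java. This does not use the real AbstractRegExpPolicyChecker or
textNormalizer APIs; the class CombinedScanSketch and its two methods
are hypothetical, and the message parts are just strings. The pattern
is a simplified form of the trigger1/trigger2 expression from the
original question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; it does not call any CipherMail classes.
class CombinedScanSketch {

    // Current behaviour in spirit: every part is scanned on its own.
    static boolean matchesPerPart(Pattern pattern, List<String> parts) {
        for (String part : parts) {
            if (pattern.matcher(part).find()) {
                return true;
            }
        }
        return false;
    }

    // Alternative: collect all parts into one text, then scan once.
    static boolean matchesCombined(Pattern pattern, List<String> parts) {
        StringBuilder all = new StringBuilder();
        for (String part : parts) {
            all.append(part).append('\n');
        }
        return pattern.matcher(all).find();
    }

    public static void main(String[] args) {
        // DOTALL lets '.' match line breaks as well.
        Pattern pattern = Pattern.compile("trigger1.*trigger2", Pattern.DOTALL);
        List<String> parts = Arrays.asList(
                "Subject: report about trigger1",
                "The body mentions trigger2.");

        System.out.println(matchesPerPart(pattern, parts));  // false
        System.out.println(matchesCombined(pattern, parts)); // true
    }
}
```

The combined variant holds the complete text in memory and only matches
when trigger1 appears before trigger2 in the collected text.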

A better alternative would be to add a post-DLP check. All the DLP
matches that are found are stored in the Mail object. You could add a
matcher that checks whether there are multiple DLP matches.
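
The core of such a check could look like the sketch below. How the
matches are actually stored in the Mail object is not shown here; the
sketch simply assumes that the names of the matched patterns have
already been read into a collection of strings. The class
PostDlpCheckSketch and the method allRequiredMatched are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; reading the matches from the Mail object is left out.
class PostDlpCheckSketch {

    // Returns true only if every required pattern name is among the
    // patterns that matched somewhere in the message (headers or body).
    static boolean allRequiredMatched(Collection<String> matchedPatterns,
                                      Set<String> requiredPatterns) {
        return new HashSet<>(matchedPatterns).containsAll(requiredPatterns);
    }

    public static void main(String[] args) {
        Set<String> required = new HashSet<>(Arrays.asList("trigger1", "trigger2"));

        // Only the subject pattern matched: do not encrypt.
        System.out.println(allRequiredMatched(
                Arrays.asList("trigger1"), required)); // false

        // Subject and body patterns both matched: encrypt.
        System.out.println(allRequiredMatched(
                Arrays.asList("trigger1", "trigger2"), required)); // true
    }
}
```

With this approach each pattern stays small and is still scanned part by
part, so no large combined text is needed.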

Kind regards,

Martijn Brinkers

> -----Original message-----
> From: [email protected] [mailto:[email protected]] 
> On behalf of Martijn Brinkers
> Sent: Friday 29 January 2016 3:15
> To: [email protected]
> Subject: Re: [Djigzo users] Combining DLP in subject and body
> 
> 
> 
> On 28-01-16 16:09, Maarten Bout wrote:
>> Hello,
>>
>> I'm wondering if there's a possibility to combine a DLP in the subject, and 
>> a DLP in the body.
>>
>> Encryption should trigger on the words: trigger1, trigger2
>>
>> For example
>> Subject contains: trigger1
>> Body contains: trigger2
>>
>> I'm using the following regexp: trigger1.*(\n|.)*trigger2
>>
>> When the subject or the body on its own contains both trigger1 and 
>> trigger2, the email gets encrypted.
>> But when the subject contains trigger1 and the body contains 
>> trigger2, the email doesn't get encrypted.
>>
>> Does anyone have any experience with this situation?
> 
> Unfortunately, matching on multiple message parts is not supported. For 
> performance and memory reasons, a message is scanned part by part and the 
> headers of the message are considered to be a separate part. So if you have a 
> multipart message, every part of the message is scanned on its own. In 
> principle it should be possible to modify the code to combine all parts into 
> one large part and scan the complete text. This, however, makes scanning 
> slower and requires more memory.
> 
> What kind of DLP scanning do you want to accomplish? Only do a DLP scan 
> if the subject contains some string?
> 
> Kind regards,
> 
> Martijn Brinkers
> 
> --
> CipherMail email encryption
> 
> Email encryption with support for S/MIME, OpenPGP, PDF encryption and secure 
> webmail pull.
> 
> https://www.ciphermail.com
> 
> Twitter: http://twitter.com/CipherMail
> 
> _______________________________________________
> Users mailing list
> [email protected]
> https://lists.djigzo.com/lists/listinfo/users
> 
