On 29-01-16 13:18, Maarten Bout wrote:
> I want to do a DLP scan if, for example:
>
> the header contains keyword1 AND the subject (or body) contains keyword2
>
> Can you give an estimate of the performance drop and the extra memory when
> scanning through all the parts?
The memory requirements are not the biggest problem. However, if the extracted
text is very large (several MB), the regular expression engine might choke on
an expression that requires it to keep track of all scanned characters (for
example FOO.*BAR). The text extraction is split up into overlapping parts to
allow scanning of very large attachments. In future releases we will add
support for scanning through Word, PDF, etc. and compressed files. In those
cases the size of the extracted text can be extremely large, especially when
scanning through compressed files.

> And what would it take to change the code to do so?

The text extraction is done in the following Java class:

mitm.application.djigzo.james.mailets.AbstractRegExpPolicyChecker

See method #serviceMail

Instead of scanning piece by piece, this could be changed to collect all text
into the textNormalizer and then do the scanning
(policyChecker.update(context)). The changes should be relatively easy. I will
see whether I can make this optional when I have the time.

A better alternative would be to add a post-DLP check. All DLP matches that
are found are stored in the Mail object. You could add a matcher that checks
whether there are multiple DLP matches.

Kind regards,

Martijn Brinkers

> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Martijn Brinkers
> Sent: Friday, 29 January 2016 3:15
> To: [email protected]
> Subject: Re: [Djigzo users] Combining DLP in subject and body
>
>
>
> On 28-01-16 16:09, Maarten Bout wrote:
>> Hello,
>>
>> I'm wondering if there's a possibility to combine a DLP in the subject and
>> a DLP in the body.
>>
>> Encryption should trigger on the words: trigger1, trigger2
>>
>> For example
>> Subject contains: trigger1
>> Body contains: trigger2
>>
>> I'm using the following regexp: trigger1.*(\n|.)*trigger2
>>
>> When the subject or the body contains both trigger1 and trigger2, the email
>> gets encrypted.
>> But when the subject contains trigger1 and the body contains trigger2, the
>> email doesn't get encrypted.
>>
>> Does anyone have any experience with this situation?
>
> Unfortunately, matching on multiple message parts is not supported. For
> performance and memory reasons, a message is scanned part by part, and the
> headers of the message are considered to be a separate part. So if you have
> a multipart message, every part of the message is scanned on its own. In
> principle it should be possible to modify the code to combine all parts into
> one large part and scan the complete text. However, this makes scanning
> slower and requires more memory.
>
> What kind of DLP scanning do you want to accomplish? Only a DLP scan if the
> subject contains a certain string?
>
> Kind regards,
>
> Martijn Brinkers
>
> --
> CipherMail email encryption
>
> Email encryption with support for S/MIME, OpenPGP, PDF encryption and secure
> webmail pull.
>
> https://www.ciphermail.com
>
> Twitter: http://twitter.com/CipherMail
>
> _______________________________________________
> Users mailing list
> [email protected]
> https://lists.djigzo.com/lists/listinfo/users
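To illustrate why trigger1.*(\n|.)*trigger2 fails when trigger1 is in the subject and trigger2 is in the body, the following Java sketch contrasts part-by-part scanning with the "collect all text first" variant discussed in the thread. It is a minimal sketch under stated assumptions, not the Djigzo code: the class PartScanSketch, its methods and the two-part example message are hypothetical, and the real scanning lives in AbstractRegExpPolicyChecker. It also uses Pattern.DOTALL instead of (\n|.)*, because Java's regex engine matches a repeated alternation group such as (\n|.)* recursively and can throw a StackOverflowError on long texts.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; not the Djigzo/CipherMail implementation.
public class PartScanSketch {

    // DOTALL lets '.' match line terminators too, so the "(\n|.)*"
    // alternation from the original expression is not needed.
    static final Pattern SUBJECT_THEN_BODY =
            Pattern.compile("trigger1.*trigger2", Pattern.DOTALL);

    // Hypothetical message: the headers and the body are separate parts.
    static final List<String> PARTS = List.of(
            "Subject: trigger1 report\nTo: sales\n",
            "Hello,\nsee the trigger2 numbers below.\n");

    // Part by part: each part is matched on its own, so a match that starts
    // in one part and ends in another can never be found.
    static boolean scanPerPart(List<String> parts, Pattern pattern) {
        for (String part : parts) {
            if (pattern.matcher(part).find()) {
                return true;
            }
        }
        return false;
    }

    // Combined: all parts are joined into one text first. The whole text has
    // to be held in memory at once, but the pattern can now span parts.
    static boolean scanCombined(List<String> parts, Pattern pattern) {
        String all = String.join("\n", parts);
        return pattern.matcher(all).find();
    }

    public static void main(String[] args) {
        System.out.println("per part: " + scanPerPart(PARTS, SUBJECT_THEN_BODY)); // per part: false
        System.out.println("combined: " + scanCombined(PARTS, SUBJECT_THEN_BODY)); // combined: true
    }
}
```

The combined variant finds the match only because the full message text is buffered, which is the speed and memory trade-off described in the thread.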
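The post-DLP check suggested in the reply keeps the part-by-part scan and combines the results afterwards. The sketch below assumes a simple model in which each rule is a named regular expression and the check passes only if every required rule matched in some part. PostDlpCheckSketch, exampleRules, collectMatches and allMatched are hypothetical names, and the way matches are actually stored in the Mail object is not modelled here.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of a post-DLP check; not the Djigzo/CipherMail API.
public class PostDlpCheckSketch {

    // Hypothetical rules: one short pattern per keyword instead of one
    // pattern that has to span the subject and the body.
    static Map<String, Pattern> exampleRules() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("keyword1", Pattern.compile("\\btrigger1\\b"));
        rules.put("keyword2", Pattern.compile("\\btrigger2\\b"));
        return rules;
    }

    // Step 1: the usual part-by-part scan, recording the name of every rule
    // that matched in at least one part.
    static Set<String> collectMatches(List<String> parts, Map<String, Pattern> rules) {
        Set<String> matched = new HashSet<>();
        for (String part : parts) {
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                if (rule.getValue().matcher(part).find()) {
                    matched.add(rule.getKey());
                }
            }
        }
        return matched;
    }

    // Step 2: the post-DLP check, passing only if every required rule matched
    // somewhere in the message (an AND across parts).
    static boolean allMatched(Set<String> matched, Set<String> required) {
        return matched.containsAll(required);
    }

    public static void main(String[] args) {
        Map<String, Pattern> rules = exampleRules();
        List<String> parts = List.of(
                "Subject: trigger1 report\n",           // headers part
                "Hello,\nsee the trigger2 numbers.\n"); // body part
        Set<String> matched = collectMatches(parts, rules);
        System.out.println("encrypt: " + allMatched(matched, rules.keySet())); // encrypt: true
    }
}
```

Because each keyword is matched on its own, no cross-part text has to be buffered. The trade-off is that this check expresses only presence in some part, not order (trigger1 before trigger2).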
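The reply also mentions that text extraction is split into overlapping parts so that very large attachments can be scanned. The following sketch shows the general idea under assumed parameters: the chunk size and overlap are arbitrary, and OverlappingScanSketch is a hypothetical name, not the Djigzo implementation. A short match that straddles a chunk boundary still lies entirely inside one window, whereas a wide match such as FOO.*BAR that spans more than one window is missed. String.repeat requires Java 11.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: scan a large text in overlapping windows so each
// regex run only sees a bounded amount of text.
public class OverlappingScanSketch {

    // Returns true if the pattern is found in any window. A match of at most
    // `overlap` characters that crosses a window boundary is still fully
    // contained in the following window. Anchors, \b and lookarounds see each
    // window edge as a start/end of input, so they can behave differently at
    // window boundaries than on the full text.
    static boolean scanInChunks(CharSequence text, Pattern pattern, int chunkSize, int overlap) {
        if (overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            if (pattern.matcher(text.subSequence(start, end)).find()) {
                return true;
            }
            if (end == text.length()) {
                break; // this window already reached the end of the text
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Short match straddling the boundary between windows [0,50) and [40,90).
        String straddling = "x".repeat(47) + "SECRET" + "x".repeat(100);
        System.out.println(scanInChunks(straddling, Pattern.compile("SECRET"), 50, 10)); // true

        // Wide match: FOO and BAR never fall inside the same 50-character window.
        String spread = "FOO" + "x".repeat(200) + "BAR";
        Pattern wide = Pattern.compile("FOO.*BAR", Pattern.DOTALL);
        System.out.println(scanInChunks(spread, wide, 50, 10)); // false
        System.out.println(wide.matcher(spread).find());        // true on the whole text
    }
}
```

Choosing the overlap is a trade-off: it must be at least as long as the longest match you want to detect across a boundary, and every extra character of overlap is scanned twice.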
