[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356487#comment-17356487 ]

Fabio Germann commented on LUCENE-9379:
---

Thanks [~broustant]/[~bruno.roustant], this is also something that I was looking for!

As for [~rcmuir]'s comment(s): I think the important distinction to be made is the goal of the encryption and the guarantees you need. If one needs tenant-based encryption at rest, OS level encryption is a valid way to go. Also, if one needs maximum performance and tries to squeeze every last drop of performance out of their NVMe drives, OS level encryption (or no encryption) would probably be best.

BUT: In today's world there are sometimes things that are more important (or pose a greater risk) to a project or a company, namely user privacy and data protection. In such cases decreased performance is certainly acceptable (if not already anticipated).

Many of the above arguments against this contribution can be addressed one way or another. What can NOT be addressed (and why [~bruno.roustant]'s contribution is valuable) is:
* It allows the stored content to be accessible only to Lucene (the process/thread), for the exact duration that Lucene needs to process the data, without any dependency on a downstream component.
* It allows for platform interoperability/independence. (Example:) The solution can be deployed to a Linux system while being developed on MacOS/Windows. (Sidenote: This is very important if large teams are working on a solution that builds on this.)
* It can even offer protection from passive privileged users, meaning that the file on the filesystem is not readable by a privileged user. By contrast, OS level encryption would make such protection more complex.
* It allows for simple deployment in container technologies (which would be tricky with the alternatives proposed by [~rcmuir]).

Maybe the increased interest in this topic signals that there is something to be done?

Recent research has also taken note, for example (from the abstract): "[...] However, currently deployed IR technologies, e.g., Apache Lucene - open-source search software, are insufficient when the information is protected or deemed to be private [...]" (Source: [https://www.computer.org/csdl/journal/tq//01/08954811/1gs4XOshKHC])

> Directory based approach for index encryption
> -
>
> Key: LUCENE-9379
> URL: https://issues.apache.org/jira/browse/LUCENE-9379
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Bruno Roustant
> Assignee: Bruno Roustant
> Priority: Major
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> +Important+: This Lucene Directory wrapper approach is to be considered only
> if an OS level encryption is not possible. OS level encryption better fits
> Lucene usage of OS cache, and thus is more performant.
> But there are some use-cases where OS level encryption is not possible. This
> Jira issue was created to address those.
>
> The goal is to provide optional encryption of the index, with a scope limited
> to an encryptable Lucene Directory wrapper.
> Encryption is at rest on disk, not in memory.
> This simple approach should fit any Codec as it would be orthogonal, without
> modifying APIs as much as possible.
> Use a standard encryption method. Limit perf/memory impact as much as
> possible.
> Determine how callers provide encryption keys. They must not be stored on
> disk.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354275#comment-17354275 ]

Bruno Roustant commented on LUCENE-9379:

_RE AES-XTS vs AES-CTR:_

In the case of Lucene, we produce read-only files per index segment. And if we have a new random IV per file, we don't repeat the same (AES encrypted) blocks. So we are in a safe write-once (read-only) case where AES-XTS and AES-CTR have the same strength [1][2]. Given that CTR is simpler, I chose it for this patch.

[1] https://crypto.stackexchange.com/questions/64556/aes-xts-vs-aes-ctr-for-write-once-storage
[2] https://crypto.stackexchange.com/questions/14628/why-do-we-use-xts-over-ctr-for-disk-encryption
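To illustrate the "new random IV per file" point, here is a minimal sketch using only the standard javax.crypto API. It is not the code from the patch: the class name, the method names, and the choice to store the IV as a 16-byte prefix of the file are assumptions made for illustration. The key is passed in by the caller and never written out.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

public class CtrPerFileIvSketch {
    private static final int IV_LENGTH = 16; // AES block size in bytes
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Encrypts one write-once file. Returns the IV followed by the ciphertext (assumed layout). */
    static byte[] encryptFile(byte[] key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[IV_LENGTH];
        RANDOM.nextBytes(iv); // fresh random IV for every file
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        byte[] stored = new byte[IV_LENGTH + ciphertext.length];
        System.arraycopy(iv, 0, stored, 0, IV_LENGTH);
        System.arraycopy(ciphertext, 0, stored, IV_LENGTH, ciphertext.length);
        return stored;
    }

    /** Reads the IV back from the prefix and decrypts the remainder. */
    static byte[] decryptFile(byte[] key, byte[] stored) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                new IvParameterSpec(stored, 0, IV_LENGTH));
        return cipher.doFinal(stored, IV_LENGTH, stored.length - IV_LENGTH);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16]; // 128-bit key, supplied by the caller
        RANDOM.nextBytes(key);
        byte[] segment = "identical segment bytes".getBytes(StandardCharsets.UTF_8);
        byte[] first = encryptFile(key, segment);
        byte[] second = encryptFile(key, segment);
        // Same key and same plaintext, but different IVs, so the stored bytes differ.
        System.out.println("ciphertexts equal: " + Arrays.equals(first, second));
        System.out.println("round trip ok: " + Arrays.equals(segment, decryptFile(key, first)));
    }
}
```

Because each file is written once under its own IV, the same key stream is never reused for different content, which is the property the comparison with AES-XTS relies on.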
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353770#comment-17353770 ]

David Smiley commented on LUCENE-9379:
--

Rob, please tone down your language. Don't speak of how much others are "uneducated"; merely point to what you want to show to help others understand your point of view.
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353696#comment-17353696 ]

Robert Muir commented on LUCENE-9379:
-

Your argument is even more uneducated, the "i can do better than encryption at rest" argument. Get out of town!

Lucene depends on the OS page cache for performance. So if you want to encrypt stuff, you need to use the operating system.

Also, encrypting storage is non-trivial, and this is a search engine project. Every time someone makes a patch for this issue, it's never a standard mode like AES-XTS, it's always some insecure homemade garbage!

I'm standing by my decision. Creating more JIRA issues or making more arguments won't help the situation.
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353685#comment-17353685 ]

Martin Huber commented on LUCENE-9379:
--

[~rcmuir] thanks for the useful links. But I didn't say that per-directory or per-user encryption would not be possible. This alone is not our use case.

What I said is that a user with root access to a system can read all files of all users while the users' directories are mounted / unlocked. Or he can become the user and then see the files. Is this statement not right? And as such, there is no privacy.
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353645#comment-17353645 ]

Robert Muir commented on LUCENE-9379:
-

As always, you can count on arch to have some good user-level wiki docs on how to do this: https://wiki.archlinux.org/title/Fscrypt
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353644#comment-17353644 ]

Robert Muir commented on LUCENE-9379:
-

Sorry, the above comment is really wrong. Please see my comments on linked issues.

You can definitely manage encryption at multiple levels in the OS:
* block level
* filesystem level

Please understand the options available and be educated about this, see: https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html

This FS-level crypto subsystem is usable with e.g. ext4 and f2fs filesystems, among others. So you can definitely do different stuff per-directory, which makes multitenant use-cases easily possible (and from my understanding, was the intent of the changes in the first place).

I won't drop my {{-1}} vote on this because folks won't read the documentation for their operating system.
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352508#comment-17352508 ]

Martin Huber commented on LUCENE-9379:
--

[~broustant] - one very valid use case that is not solvable by means of OS encryption is ensuring per-index encryption that preserves zero-knowledge privacy of the documents and the search index. OS level encryption, as far as I know, always allows file access to admins with local access to the filesystem as long as the encrypted volume is mounted. This can only be overcome with in-memory en/decryption.

So +1 for what you did! (y)
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319836#comment-17319836 ]

Ming Zhang commented on LUCENE-9379:

[~bruno.roustant] In our case, we have a dedicated collection for each tenant. Because there are so many tenants that it's not possible to serve them in a single Solr cluster, we have multiple clusters. Each collection also has to have a different encryption key. It looks like this directory (tenant) based approach is able to address our requirement. Looking forward to getting this enhancement soon.
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175699#comment-17175699 ]

Rajeswari Natarajan commented on LUCENE-9379:
-

[~bruno.roustant] and [~dsmiley], if we go with the implicit router, shard management/rebalancing/routing becomes manual. SolrCloud will not take care of these (on the Solr mailing lists I always see users being advised against taking this route), so we are looking to see if encryption is possible with the composite ID router and multiple tenants per collection. We might have around 3000+ collections going forward, so having one collection per tenant will make our cluster really heavy. Please share your thoughts, and whether anyone has attempted this kind of encryption.
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175583#comment-17175583 ]

Bruno Roustant commented on LUCENE-9379:

[~Raji] maybe a better approach would be to have one tenant per collection, but you might have many tenants, so the performance for many collections is poor? If this is the case, then I think the root problem is the perf for many collections. Without the composite ID router you could use OS encryption per collection.
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173265#comment-17173265 ]

David Smiley commented on LUCENE-9379:
--

Rajeswari -- you are referring to some SolrCloud concepts. The scenario you describe would _often_ co-locate your "tenants", and thus encryption at any of the OS, Lucene Directory, or Codec levels simply +won't work+. For example, if you had a field "name" that's indexed, then it's an index for all docs in that index, spanning your multiple "tenants". Instead, you could either create separate Collections, or have one Collection with "implicit" (really explicit) shard creation/naming for each tenant, but you'd have to be careful in all you do to query/index a specific shard instead of accidentally querying the whole.
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17172622#comment-17172622 ]

Rajeswari Natarajan commented on LUCENE-9379:
-

We have a use case where we want to fit multiple indexes/tenants per collection, each index/tenant should have a separate key, and we would like to use the composite ID router. The composite ID router does not confine each index/tenant to its own shard/directory. In this scenario, is OS level encryption possible?
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156566#comment-17156566 ]

Bruno Roustant commented on LUCENE-9379:

I'm going to pause my work on this for some time, until there are comments added here that share use-cases where OS level encryption is not possible. If you can use OS level encryption, do so, it will be faster. If not, share your use-case here.
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152052#comment-17152052 ]

Uwe Schindler commented on LUCENE-9379:
---

How about the Solr Block Cache used for HDFS? It could be moved to Lucene (as HDFS is going away anyways).
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152049#comment-17152049 ]

David Smiley commented on LUCENE-9379:
--

I'm glad you remembered on-heap FST. Another option to improve performance more is a Java heap level cache. It could be added later and layered above this Directory (without being intertwined with this issue/code) if deemed worthwhile.
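As a rough idea of what a Java heap level cache layered above an encrypting Directory could look like, here is a small LRU sketch built on java.util.LinkedHashMap. Everything in it is hypothetical: the class name, the BlockLoader callback (standing in for "read a block from the wrapped Directory and decrypt it"), and the block-index keying are assumptions, not part of the patch or of any Lucene API. Note that such a cache keeps decrypted bytes on the heap, which is consistent with the issue's scope of encryption at rest on disk, not in memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DecryptedBlockCacheSketch {
    /** Hypothetical callback that reads and decrypts one fixed-size block of a file. */
    interface BlockLoader {
        byte[] load(String fileName, long blockIndex);
    }

    private final int maxBlocks;
    private final LinkedHashMap<String, byte[]> blocks;

    DecryptedBlockCacheSketch(int maxBlocks) {
        this.maxBlocks = maxBlocks;
        // accessOrder = true makes iteration order least-recently-used first.
        this.blocks = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                // Evict the least recently used block once the cache is over capacity.
                return size() > DecryptedBlockCacheSketch.this.maxBlocks;
            }
        };
    }

    /** Returns the decrypted block, invoking the loader only on a cache miss. */
    synchronized byte[] get(String fileName, long blockIndex, BlockLoader loader) {
        String key = fileName + "#" + blockIndex;
        byte[] block = blocks.get(key);
        if (block == null) {
            block = loader.load(fileName, blockIndex);
            blocks.put(key, block);
        }
        return block;
    }

    synchronized int size() {
        return blocks.size();
    }

    public static void main(String[] args) {
        DecryptedBlockCacheSketch cache = new DecryptedBlockCacheSketch(2);
        int[] loads = {0};
        BlockLoader loader = (file, index) -> {
            loads[0]++; // count how often the expensive read-and-decrypt path runs
            return new byte[] {(byte) index};
        };
        cache.get("_0.tim", 0, loader);
        cache.get("_0.tim", 0, loader); // served from the heap, no decryption
        System.out.println("loads: " + loads[0] + ", cached blocks: " + cache.size());
    }
}
```

A repeated read of a hot block then skips decryption entirely, at the cost of heap memory and of holding plaintext in the JVM.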
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151920#comment-17151920 ]

Bruno Roustant commented on LUCENE-9379:

I tested with FST ON-HEAP: we gain +15% to +20% perf on all queries.

I tested my Light version of javax.crypto.Cipher. It is indeed much faster for construction and cloning, but not for the core encryption. The reason is that two internal classes in com.sun.crypto have an @HotSpotIntrinsicCandidate annotation that makes the encryption extremely fast.

I tested with a hack version that takes the best of the two versions. It brings a cumulative +10% perf improvement.

So as a conclusion for the perf benchmark:
* OS level encryption is best and fastest.
* If that is really not possible, expect an average of -20% perf impact on most queries, and -60% on multiterm queries.
* If you need more, you can make the FST on-heap and expect +15% perf.
* If you need more, you can use a Cipher hack to get +10% perf.
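For readers who want to relate these percentages to raw throughput, the following sketch computes a relative QPS change. The formula is an assumption about how such figures are derived (it is not stated in the thread), and the class and method names are made up; the sample inputs are the Respell and MedTerm QPS values from the benchmark comment in this thread, for which the formula agrees with the reported Pct diff to one decimal.

```java
import java.util.Locale;

public class QpsPctDiffSketch {
    /** Relative change of the candidate QPS versus the baseline QPS, in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return 100.0 * (candidateQps - baselineQps) / baselineQps;
    }

    public static void main(String[] args) {
        // QPS pairs (Lucene86 baseline, encrypted run) taken from the benchmark comment.
        System.out.println(String.format(Locale.ROOT, "Respell: %.1f%%", pctDiff(41.55, 10.76)));
        System.out.println(String.format(Locale.ROOT, "MedTerm: %.1f%%", pctDiff(1116.53, 766.39)));
    }
}
```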
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151918#comment-17151918 ]

Bruno Roustant commented on LUCENE-9379:

Task                       QPS Lucene86   StdDev  QPS EncryptionTim   StdDev  Pct diff
Respell                           41.55   (2.7%)              10.76   (0.9%)  -74.1% ( -75% - -72%)
Fuzzy2                            44.81   (9.0%)              12.00   (1.1%)  -73.2% ( -76% - -69%)
Fuzzy1                            41.03   (7.3%)              16.24   (1.9%)  -60.4% ( -64% - -55%)
Wildcard                          28.02   (4.0%)              14.94   (2.0%)  -46.7% ( -50% - -42%)
OrHighNotLow                     747.43   (4.2%)             485.90   (3.5%)  -35.0% ( -40% - -28%)
OrNotHighMed                     524.60   (4.2%)             344.06   (2.9%)  -34.4% ( -39% - -28%)
OrHighNotHigh                    576.32   (5.0%)             382.60   (4.0%)  -33.6% ( -40% - -25%)
OrHighNotMed                     553.85   (4.1%)             371.73   (3.4%)  -32.9% ( -38% - -26%)
MedTerm                         1116.53   (3.6%)             766.39   (2.6%)  -31.4% ( -36% - -26%)
LowTerm                         1376.31   (4.2%)             947.48   (3.0%)  -31.2% ( -36% - -25%)
OrNotHighLow                     492.68   (4.7%)             342.05   (4.7%)  -30.6% ( -38% - -22%)
AndHighLow                       482.97   (3.8%)             342.18   (3.4%)  -29.2% ( -34% - -22%)
OrHighLow                        410.23   (3.7%)             294.38   (3.8%)  -28.2% ( -34% - -21%)
HighTerm                         971.63   (5.3%)             701.77   (3.2%)  -27.8% ( -34% - -20%)
OrNotHighHigh                    493.99   (5.1%)             358.95   (3.9%)  -27.3% ( -34% - -19%)
LowPhrase                        286.03   (2.9%)             246.04   (2.8%)  -14.0% ( -19% - -8%)
HighPhrase                       290.25   (3.3%)             252.54   (3.4%)  -13.0% ( -18% - -6%)
Prefix3                           51.36   (4.8%)              45.20   (4.1%)  -12.0% ( -19% - -3%)
AndHighMed                       113.34   (4.0%)             105.77   (4.0%)   -6.7% ( -14% - 1%)
MedSloppyPhrase                   79.83   (3.5%)              74.78   (3.6%)   -6.3% ( -13% - 0%)
HighTermDayOfYearSort             63.32  (13.3%)              59.34  (14.6%)   -6.3% ( -30% - 24%)
HighTermTitleBDVSort              86.16  (10.3%)              81.63  (10.0%)   -5.3% ( -23% - 16%)
LowSpanNear                       58.07   (3.1%)              55.13   (3.2%)   -5.1% ( -10% - 1%)
AndHighHigh                       44.58   (4.1%)              42.92   (4.2%)   -3.7% ( -11% - 4%)
OrHighMed                         56.53   (4.4%)              54.65   (4.1%)   -3.3% ( -11% - 5%)
BrowseDateTaxoFacets               1.54   (4.6%)               1.50   (5.2%)   -2.5% ( -11% - 7%)
HighTermMonthSort                 18.51  (10.5%)              18.06  (10.1%)   -2.4% ( -20% - 20%)
BrowseDayOfYearTaxoFacets          1.53   (4.7%)               1.49   (5.3%)   -2.3% ( -11% - 8%)
BrowseMonthTaxoFacets              1.77   (3.5%)               1.74   (4.2%)   -2.1% ( -9% - 5%)
HighSpanNear                      12.75   (3.6%)              12.50   (4.1%)   -2.0% ( -9% - 5%)
MedPhrase                        107.89   (3.2%)             106.01   (3.9%)   -1.7% ( -8% - 5%)
HighSloppyPhrase                  12.86   (4.0%)              12.71   (4.7%)   -1.2% ( -9% - 7%)
MedSpanNear                       11.76   (3.1%)              11.62   (3.4%)   -1.1% ( -7% - 5%)
HighIntervalsOrdered              13.61   (3.2%)              13.46   (3.3%)   -1.1% ( -7% - 5%)
OrHighHigh                        11.12   (3.7%)              11.12   (4.1%)   -0.1% ( -7% - 8%)
BrowseMonthSSDVFacets              4.28   (3.9%)               4.29   (3.9%)    0.2% ( -7% - 8%)
BrowseDayOfYearSSDVFacets          3.82   (3.7%)               3.84   (3.4%)    0.3% ( -6% - 7%)
IntNRQ                            25.54   (3.1%)              26.34   (3.4%)    3.1% ( -3% - 9%)
PKLookup                         174.98   (3.0%)             183.78   (4.5%)    5.0% ( -2% - 12%)
LowSloppyPhrase                    6.29   (3.5%)               6.89   (4.5%)    9.6% ( 1% - 18%)
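
As a reading aid for the table above: the "Pct diff" figures appear consistent with the relative change of the EncryptionTim QPS against the Lucene86 baseline QPS. The sketch below recomputes three rows under that assumption; the class and method names are illustrative only and are not part of luceneutil.

```java
import java.util.Locale;

public class PctDiff {
    /** Relative change of the candidate QPS against the baseline QPS, in percent. */
    public static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }

    public static void main(String[] args) {
        // QPS pairs copied from the Respell, Wildcard and PKLookup rows of the table.
        System.out.println(String.format(Locale.ROOT, "Respell  %.1f%%", pctDiff(41.55, 10.76)));   // Respell  -74.1%
        System.out.println(String.format(Locale.ROOT, "Wildcard %.1f%%", pctDiff(28.02, 14.94)));   // Wildcard -46.7%
        System.out.println(String.format(Locale.ROOT, "PKLookup %.1f%%", pctDiff(174.98, 183.78))); // PKLookup 5.0%
    }
}
```

The bracketed range next to each figure is reported by the benchmark itself and is not recomputed here.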
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151915#comment-17151915 ]

Bruno Roustant commented on LUCENE-9379:

I ran the benchmarks to measure the perf impact of this IndexInput-level encryption on the PostingsFormat (luceneutil on wikimediumall).

When encrypting only the terms file, FST file and metadata file (.tim, .tip, .tmd), and not the doc ids or postings:
* Most queries run between -0% and -35%.
* Wildcard: -47%.
* Fuzzy/Respell: between -60% and -74%.

It is possible to encrypt all files, but then the perf drops considerably: -60% for most queries, -90% for fuzzy queries.
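
For illustration only, selecting the terms-related files by extension could look like the sketch below. This is an assumption about one possible mechanism, not the PR's code (the PR's test codec encrypts a delegate PostingsFormat instead); the class name, method name and sample file names are hypothetical.

```java
import java.util.Set;

public class EncryptedFileFilter {
    // Extensions named in the comment above: terms (.tim), FST/terms index (.tip), terms metadata (.tmd).
    private static final Set<String> ENCRYPTED_EXTENSIONS = Set.of("tim", "tip", "tmd");

    /** Returns true if the file's extension is one of the terms-related extensions. */
    public static boolean shouldEncrypt(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot >= 0 && ENCRYPTED_EXTENSIONS.contains(fileName.substring(dot + 1));
    }

    public static void main(String[] args) {
        // Sample names are illustrative; real names depend on the codec in use.
        System.out.println(shouldEncrypt("_0_Lucene84_0.tim")); // true
        System.out.println(shouldEncrypt("_0_Lucene84_0.doc")); // false
        System.out.println(shouldEncrypt("segments_1"));        // false
    }
}
```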
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151914#comment-17151914 ]

Bruno Roustant commented on LUCENE-9379:

[~rcmuir] makes an important callout in the PR. A better approach is to leverage OS encryption at the filesystem level, because it fits the OS filesystem cache: the pages are held decrypted in the cache. So whenever it is possible, we must use OS-level encryption. OS filesystem encryption can encrypt differently per directory/file, and some implementations can manage multiple keys.

But OS-level encryption is not always possible. The example I can think of is running on compute engines in a public cloud. In this case we don't have access to the OS-level encryption (there is one, but we cannot manage the keys).

So this Jira issue proposes a solution for the case where we cannot use OS-level encryption and we need to manage multiple keys. This should be stated clearly in the doc/javadoc. The approach is sub-optimal because it has to decrypt each time it accesses a cached IO page, so expect a larger performance impact.
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149694#comment-17149694 ]

Bruno Roustant commented on LUCENE-9379:

Watchers, I need your help. I need to know how you would use the encryption, and more precisely how you would provide the keys.

Is my approach of using either an EncryptingDirectory (in the PR, look at SimpleEncryptingDirectory) or a custom Codec (in the PR, look at EncryptingCodec) appropriate for your use case?

Note that both SimpleEncryptingDirectory and EncryptingCodec are only in test packages, as I expect users to write some custom code to use encryption. If you have an idea for standard code that could be added to make encryption easy, please share it here.
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149691#comment-17149691 ]

Bruno Roustant commented on LUCENE-9379:

I updated the PR. It is now functional and complete, with javadoc. There should be no perf issue anymore because I replaced javax.crypto.Cipher with much lighter code that is strictly equivalent: encryption/decryption is the same (tested randomly by 3 different tests).

For reviewers: there are 33 changed files in the PR but only 10 source classes; the others are for tests. Look for the classes in the store package (e.g. EncryptingDirectory, EncryptingIndexOutput, EncryptingIndexInput) and the new util.crypto package (e.g. AesCtrEncrypter). All tests now pass when enabling encryption with a test codec or a test directory.

Next step:
* Run the luceneutil benchmark to evaluate the perf impact.
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143886#comment-17143886 ]

Bruno Roustant commented on LUCENE-9379:

First PR, functional but incomplete. The idea of using a pool of Ciphers does not work in Lucene.

To run the tests, there are two options:

test -Dtests.codec=Encrypting
It executes the tests with the EncryptingCodec in test-framework. Currently it encrypts a delegate PostingsFormat. This option shows how to provide the encryption key depending on the SegmentInfo.

test -Dtests.directory=org.apache.lucene.codecs.encrypting.SimpleEncryptingDirectory
It executes the tests with the SimpleEncryptingDirectory in test-framework. This option is the simplest; it shows how to provide the encryption key as a constant (it could be a property) or depending only on the name of the file to encrypt (no SegmentInfo).

There is a performance issue because too many new Ciphers are created when slicing IndexInput. javax.crypto.Cipher is heavyweight to create and is stateful. I tried a CipherPool, but there are many cases where we need lots of slices of the IndexInput, so we have to create lots of new stateful Ciphers. The pool turns out to be a no-go: it ends up holding too many Ciphers.

TODO:
* Find a lighter alternative to Cipher, if one exists.
* Fix a couple of tests still failing because of an unclosed IndexOutput.
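
To make the two ways of providing a key more concrete, here is a minimal sketch of a key supplier that returns either a constant key or a key chosen from the file name. It is an assumption for illustration, not the API of SimpleEncryptingDirectory or EncryptingCodec: the KeySuppliers class, its methods, the simplified segment-name parsing and the sample file names are all hypothetical.

```java
import java.util.Map;
import java.util.function.Function;

public class KeySuppliers {
    /** Same key for every file, e.g. loaded once at startup from a protected property. */
    public static Function<String, byte[]> constantKey(byte[] key) {
        return fileName -> key;
    }

    /** Key chosen from the segment prefix of the file name; null means "leave this file in clear". */
    public static Function<String, byte[]> keyPerSegment(Map<String, byte[]> keysBySegment) {
        return fileName -> {
            String segment = segmentName(fileName);
            return segment == null ? null : keysBySegment.get(segment);
        };
    }

    /** Simplified parsing: "_0_Lucene84_0.tim" and "_0.cfs" both map to "_0"; other names map to null. */
    public static String segmentName(String fileName) {
        if (!fileName.startsWith("_")) {
            return null; // e.g. "segments_1" does not belong to a single segment
        }
        int end = 1;
        while (end < fileName.length() && fileName.charAt(end) != '_' && fileName.charAt(end) != '.') {
            end++;
        }
        return fileName.substring(0, end);
    }
}
```

In this sketch the keys live only in memory, in line with the requirement that they must not be stored on disk; how they are loaded is left to the caller.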
[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption
[ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136783#comment-17136783 ]

Bruno Roustant commented on LUCENE-9379:

So I plan to implement an EncryptingDirectory extending FilterDirectory.

+Encryption method:+ AES CTR (counter)
* This mode is approved by NIST. ([https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_.28CTR.29])
* AES encryption has the same size as the original clear text (though the last block is padded to 128 bits). So we can use the same file pointers.
* CTR mode allows random access to encrypted blocks (128-bit blocks).
* The IV (initialisation vector) must be random, and is stored at the beginning of the encrypted file because it can be public.
* It is appropriate for encrypting streams.

+API:+ I don't anticipate any API change.

+How to provide encryption keys:+ EncryptingDirectory would require a delegate Directory, an encryption key supplier, and a Cipher pool (for performance). For the callers to pass the encryption keys, I see two ways:

1- In Solr, declare a DirectoryFactory in solrconfig.xml that creates EncryptingDirectory. This factory is able to determine the encryption key per file based on the path. It is the responsibility of this factory to access the keys (e.g. stored in a safe DB, received with an admin handler, read from properties, etc.). The Cipher pool is held by the DirectoryFactory.

2- More generally, the EncryptingDirectory can be created to wrap a Directory when opening a segment (e.g. in PostingsFormat/DocValuesFormat fieldsConsumer()/fieldsProducer(), in StoredFieldFormat fieldsReader()/fieldsWriter(), etc.). In this case the PostingsFormat/DocValuesFormat/StoredFieldFormat extension determines the encryption key based on the SegmentInfo. A custom Codec can be created to handle encrypting formats. The Cipher pool is held either in the Codec or in the Format.

+Code:+ I will take inspiration from Apache commons-crypto CtrCryptoOutputStream, although not using it directly because it is an OutputStream while we need an IndexOutput. And we can probably simplify, since our use case is specific compared to that library's broader usage.
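
The random-access property of CTR mode mentioned above can be illustrated with the JDK's own javax.crypto.Cipher. This is a minimal sketch, not the planned EncryptingDirectory code: CtrRandomAccess and its methods are hypothetical names, and it assumes the provider treats the 16-byte IV as a big-endian counter incremented once per 16-byte block, which is how the default JDK provider appears to behave. Note that with "AES/CTR/NoPadding" the JDK adds no padding, so the output has exactly the input length.

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class CtrRandomAccess {
    static final int BLOCK = 16; // AES block size in bytes

    /** Returns iv + blockIndex as a 16-byte big-endian counter (wrapping on overflow). */
    public static byte[] counterFor(byte[] iv, long blockIndex) {
        byte[] raw = new BigInteger(1, iv).add(BigInteger.valueOf(blockIndex)).toByteArray();
        byte[] out = new byte[BLOCK];
        int n = Math.min(raw.length, BLOCK);
        System.arraycopy(raw, raw.length - n, out, BLOCK - n, n); // keep the low 16 bytes
        return out;
    }

    /** Encrypts or decrypts (the same operation in CTR) data located at byte offset pos of the stream. */
    public static byte[] cryptAt(byte[] key, byte[] iv, long pos, byte[] data) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                new IvParameterSpec(counterFor(iv, pos / BLOCK)));
        // Feed dummy bytes for the part of the first block that precedes pos, then drop their output.
        int skip = (int) (pos % BLOCK);
        byte[] padded = new byte[skip + data.length];
        System.arraycopy(data, 0, padded, skip, data.length);
        byte[] out = cipher.doFinal(padded);
        return Arrays.copyOfRange(out, skip, out.length);
    }

    public static void main(String[] args) throws Exception {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16];
        byte[] iv = new byte[BLOCK];
        byte[] plain = new byte[100];
        random.nextBytes(key);
        random.nextBytes(iv);   // random IV; it can be stored in clear next to the data
        random.nextBytes(plain);

        byte[] encrypted = cryptAt(key, iv, 0, plain);
        // Decrypt only bytes [37, 57) without touching the rest of the stream.
        byte[] slice = cryptAt(key, iv, 37, Arrays.copyOfRange(encrypted, 37, 57));
        System.out.println(Arrays.equals(slice, Arrays.copyOfRange(plain, 37, 57)));
    }
}
```

Creating a new Cipher per call keeps the sketch simple; it says nothing about how the implementation should manage Cipher instances.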