[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2021-06-03 Thread Fabio Germann (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356487#comment-17356487
 ] 

Fabio Germann commented on LUCENE-9379:
---

Thanks [~broustant]/[~bruno.roustant], this is also something that I was 
looking for!

As for [~rcmuir]'s comment(s): I think the important distinction to be made is 
the goal of the usage of encryption and the guarantees you need.

If one needs tenant-based encryption at rest, OS-level encryption is a valid 
way to go. Also, if one needs maximum performance and tries to squeeze every 
last drop of performance out of their NVMe drives, OS-level encryption (or no 
encryption) would probably be best.

BUT: In today's world there are sometimes things that are more important (or 
pose a greater risk) to a project or a company: namely user privacy and data 
protection. In such cases decreased performance is certainly acceptable (if not 
already anticipated).

Many of the above arguments against this contribution can be addressed one way 
or another. What can NOT be addressed (and why [~bruno.roustant]'s contribution 
is valuable) is:
 * It allows for the stored content to only be accessible to Lucene (the 
process/thread), for the exact duration that Lucene needs to process the data, 
without any dependency on a downstream component.
 * It allows for platform interoperability/independence. (Example:) This allows 
the solution to be deployed to a Linux system while being developed on 
macOS/Windows. (Sidenote: This is very important if there are large teams 
working on a solution that builds on this.)
 * It can even offer protection from passive privileged users, meaning that 
the file on the filesystem is not readable for a privileged user. In contrast, 
OS-level encryption would make such protections more complex.
 * It allows for simple deployment in container technologies (which would be 
tricky with the alternatives proposed by [~rcmuir])

 

Maybe the increased interest in this topic signals that there is something to 
be done?

Also recent research has taken note - like: 
(From the abstract:) "[...] However, currently deployed IR technologies, e.g., 
Apache Lucene - open-source search software, are insufficient when the 
information is protected or deemed to be private [...]"
(Source: 
[https://www.computer.org/csdl/journal/tq//01/08954811/1gs4XOshKHC])

> Directory based approach for index encryption
> -
>
> Key: LUCENE-9379
> URL: https://issues.apache.org/jira/browse/LUCENE-9379
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Bruno Roustant
>Assignee: Bruno Roustant
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> +Important+: This Lucene Directory wrapper approach is to be considered only 
> if an OS level encryption is not possible. OS level encryption better fits 
> Lucene usage of OS cache, and thus is more performant.
> But there are some use-case where OS level encryption is not possible. This 
> Jira issue was created to address those.
> 
>  
> The goal is to provide optional encryption of the index, with a scope limited 
> to an encryptable Lucene Directory wrapper.
> Encryption is at rest on disk, not in memory.
> This simple approach should fit any Codec as it would be orthogonal, without 
> modifying APIs as much as possible.
> Use a standard encryption method. Limit perf/memory impact as much as 
> possible.
> Determine how callers provide encryption keys. They must not be stored on 
> disk.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2021-05-31 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354275#comment-17354275
 ] 

Bruno Roustant commented on LUCENE-9379:


_RE AES-XTS vs AES-CTR:_
In the case of Lucene, we produce read-only files per index segment. And if we 
have a new random IV per file, we never repeat the same (AES-encrypted) blocks. 
So we are in a safe write-once, read-only case where AES-XTS and AES-CTR have 
the same strength [1][2]. Since CTR is simpler, I chose it for this patch.

[1] 
https://crypto.stackexchange.com/questions/64556/aes-xts-vs-aes-ctr-for-write-once-storage
[2] 
https://crypto.stackexchange.com/questions/14628/why-do-we-use-xts-over-ctr-for-disk-encryption
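
To illustrate the "new random IV per file" point, here is a minimal sketch and 
not the code from the patch. The names CtrFileCipher, newIv, crypt and 
counterAt are hypothetical. It assumes the JDK's "AES/CTR/NoPadding" 
transformation treats the 16-byte IV as a big-endian counter that is 
incremented once per 16-byte block, so the counter for any file offset can be 
computed directly. This is what makes random-access reads possible.

```java
import java.math.BigInteger;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper for illustration; not part of Lucene or of the patch. */
final class CtrFileCipher {
  private static final int BLOCK = 16; // AES block size in bytes

  /** A fresh random IV, to be generated once per (write-once) index file. */
  static byte[] newIv() {
    byte[] iv = new byte[BLOCK];
    new SecureRandom().nextBytes(iv);
    return iv;
  }

  /**
   * Encrypts or decrypts {@code data} as if it were located at
   * {@code fileOffset} in the file. CTR is symmetric, so the same call
   * serves both directions.
   */
  static byte[] crypt(byte[] key, byte[] iv, long fileOffset, byte[] data) {
    try {
      long blockIndex = fileOffset / BLOCK;
      int skip = (int) (fileOffset % BLOCK);
      Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
      cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
          new IvParameterSpec(counterAt(iv, blockIndex)));
      // Prepend `skip` dummy bytes so the keystream lines up with the offset
      // inside the first block, then drop them from the result.
      byte[] padded = new byte[skip + data.length];
      System.arraycopy(data, 0, padded, skip, data.length);
      byte[] out = cipher.doFinal(padded);
      return Arrays.copyOfRange(out, skip, out.length);
    } catch (GeneralSecurityException e) {
      // AES/CTR/NoPadding is a required transformation on every Java platform.
      throw new IllegalStateException(e);
    }
  }

  /** IV + blockIndex, computed modulo 2^128 as a big-endian integer. */
  static byte[] counterAt(byte[] iv, long blockIndex) {
    byte[] raw = new BigInteger(1, iv).add(BigInteger.valueOf(blockIndex)).toByteArray();
    byte[] counter = new byte[BLOCK];
    int n = Math.min(raw.length, BLOCK);
    System.arraycopy(raw, raw.length - n, counter, BLOCK - n, n);
    return counter;
  }
}
```

The safety argument above depends on never reusing a (key, IV) pair for 
different content; that holds here only because each file is written once and 
gets its own IV.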




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2021-05-29 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353770#comment-17353770
 ] 

David Smiley commented on LUCENE-9379:
--

Rob, please tone down your language.  Don't speak of how much others are 
"uneducated"; merely point to what you want to show to help others understand 
your point of view.




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2021-05-29 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353696#comment-17353696
 ] 

Robert Muir commented on LUCENE-9379:
-

Your argument is even more uneducated, the "i can do better than encryption at 
rest" argument. Get out of town!

Lucene depends on the OS page cache for performance. So if you want to encrypt 
stuff, you need to use the operating system.
Also, encrypting storage is non-trivial, and this is a search engine project.
Every time someone makes a patch for this issue, it's never a standard mode like 
AES-XTS, it's always some insecure homemade garbage!

I'm standing by my decision. Creating more JIRA issues or making more arguments 
won't help the situation.




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2021-05-29 Thread Martin Huber (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353685#comment-17353685
 ] 

Martin Huber commented on LUCENE-9379:
--

[~rcmuir] thanks for the useful links. 

But I didn't say that per-directory or per-user encryption would not be 
possible. That alone is not our use case.

What I said is that a user with root access to a system can read all files of 
all users while the users' directories are mounted / unlocked. Or they can 
become the user and then see the files.

Is this statement not right?

And as such, there is no privacy.




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2021-05-28 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353645#comment-17353645
 ] 

Robert Muir commented on LUCENE-9379:
-

As always, you can count on arch to have some good user-level wiki docs on how 
to do this: https://wiki.archlinux.org/title/Fscrypt





[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2021-05-28 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353644#comment-17353644
 ] 

Robert Muir commented on LUCENE-9379:
-

Sorry, the above comment is really wrong. Please see my comments on linked 
issues.

You can definitely manage encryption at multiple levels in the os:
* block level
* filesystem level

Please understand the options available and be educated about this, see: 
https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html
This FS-level crypto subsystem is usable with e.g. ext4 and f2fs filesystems, 
among others. So you can definitely do different stuff per-directory, which 
makes multitenant use-cases easily possible (and from my understanding, was the 
intent of the changes in the first place)

I won't drop my {{-1}} vote on this because folks won't read the documentation 
for their operating system.





[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2021-05-27 Thread Martin Huber (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352508#comment-17352508
 ] 

Martin Huber commented on LUCENE-9379:
--

[~broustant]  - one very valid use case that is not solvable by means of OS 
encryption is per-index encryption that preserves zero-knowledge privacy of 
documents and the search index. OS-level encryption, as far as I know, always 
allows file access to admins with local access to the filesystem as long as 
the encrypted volume is mounted. This can only be overcome with in-memory 
en/decryption.

So +1 for what you did ! (y)




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2021-04-12 Thread Ming Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319836#comment-17319836
 ] 

Ming Zhang commented on LUCENE-9379:


[~bruno.roustant] In our case, we have a dedicated collection for each tenant. 
Because there are so many tenants that it's not possible to serve them in a 
single Solr cluster, we have multiple clusters. Each collection has to have a 
different encryption key as well. It looks like this directory (tenant) based 
approach is able to address our requirement. Looking forward to getting this 
enhancement soon.




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2020-08-11 Thread Rajeswari Natarajan (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175699#comment-17175699
 ] 

Rajeswari Natarajan commented on LUCENE-9379:
-

[~bruno.roustant] and [~dsmiley], if we go with the implicit router, shard 
management/rebalancing/routing becomes manual. SolrCloud will not take care of 
these (on the Solr mailing lists I always see users being advised against 
taking this route), so we are looking to see if encryption is possible with the 
composite ID router and multiple tenants per collection. We might have around 
3000+ collections going forward, so having one collection per tenant will make 
our cluster really heavy. Please share your thoughts, and whether anyone has 
attempted this kind of encryption.




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2020-08-11 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175583#comment-17175583
 ] 

Bruno Roustant commented on LUCENE-9379:


[~Raji] maybe a better approach would be to have one tenant per collection, but 
you might have many tenants, so the performance for many collections is poor? 
If this is the case, then I think the root problem is the perf for many 
collections. Without the composite ID router you could use OS-level encryption 
per collection.




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2020-08-07 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173265#comment-17173265
 ] 

David Smiley commented on LUCENE-9379:
--

Rajeswari -- you are referring to some SolrCloud concepts.  The scenario you 
describe would _often_ co-locate your "tenants", and thus any OS or Lucene 
Directory or Codec levels simply +won't work+.  For example if you had a field 
"name" that's indexed, then it's an index for all docs in that index, spanning 
your multiple "tenants".  Instead, you could either create separate 
Collections, or have one Collection with "implicit" (really explicit) shard 
creation/naming for each tenant, but you'd have to be careful in all you do to 
query/index a specific shard instead of accidentally querying the whole.




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2020-08-06 Thread Rajeswari Natarajan (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17172622#comment-17172622
 ] 

Rajeswari Natarajan commented on LUCENE-9379:
-

We have a use case where we want to fit multiple indexes/tenants per collection, 
each index/tenant should have a separate key, and we would like to use the 
composite ID router. The composite ID router does not confine each index/tenant 
to a single shard/directory. In this scenario, is OS-level encryption possible?




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2020-07-13 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156566#comment-17156566
 ] 

Bruno Roustant commented on LUCENE-9379:


I'm going to pause my work on this for some time, until there are comments 
added here that share use-cases where OS level encryption is not possible.
If you can use OS level encryption, do so, it will be faster. If not, share 
your use-case here.




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2020-07-06 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152052#comment-17152052
 ] 

Uwe Schindler commented on LUCENE-9379:
---

How about the Solr Block Cache used for HDFS? It could be moved to Lucene (as 
HDFS is going away anyways).




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2020-07-06 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152049#comment-17152049
 ] 

David Smiley commented on LUCENE-9379:
--

I'm glad you remembered on-heap FST.

Another option to improve performance more is a Java heap level cache.  It 
could be added later and layered above this Directory (without being 
intertwined with this issue/code) if deemed worthwhile.
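
As a rough sketch of what such a heap-level cache could look like (an 
assumption for illustration, not an existing Lucene or Solr class), the 
following keeps recently used decrypted blocks in an LRU map keyed by file 
name and block index. DecryptedBlockCache and its methods are hypothetical 
names, and the loader stands in for "read the block from the wrapped Directory 
and decrypt it".

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical LRU cache of decrypted blocks, held on the Java heap. */
final class DecryptedBlockCache {
  private final LinkedHashMap<String, byte[]> blocks;

  DecryptedBlockCache(final int maxBlocks) {
    // accessOrder=true makes iteration order least-recently-used first,
    // so removeEldestEntry evicts the block that was touched longest ago.
    this.blocks = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        return size() > maxBlocks;
      }
    };
  }

  /**
   * Returns the cached block, or calls {@code loader} (read + decrypt)
   * on a miss and remembers the result.
   */
  synchronized byte[] get(String fileName, long blockIndex, Supplier<byte[]> loader) {
    String key = fileName + "@" + blockIndex;
    byte[] block = blocks.get(key);
    if (block == null) {
      block = loader.get();
      blocks.put(key, block);
    }
    return block;
  }

  synchronized int size() {
    return blocks.size();
  }
}
```

Note that such a cache holds plaintext on the heap, which is consistent with 
the issue's stated scope (encryption at rest on disk, not in memory) but would 
need sizing against the heap budget.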




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2020-07-06 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151920#comment-17151920
 ] 

Bruno Roustant commented on LUCENE-9379:


I tested with FST ON-HEAP: we gain +15% to +20% perf on all queries.

I tested my Light version of javax.crypto.Cipher. It is indeed much faster for 
construction and cloning, but not for the core encryption. The reason is that 
two internal classes in com.sun.crypto have an @HotSpotIntrinsicCandidate 
annotation that makes the encryption extremely fast.

I tested with a hack version that takes the best of the two versions. It brings 
a cumulative +10% perf improvement.

So, as a conclusion for the perf benchmark:
 * OS level encryption is best and fastest.
 * If that is really not possible, expect an average -20% perf impact on most 
queries, and -60% on multiterm queries.
 * If you need more, you can make the FST on-heap and expect +15% perf.
 * If you need more, you can use a Cipher hack to get +10% perf.




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2020-07-06 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151918#comment-17151918
 ] 

Bruno Roustant commented on LUCENE-9379:


Task                      QPS Lucene86   StdDev  QPS EncryptionTim   StdDev  Pct diff
Respell                          41.55   (2.7%)              10.76   (0.9%)    -74.1%  ( -75% - -72%)
Fuzzy2                           44.81   (9.0%)              12.00   (1.1%)    -73.2%  ( -76% - -69%)
Fuzzy1                           41.03   (7.3%)              16.24   (1.9%)    -60.4%  ( -64% - -55%)
Wildcard                         28.02   (4.0%)              14.94   (2.0%)    -46.7%  ( -50% - -42%)
OrHighNotLow                    747.43   (4.2%)             485.90   (3.5%)    -35.0%  ( -40% - -28%)
OrNotHighMed                    524.60   (4.2%)             344.06   (2.9%)    -34.4%  ( -39% - -28%)
OrHighNotHigh                   576.32   (5.0%)             382.60   (4.0%)    -33.6%  ( -40% - -25%)
OrHighNotMed                    553.85   (4.1%)             371.73   (3.4%)    -32.9%  ( -38% - -26%)
MedTerm                        1116.53   (3.6%)             766.39   (2.6%)    -31.4%  ( -36% - -26%)
LowTerm                        1376.31   (4.2%)             947.48   (3.0%)    -31.2%  ( -36% - -25%)
OrNotHighLow                    492.68   (4.7%)             342.05   (4.7%)    -30.6%  ( -38% - -22%)
AndHighLow                      482.97   (3.8%)             342.18   (3.4%)    -29.2%  ( -34% - -22%)
OrHighLow                       410.23   (3.7%)             294.38   (3.8%)    -28.2%  ( -34% - -21%)
HighTerm                        971.63   (5.3%)             701.77   (3.2%)    -27.8%  ( -34% - -20%)
OrNotHighHigh                   493.99   (5.1%)             358.95   (3.9%)    -27.3%  ( -34% - -19%)
LowPhrase                       286.03   (2.9%)             246.04   (2.8%)    -14.0%  ( -19% - -8%)
HighPhrase                      290.25   (3.3%)             252.54   (3.4%)    -13.0%  ( -18% - -6%)
Prefix3                          51.36   (4.8%)              45.20   (4.1%)    -12.0%  ( -19% - -3%)
AndHighMed                      113.34   (4.0%)             105.77   (4.0%)     -6.7%  ( -14% - 1%)
MedSloppyPhrase                  79.83   (3.5%)              74.78   (3.6%)     -6.3%  ( -13% - 0%)
HighTermDayOfYearSort            63.32  (13.3%)              59.34  (14.6%)     -6.3%  ( -30% - 24%)
HighTermTitleBDVSort             86.16  (10.3%)              81.63  (10.0%)     -5.3%  ( -23% - 16%)
LowSpanNear                      58.07   (3.1%)              55.13   (3.2%)     -5.1%  ( -10% - 1%)
AndHighHigh                      44.58   (4.1%)              42.92   (4.2%)     -3.7%  ( -11% - 4%)
OrHighMed                        56.53   (4.4%)              54.65   (4.1%)     -3.3%  ( -11% - 5%)
BrowseDateTaxoFacets              1.54   (4.6%)               1.50   (5.2%)     -2.5%  ( -11% - 7%)
HighTermMonthSort                18.51  (10.5%)              18.06  (10.1%)     -2.4%  ( -20% - 20%)
BrowseDayOfYearTaxoFacets         1.53   (4.7%)               1.49   (5.3%)     -2.3%  ( -11% - 8%)
BrowseMonthTaxoFacets             1.77   (3.5%)               1.74   (4.2%)     -2.1%  ( -9% - 5%)
HighSpanNear                     12.75   (3.6%)              12.50   (4.1%)     -2.0%  ( -9% - 5%)
MedPhrase                       107.89   (3.2%)             106.01   (3.9%)     -1.7%  ( -8% - 5%)
HighSloppyPhrase                 12.86   (4.0%)              12.71   (4.7%)     -1.2%  ( -9% - 7%)
MedSpanNear                      11.76   (3.1%)              11.62   (3.4%)     -1.1%  ( -7% - 5%)
HighIntervalsOrdered             13.61   (3.2%)              13.46   (3.3%)     -1.1%  ( -7% - 5%)
OrHighHigh                       11.12   (3.7%)              11.12   (4.1%)     -0.1%  ( -7% - 8%)
BrowseMonthSSDVFacets             4.28   (3.9%)               4.29   (3.9%)      0.2%  ( -7% - 8%)
BrowseDayOfYearSSDVFacets         3.82   (3.7%)               3.84   (3.4%)      0.3%  ( -6% - 7%)
IntNRQ                           25.54   (3.1%)              26.34   (3.4%)      3.1%  ( -3% - 9%)
PKLookup                        174.98   (3.0%)             183.78   (4.5%)      5.0%  ( -2% - 12%)
LowSloppyPhrase                   6.29   (3.5%)               6.89   (4.5%)      9.6%  ( 1% - 18%)
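
For readers checking the numbers, the Pct diff column appears to be the relative change of the encrypted QPS against the Lucene86 baseline QPS. The helper below is a small sketch of that arithmetic; the class and method names are made up for illustration, and the benchmark tool presumably works from unrounded means, so recomputed values can differ in the last digit.

```java
/** Hypothetical helper; reproduces the relative-change arithmetic of the table. */
final class PctDiff {
    /** Relative change of the candidate QPS against the baseline QPS, in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }
}
```

For the Respell row, pctDiff(41.55, 10.76) gives (10.76 - 41.55) / 41.55 * 100, which is about -74.1, matching the table.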




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2020-07-06 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151915#comment-17151915
 ] 

Bruno Roustant commented on LUCENE-9379:


I ran the benchmarks to measure the perf impact of this IndexInput-level 
encryption on the PostingsFormat (luceneutil on wikimediumall).

When encrypting only the terms file, FST file and metadata file (.tim .tip 
.tmd), and not the doc ids or postings:
 * Most queries: between 0% and -35%
 * Wildcard: -47%
 * Fuzzy/Respell: between -60% and -74%

It is possible to encrypt all files, but the perf drops considerably: -60% for 
most queries, -90% for fuzzy queries.




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2020-07-06 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151914#comment-17151914
 ] 

Bruno Roustant commented on LUCENE-9379:


[~rcmuir] makes an important callout in the PR. A better approach is to 
leverage OS encryption at the filesystem level because it fits the OS 
filesystem cache: the pages held in the cache are already decrypted.

So whenever it is possible, we must use OS level encryption. OS filesystem 
encryption allows encrypting differently per directory/file, and some 
filesystems allow managing multiple keys.

But OS level encryption is not always possible. The example I can think of is 
running on compute engines in a public cloud. In this case we don't have access 
to the OS level encryption (there is one but we cannot manage keys).

So this Jira issue proposes a solution for the case where we cannot use OS 
level encryption and we need to manage multiple keys. This should be stated 
clearly in the doc/javadoc. It is sub-optimal because it has to decrypt each 
time it accesses a cached IO page. So expect more performance impact.

 




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2020-07-01 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149694#comment-17149694
 ] 

Bruno Roustant commented on LUCENE-9379:


Watchers, I need your help.

I need to know how you would use the encryption, and more precisely how you 
would provide the keys.
Is my approach of using either an EncryptingDirectory (in the PR look at 
SimpleEncryptingDirectory) or a custom Codec (in the PR look at 
EncryptingCodec) appropriate for your use-case?

Note that both SimpleEncryptingDirectory and EncryptingCodec are only in test 
packages, as I expect users to write some custom code to use encryption. If 
you have an idea for standard code that could be added to make encryption 
easy, please share it here.




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2020-07-01 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149691#comment-17149691
 ] 

Bruno Roustant commented on LUCENE-9379:


I updated the PR. Now it is functional and complete, with javadoc.

There should be no perf issue anymore because I replaced javax.crypto.Cipher 
with much lighter code that is strictly equivalent: the encryption/decryption 
output is the same (checked by 3 different randomized tests).

For reviewers: there are 33 changed files in the PR but only 10 source classes; 
the others are tests. Look for the classes in the store package (e.g. 
EncryptingDirectory, EncryptingIndexOutput, EncryptingIndexInput) and the new 
util.crypto package (e.g. AesCtrEncrypter).

Now all tests pass when enabling the encryption with a test codec or a test 
directory.

Next step:
 * Run luceneutil benchmark to evaluate the perf impact.




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2020-06-24 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143886#comment-17143886
 ] 

Bruno Roustant commented on LUCENE-9379:


First PR, functional but incomplete. The idea of using a pool of Ciphers does 
not work in Lucene.

To run the tests, two options:

test -Dtests.codec=Encrypting
It executes the tests with the EncryptingCodec in test-framework. Currently it 
encrypts a delegate PostingsFormat. This option shows how to provide the 
encryption key depending on the SegmentInfo.

test -Dtests.directory=org.apache.lucene.codecs.encrypting.SimpleEncryptingDirectory
It executes the tests with the SimpleEncryptingDirectory in test-framework. 
This option is the simplest; it shows how to provide the encryption key as a 
constant (could be a property) or only depending on the name of the file to 
encrypt (no SegmentInfo).

 

There is a performance issue because too many new Ciphers are created when 
slicing IndexInput.
javax.crypto.Cipher is heavyweight to create and is stateful. I tried a 
CipherPool, but there are many cases where we need lots of slices of the 
IndexInput, so we have to create lots of new stateful Ciphers. The pool turns 
out to be a no-go: there are too many Ciphers in it.
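
The statefulness mentioned above can be observed with the plain JDK API. The following is a minimal sketch, not code from the PR; the class StatefulCipherDemo and its method names are hypothetical. It shows that one Cipher instance in AES/CTR mode advances its internal counter on every update(), which is why independent readers such as slices cannot share an instance.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical demo class; illustrates that javax.crypto.Cipher keeps state between calls. */
final class StatefulCipherDemo {
    /** Creates an AES/CTR cipher positioned at the start of the keystream. */
    static Cipher newCipher(byte[] key, byte[] iv) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if encrypting the same block twice with one Cipher yields different output. */
    static boolean keystreamAdvances(byte[] key, byte[] iv) {
        byte[] block = new byte[16];
        Cipher shared = newCipher(key, iv);
        byte[] first = shared.update(block);  // uses counter value iv
        byte[] second = shared.update(block); // uses counter value iv + 1
        return !Arrays.equals(first, second);
    }
}
```

Because the position in the keystream lives inside the Cipher, each slice or clone reading at its own offset needs its own instance (or its own lighter equivalent), which is the cost described above.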

TODO:
 * find a lighter alternative to Cipher if it exists.
 * fix a couple of tests still failing because of unclosed IndexOutput.




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2020-06-16 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136783#comment-17136783
 ] 

Bruno Roustant commented on LUCENE-9379:


So I plan to implement an EncryptingDirectory extending FilterDirectory.

 

+Encryption method:+

AES CTR (counter)
 * This mode is approved by NIST. 
([https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_.28CTR.29])
 * AES encryption has the same size as the original clear text (though the last 
block is padded to 128 bits). So we can use the same file pointers.
 * CTR mode allows random access to encrypted blocks (128 bits blocks).
 * IV (initialisation vector) must be random, and is stored at the beginning of 
the encrypted file because it can be public.
 * It is appropriate to encrypt streams.
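
As an illustration of the random access property, here is a minimal sketch using the JDK's AES/CTR/NoPadding transformation, whose output has the same length as its input. It is not the PR's implementation: the class AesCtrRandomAccess and its methods are hypothetical, the IV is passed separately instead of being read from the start of the file, and the sketch assumes the counter is the IV incremented as a 128-bit big-endian integer per block.

```java
import java.math.BigInteger;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch of random access into AES/CTR encrypted data. */
final class AesCtrRandomAccess {
    static final int BLOCK_SIZE = 16; // AES block size in bytes (128 bits)

    /** Counter block for the given block index: (iv + blockIndex) mod 2^128, big-endian. */
    static byte[] counterAt(byte[] iv, long blockIndex) {
        BigInteger counter = new BigInteger(1, iv)
            .add(BigInteger.valueOf(blockIndex))
            .mod(BigInteger.ONE.shiftLeft(128));
        byte[] raw = counter.toByteArray();
        byte[] out = new byte[BLOCK_SIZE];
        int len = Math.min(raw.length, BLOCK_SIZE);
        // Right-align, dropping a possible sign byte and left-padding with zeros.
        System.arraycopy(raw, raw.length - len, out, BLOCK_SIZE - len, len);
        return out;
    }

    private static byte[] crypt(int mode, byte[] key, byte[] counter, byte[] data, int off, int len) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(counter));
            return cipher.doFinal(data, off, len);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts the whole clear text starting at block 0. */
    static byte[] encrypt(byte[] key, byte[] iv, byte[] plain) {
        return crypt(Cipher.ENCRYPT_MODE, key, iv, plain, 0, plain.length);
    }

    /** Decrypts only [offset, offset + length) without touching earlier blocks. */
    static byte[] decryptRange(byte[] key, byte[] iv, byte[] cipherText, int offset, int length) {
        int skip = offset % BLOCK_SIZE; // bytes to discard inside the first block
        int start = offset - skip;      // block-aligned start position
        byte[] counter = counterAt(iv, start / BLOCK_SIZE);
        byte[] plain = crypt(Cipher.DECRYPT_MODE, key, counter, cipherText, start, skip + length);
        return Arrays.copyOfRange(plain, skip, skip + length);
    }
}
```

Since file positions map directly to block indexes, a seek only needs to recompute the counter for the target block, which is what makes CTR a natural fit for IndexInput-style random reads.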

 

+API:+ 

I don’t anticipate any API change.

 

+How to provide encryption keys:+

EncryptingDirectory would require a delegate Directory, an encryption key 
supplier, and a Cipher pool (for performance).

For the callers to pass the encryption keys, I see two ways:

1- In Solr, declare a DirectoryFactory in solrconfig.xml that creates 
EncryptingDirectory. This factory is able to determine the encryption key per 
file based on the path. It is the responsibility of this factory to access the 
keys (e.g. stored in a safe DB, received with an admin handler, read from 
properties, etc). The Cipher pool is held by the DirectoryFactory.

2- More generally the EncryptingDirectory can be created to wrap a Directory 
when opening a segment (e.g. in PostingsFormat/DocValuesFormat 
fieldsConsumer()/fieldsProducer(), in StoredFieldFormat 
fieldsReader()/fieldsWriter(), etc). In this case the 
PostingsFormat/DocValuesFormat/StoredFieldFormat extension determines the 
encryption key based on the SegmentInfo. A custom Codec can be created to 
handle encrypting formats. The Cipher pool is held either in the Codec or in 
the Format.
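
To illustrate the first option, a key supplier that resolves the key from the file path could look like the sketch below. Everything in it is an assumption for illustration: the class PathKeySupplier, its methods and the prefix-matching rule are hypothetical and not taken from the PR. Keys are only held in memory, in line with the requirement that they must not be stored on disk.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical key lookup by file path; names are illustrative, not from the PR. */
final class PathKeySupplier {
    // Keys live only in memory; they are registered at runtime,
    // for example by an admin handler or from a secure store.
    private final Map<String, byte[]> keysByDirPrefix = new ConcurrentHashMap<>();

    void registerKey(String dirPrefix, byte[] key) {
        keysByDirPrefix.put(dirPrefix, key.clone());
    }

    /**
     * Returns the key registered for the longest prefix matching the path,
     * or empty if no prefix matches (the caller decides what to do then).
     */
    Optional<byte[]> keyFor(String filePath) {
        String best = null;
        for (String prefix : keysByDirPrefix.keySet()) {
            if (filePath.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best == null ? Optional.empty() : Optional.of(keysByDirPrefix.get(best).clone());
    }
}
```

A factory creating the encrypting directory could call keyFor() with the path of each file it opens. For the second option, the same shape would apply with the lookup driven by SegmentInfo attributes instead of the path.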

 

+Code:+

I will take inspiration from Apache commons-crypto's CtrCryptoOutputStream, 
although without using it directly because it is an OutputStream while we need 
an IndexOutput. We can probably also simplify it since our use-case is specific 
compared to the broad usage this library targets.
