[jira] [Comment Edited] (OAK-10674) DocumentStore: verify that we could use Oak's Bloom filter
[ https://issues.apache.org/jira/browse/OAK-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823917#comment-17823917 ]

Thomas Mueller edited comment on OAK-10674 at 3/6/24 8:58 AM:
--

I can add the method "expectedFpp()" in our code as well (getEstimatedEntryCount we already have), with documentation that this is O ( n ). The implementation is pretty simple: see the Guava implementation here: https://github.com/google/guava/blob/master/guava/src/com/google/common/hash/BloomFilter.java#L190C17-L190C30

Actually I would suggest this method:

{noformat}
/**
 * Get the expected false positive rate for the current entries in the filter.
 * This will first calculate the estimated entry count, and then calculate the false positive probability from there.
 * ...
 */
public double expectedFpp() {
    return calculateFpp(getEstimatedEntryCount(), getBitCount(), getK());
}
{noformat}

was (Author: tmueller):
I can add the methods "expectedFpp()" and "approximateElementCount()" in our code as well, with documentation that this is O ( n ). The implementation is pretty simple: see the Guava implementation here: https://github.com/google/guava/blob/master/guava/src/com/google/common/hash/BloomFilter.java#L190C17-L190C30

> DocumentStore: verify that we could use Oak's Bloom filter
> --
>
> Key: OAK-10674
> URL: https://issues.apache.org/jira/browse/OAK-10674
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: documentmk
> Reporter: Julian Reschke
> Assignee: Julian Reschke
> Priority: Major
>
> Test that we can use
> oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/analysis/utils/BloomFilter.java
> (for now, by copying it over).
> Then decide about where to move it, and whether API changes are desired.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
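For readers following the thread, the proposed expectedFpp() could be sketched roughly as below. This is only an illustration under stated assumptions: the class name BloomFppSketch, the long[] bit array and all parameter types are hypothetical, and the formulas are the textbook Bloom filter estimates, not necessarily what Oak's calculateFpp or getEstimatedEntryCount do. It also shows why the call is O ( n ): the set bits have to be counted by scanning the whole array.

```java
/** Hypothetical helper for illustration; not Oak's or Guava's actual class. */
final class BloomFppSketch {

    /** Estimate the entry count n from the set bits: n ~ -(m / k) * ln(1 - setBits / m). */
    static long estimatedEntryCount(long setBits, long bitCount, int k) {
        double fractionSet = (double) setBits / bitCount;
        return Math.round(-Math.log1p(-fractionSet) * bitCount / k);
    }

    /** Textbook false positive probability for n entries, m bits, k probes: (1 - e^(-k*n/m))^k. */
    static double calculateFpp(long entryCount, long bitCount, int k) {
        return Math.pow(1 - Math.exp(-(double) k * entryCount / bitCount), k);
    }

    /**
     * Expected false positive probability for the current filter content.
     * Linear in the array size, because every word is scanned to count set bits.
     * Assumes a non-empty array.
     */
    static double expectedFpp(long[] words, int k) {
        long setBits = 0;
        for (long w : words) {
            setBits += Long.bitCount(w);
        }
        long bitCount = 64L * words.length;
        return calculateFpp(estimatedEntryCount(setBits, bitCount, k), bitCount, k);
    }
}
```

With 10 bits per entry and k = 7, calculateFpp(1000, 10000, 7) comes out a little under 1%, which matches the usual rule of thumb for that configuration.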
[jira] [Comment Edited] (OAK-10674) DocumentStore: verify that we could use Oak's Bloom filter
[ https://issues.apache.org/jira/browse/OAK-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823917#comment-17823917 ]

Thomas Mueller edited comment on OAK-10674 at 3/6/24 8:52 AM:
--

I can add the methods "expectedFpp()" and "approximateElementCount()" in our code as well, with documentation that this is O ( n ). The implementation is pretty simple: see the Guava implementation here: https://github.com/google/guava/blob/master/guava/src/com/google/common/hash/BloomFilter.java#L190C17-L190C30

was (Author: tmueller):
I can add the methods "expectedFpp()" and "approximateElementCount()" in our code as well, with documentation that this is O(n). The implementation is pretty simple: see the Guava implementation here: https://github.com/google/guava/blob/master/guava/src/com/google/common/hash/BloomFilter.java#L190C17-L190C30
[jira] [Comment Edited] (OAK-10674) DocumentStore: verify that we could use Oak's Bloom filter
[ https://issues.apache.org/jira/browse/OAK-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822576#comment-17822576 ]

Thomas Mueller edited comment on OAK-10674 at 3/1/24 1:44 PM:
--

[~reschke] best would be to move over org.apache.jackrabbit.oak.index.indexer.document.flatfile.analysis.utils.Hash as well. And then we can add convenience methods:

{noformat}
/**
 * Add an entry. This internally uses the hashCode() method to derive a
 * high-quality hash code.
 *
 * @param obj the object (must not be null)
 */
public void add(@NotNull Object obj) {
    add(Hash.hash64(obj.hashCode()));
}

/**
 * Whether the entry may be in the set. This internally uses the hashCode()
 * method to derive a high-quality hash code.
 *
 * @param obj the object (must not be null)
 * @return true if the entry was added, or, with a certain false positive
 *         probability, even if it was not added
 */
public boolean mayContain(@NotNull Object obj) {
    return mayContain(Hash.hash64(obj.hashCode()));
}
{noformat}

was (Author: tmueller):
[~reschke] best would be to move over org.apache.jackrabbit.oak.index.indexer.document.flatfile.analysis.utils.Hash as well. And then we can add convenience methods:

{noformat}
/**
 * Add an entry. This internally uses the hashCode() method to derive a
 * high-quality hash code.
 *
 * @param obj the object (must not be null)
 */
public void add(@NotNull Object obj) {
    add(Hash.hash64(obj.hashCode()));
}

/**
 * Whether the entry may be in the set.
 *
 * @param hash the hash value (need to be a high quality hash code, with all
 *             bits having high entropy)
 * @return true if the entry was added, or, with a certain false positive
 *         probability, even if it was not added
 */
public boolean mayContain(@NotNull Object obj) {
    return mayContain(Hash.hash64(obj.hashCode()));
}
{noformat}
[jira] [Comment Edited] (OAK-10674) DocumentStore: verify that we could use Oak's Bloom filter
[ https://issues.apache.org/jira/browse/OAK-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822576#comment-17822576 ]

Thomas Mueller edited comment on OAK-10674 at 3/1/24 1:44 PM:
--

[~reschke] best would be to move over org.apache.jackrabbit.oak.index.indexer.document.flatfile.analysis.utils.Hash as well. And then we can add convenience methods:

{noformat}
/**
 * Add an entry. This internally uses the hashCode() method to derive a
 * high-quality hash code.
 *
 * @param obj the object (must not be null)
 */
public void add(@NotNull Object obj) {
    add(Hash.hash64(obj.hashCode()));
}

/**
 * Whether the entry may be in the set. This internally uses the hashCode()
 * method to derive a high-quality hash code.
 *
 * @param obj the object (must not be null)
 * @return true if the entry was added, or, with a certain false positive
 *         probability, even if it was not added
 */
public boolean mayContain(@NotNull Object obj) {
    return mayContain(Hash.hash64(obj.hashCode()));
}
{noformat}

I can work on this, no issue. We also need to move over some tests.

was (Author: tmueller):
[~reschke] best would be to move over org.apache.jackrabbit.oak.index.indexer.document.flatfile.analysis.utils.Hash as well. And then we can add convenience methods:

{noformat}
/**
 * Add an entry. This internally uses the hashCode() method to derive a
 * high-quality hash code.
 *
 * @param obj the object (must not be null)
 */
public void add(@NotNull Object obj) {
    add(Hash.hash64(obj.hashCode()));
}

/**
 * Whether the entry may be in the set. This internally uses the hashCode()
 * method to derive a high-quality hash code.
 *
 * @param obj the object (must not be null)
 * @return true if the entry was added, or, with a certain false positive
 *         probability, even if it was not added
 */
public boolean mayContain(@NotNull Object obj) {
    return mayContain(Hash.hash64(obj.hashCode()));
}
{noformat}
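To make the role of the convenience methods concrete, here is a minimal, self-contained sketch of a filter with a long-based core API and Object overloads layered on top. Everything in it is an assumption for illustration: ObjectBloomSketch is a hypothetical class, not Oak's BloomFilter, and hash64 is a stand-in 64-bit mixer (the SplitMix64 finalizer), since the thread does not show how Oak's Hash.hash64 is implemented. The point it tries to show is the one from the Javadoc above: hashCode() values (for example, of an Integer) can have low entropy, so they are mixed before being used to pick bits.

```java
/** Hypothetical minimal Bloom filter for illustration; not Oak's BloomFilter. */
final class ObjectBloomSketch {
    private final long[] data;
    private final int k;

    ObjectBloomSketch(int bits, int k) {
        this.data = new long[Math.max(1, (bits + 63) / 64)];
        this.k = k;
    }

    /** Stand-in 64-bit mixer (SplitMix64 finalizer); Oak's Hash.hash64 may differ. */
    static long hash64(long x) {
        x += 0x9e3779b97f4a7c15L;
        x = (x ^ (x >>> 30)) * 0xbf58476d1ce4e5b9L;
        x = (x ^ (x >>> 27)) * 0x94d049bb133111ebL;
        return x ^ (x >>> 31);
    }

    private int bitIndex(long a) {
        return (int) Long.remainderUnsigned(a, 64L * data.length);
    }

    /** Core API: the caller supplies a high-quality 64-bit hash. */
    void add(long hash) {
        long a = hash;
        long b = Long.rotateLeft(hash, 32) | 1; // odd step for double hashing
        for (int i = 0; i < k; i++) {
            int bit = bitIndex(a);
            data[bit >>> 6] |= 1L << bit;
            a += b;
        }
    }

    boolean mayContain(long hash) {
        long a = hash;
        long b = Long.rotateLeft(hash, 32) | 1;
        for (int i = 0; i < k; i++) {
            int bit = bitIndex(a);
            if ((data[bit >>> 6] & (1L << bit)) == 0) {
                return false;
            }
            a += b;
        }
        return true;
    }

    /** Convenience overloads: mix hashCode() first, then delegate (obj must not be null). */
    void add(Object obj) {
        add(hash64(obj.hashCode()));
    }

    boolean mayContain(Object obj) {
        return mayContain(hash64(obj.hashCode()));
    }
}
```

One thing to watch with overloads like these: a primitive int or long argument resolves to the long variant and is used as the hash directly, while a boxed Integer goes through the Object variant and is mixed first.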
[jira] [Comment Edited] (OAK-10674) DocumentStore: verify that we could use Oak's Bloom filter
[ https://issues.apache.org/jira/browse/OAK-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822576#comment-17822576 ]

Thomas Mueller edited comment on OAK-10674 at 3/1/24 1:43 PM:
--

[~reschke] best would be to move over org.apache.jackrabbit.oak.index.indexer.document.flatfile.analysis.utils.Hash as well. And then we can add convenience methods:

{noformat}
/**
 * Add an entry. This internally uses the hashCode() method to derive a
 * high-quality hash code.
 *
 * @param obj the object (must not be null)
 */
public void add(@NotNull Object obj) {
    add(Hash.hash64(obj.hashCode()));
}

/**
 * Whether the entry may be in the set.
 *
 * @param hash the hash value (need to be a high quality hash code, with all
 *             bits having high entropy)
 * @return true if the entry was added, or, with a certain false positive
 *         probability, even if it was not added
 */
public boolean mayContain(@NotNull Object obj) {
    return mayContain(Hash.hash64(obj.hashCode()));
}
{noformat}

was (Author: tmueller):
[~reschke] best would be to move over org.apache.jackrabbit.oak.index.indexer.document.flatfile.analysis.utils.Hash as well. And then we can add convenience methods:

{noformat}
/**
 * Add an entry. This internally uses the hashCode() method to derive a
 * high-quality hash code.
 *
 * @param obj the object (must not be null)
 */
public void add(@NotNull Object obj) {
    add(Hash.hash64(obj.hashCode()));
}

/**
 * Whether the entry may be in the set.
 *
 * @param hash the hash value (need to be a high quality hash code, with all
 *             bits having high entropy)
 * @return true if the entry was added, or, with a certain false positive
 *         probability, even if it was not added
 */
public boolean mayContain(@NotNull Object obj) {
    return mayContain(obj.hashCode());
}
{noformat}