Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16864#discussion_r100571446
  
    --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/BloomFilter.java ---
    @@ -148,6 +153,24 @@ int getVersionNumber() {
       public abstract boolean mightContainBinary(byte[] item);
     
       /**
    +   * Returns a new Bloom filter of the union of two Bloom filters.
    +   * Unlike mergeInplace, this will not cause a mutation.
    +   * Callers must ensure the bloom filters are appropriately sized to avoid saturating them.
    +   *
    +   * @param other The bloom filter to union this bloom filter with.
    +   * @throws IncompatibleUnionException if {@code isCompatible(other) == false}
    +   */
    +  public abstract BloomFilterImpl union(BloomFilter other) throws IncompatibleUnionException;
    +
    +  /**
    +   * Swamidass & Baldi (2007) approximation for number of items in the intersection of two Bloom filters
    --- End diff --
    
    * Same here. Document the method first and then mention the reference.
    * How is this different from intersecting the two Bloom filters and then estimating the number of items? Estimating via the union might lead to a larger approximation error.
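
To make the two options in the second point concrete, here is a minimal sketch that works on raw java.util.BitSet bit arrays rather than on Spark's BloomFilter class. The class and method names (BloomCardinalitySketch, estimateCount, intersectionViaUnion, intersectionViaAnd) are hypothetical and are not part of the pull request; m is the number of bits and k the number of hash functions, both assumed identical for the two filters. It only shows how each estimate would be computed and makes no claim about which one has the smaller error.

```java
import java.util.BitSet;

// Hypothetical helper for illustration only; not part of Spark's BloomFilter API.
final class BloomCardinalitySketch {

  // Swamidass & Baldi estimate of the number of inserted items:
  //   n ~= -(m / k) * ln(1 - X / m), where X is the number of set bits.
  // Returns +Infinity for a fully saturated filter (X == m).
  static double estimateCount(BitSet bits, int m, int k) {
    double fractionSet = (double) bits.cardinality() / m;
    return -((double) m / k) * Math.log(1.0 - fractionSet);
  }

  // Option 1: inclusion-exclusion through the union (bitwise OR).
  //   |A and B| ~= n(A) + n(B) - n(A or B)
  static double intersectionViaUnion(BitSet a, BitSet b, int m, int k) {
    BitSet union = (BitSet) a.clone(); // copy so the inputs are not mutated
    union.or(b);
    return estimateCount(a, m, k) + estimateCount(b, m, k) - estimateCount(union, m, k);
  }

  // Option 2: AND the bit arrays and estimate directly from the result.
  // A bit can be set in both filters by different items, so the AND-ed
  // array is not exactly the filter of the true intersection.
  static double intersectionViaAnd(BitSet a, BitSet b, int m, int k) {
    BitSet both = (BitSet) a.clone();
    both.and(b);
    return estimateCount(both, m, k);
  }
}
```

Both methods copy their inputs before combining them, in the same spirit as the non-mutating union described in the Javadoc above.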

