Re: New Sub-project Proposal.

2019-09-25 Thread Claude Warren
Created COLLECTIONS-728 (https://issues.apache.org/jira/browse/COLLECTIONS-728). I hope that was appropriate. On Wed, Sep 25, 2019 at 6:41 PM Gilles Sadowski wrote: > Hello. > > 2019-09-25 19:13 UTC+02:00, Claude Warren : > > There is no associated JIRA report. Not even sure how to generate

Re: New Sub-project Proposal.

2019-09-25 Thread Gilles Sadowski
Hello. 2019-09-25 19:13 UTC+02:00, Claude Warren : > There is no associated JIRA report. Not even sure how to generate or build > one. Issue-tracking system (for "Commons Collections") is there: https://issues.apache.org/jira/projects/COLLECTIONS Gilles > Claude > > On Wed, Sep 25, 2019 at

Re: New Sub-project Proposal.

2019-09-25 Thread Claude Warren
There is no associated JIRA report. Not even sure how to generate or build one. Claude On Wed, Sep 25, 2019 at 5:42 PM Gilles Sadowski wrote: > Hi. > > Is there a JIRA report associated with the proposal? > > It would help review if there were several PRs that > differentiate between "core"

Re: New Sub-project Proposal.

2019-09-25 Thread Gilles Sadowski
Hi. Is there a JIRA report associated with the proposal? It would help review if there were several PRs that differentiate between "core" functionality ("BloomFilter"), with minimal API (vs "syntactic sugar"), and higher-level utilities ("FilterConfiguration", "GatedCollection", etc.).

Re: New Sub-project Proposal.

2019-09-23 Thread sebb
On Mon, 23 Sep 2019 at 20:18, Claude Warren wrote: > > > At first sight, I'd say that serialization is out-of-scope (we > > should let application developers deal with that using the > > available accessors). > > How does one serialize a bloom filter if to do so you need to implement the >

Re: New Sub-project Proposal.

2019-09-23 Thread Claude Warren
> At first sight, I'd say that serialization is out-of-scope (we > should let application developers deal with that using the > available accessors). How does one serialize a bloom filter if to do so you need to implement the private Object writeReplace() method? A list of proto bloom filters is

Re: New Sub-project Proposal.

2019-09-23 Thread Gilles Sadowski
Hi. Le lun. 23 sept. 2019 à 12:59, Claude Warren a écrit : > > I will rework to remove the package private and other access issues noted. > > In Builder there is a difference between with() and build(). This follows > the pattern established by MessageDigest[1] where it is possible to build a >

Re: New Sub-project Proposal.

2019-09-23 Thread Alex Herbert
On 23/09/2019 11:13, Claude Warren wrote: For the style issues is there an Eclipse style package that meets the commons style or some other tool that will correctly configure the format and style options in Eclipse? The Commons style across most projects is loosely based on the Java

Re: New Sub-project Proposal.

2019-09-23 Thread Claude Warren
I will rework to remove the package private and other access issues noted. In Builder there is a difference between with() and build(). This follows the pattern established by MessageDigest[1] where it is possible to build a digest in one call or by adding multiple items and then calling digest.
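The MessageDigest pattern referred to above can be sketched with the JDK class itself. This is only an illustration of "one call" versus "add several items, then call digest"; it is not the Builder from the PR, and the class and method names (DigestPatternSketch, oneShot, incremental) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustration only: the two ways of producing a digest with java.security.MessageDigest. */
final class DigestPatternSketch {

    private DigestPatternSketch() {
    }

    /** SHA-256 must be supported by every Java platform, so failure here is unexpected. */
    private static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds the digest in a single call. */
    static byte[] oneShot(String item) {
        return sha256().digest(item.getBytes(StandardCharsets.UTF_8));
    }

    /** Adds several items with update(), then finishes with digest(). */
    static byte[] incremental(String... parts) {
        MessageDigest md = sha256();
        for (String part : parts) {
            md.update(part.getBytes(StandardCharsets.UTF_8));
        }
        return md.digest();
    }
}
```

Both routes give the same result for the same byte sequence: oneShot("bloomfilter") equals incremental("bloom", "filter").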

Re: New Sub-project Proposal.

2019-09-23 Thread Claude Warren
For the style issues, is there an Eclipse style package that meets the commons style or some other tool that will correctly configure the format and style options in Eclipse? On Mon, Sep 23, 2019 at 10:54 AM Gilles Sadowski wrote: > Hello. > > Here are a few comments from a quick browse of

Re: New Sub-project Proposal.

2019-09-23 Thread Gilles Sadowski
Hello. Here are a few comments from a quick browse of today's update of PR #83. * "package private for testing" is not a good reason (IMO) * There are spurious blank spaces in some of the files ("git diff" shows them in red) * You should always perform "git rebase master" * In "Builder": ** Field

Re: New Sub-project Proposal.

2019-09-16 Thread Matt Sicker
Due to numerous security flaws in Java serialization, if you do use it, make sure to use serialization proxies and make them as simple as possible. On Sun, 15 Sep 2019 at 20:32, Claude Warren wrote: > > I am refactoring some of the code to make the Builder and Hash classes > enclosed in the

Re: New Sub-project Proposal.

2019-09-15 Thread Claude Warren
I am refactoring some of the code so that the Builder and Hash classes are enclosed in the ProtoBloomFilter class. I have also added the proxy serializer as noted in the Effective Java talk. This simplifies the classes significantly. I will have unit tests and fix the javadocs before the next push.
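For readers unfamiliar with the serialization proxy pattern mentioned here, a minimal sketch follows. It is not the ProtoBloomFilter code from the PR: the class SketchFilter, its long[] state and the roundTrip helper are assumptions made up for illustration. Only the writeReplace/readResolve/readObject hooks are standard Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical immutable filter whose serialized form is a simple proxy. */
final class SketchFilter implements Serializable {
    private static final long serialVersionUID = 1L;
    private final long[] words; // assumed internal state: the filter's bits

    SketchFilter(long[] words) {
        this.words = words.clone(); // defensive copy; all validation lives in the constructor
    }

    long[] getWords() {
        return words.clone();
    }

    /** Serialize the proxy instead of this object. */
    private Object writeReplace() {
        return new Proxy(words);
    }

    /** Reject any stream that tries to deserialize this class directly. */
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("Proxy required");
    }

    /** Kept as simple as possible: just the data needed to rebuild the filter. */
    private static final class Proxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final long[] words;

        Proxy(long[] words) {
            this.words = words;
        }

        /** Rebuild through the public constructor so its checks always run. */
        private Object readResolve() {
            return new SketchFilter(words);
        }
    }

    /** Usage example: serialize to memory and read the object back. */
    static SketchFilter roundTrip(SketchFilter filter) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(filter);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchFilter) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the design is that deserialization never writes the filter's fields directly; an instance can only come into existence through the constructor.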

Re: New Sub-project Proposal.

2019-09-15 Thread Gary Gregory
On Sun, Sep 15, 2019 at 8:17 PM sebb wrote: > On Mon, 16 Sep 2019 at 00:17, Gilles Sadowski > wrote: > > > > Hi. > > > > Le sam. 14 sept. 2019 à 08:15, Claude Warren a écrit > : > > > > > > @Gilles > > > > > > I am happy to rename the package without the plural if that is the > > > standard, I

Re: New Sub-project Proposal.

2019-09-15 Thread sebb
On Mon, 16 Sep 2019 at 00:17, Gilles Sadowski wrote: > > Hi. > > Le sam. 14 sept. 2019 à 08:15, Claude Warren a écrit : > > > > @Gilles > > > > I am happy to rename the package without the plural if that is the > > standard, I will also fix the indent issue. Is there a definition that can > >

Re: New Sub-project Proposal.

2019-09-15 Thread Gilles Sadowski
Hi. Le sam. 14 sept. 2019 à 08:15, Claude Warren a écrit : > > @Gilles > > I am happy to rename the package without the plural if that is the > standard, I will also fix the indent issue. Is there a definition that can > be quickly imported into Eclipse to do the proper formatting? Hopefully,

Re: New Sub-project Proposal.

2019-09-14 Thread Gary Gregory
On Sat, Sep 14, 2019 at 2:15 AM Claude Warren wrote: > @Gilles > > I am happy to rename the package without the plural if that is the > standard, I will also fix the indent issue. Is there a definition that can > be quickly imported into Eclipse to do the proper formatting? > > I am

Re: New Sub-project Proposal.

2019-09-14 Thread Claude Warren
@Gilles I am happy to rename the package without the plural if that is the standard, I will also fix the indent issue. Is there a definition that can be quickly imported into Eclipse to do the proper formatting? I am adding/updating all comments in the code. FilterConfig contains a main method

Re: New Sub-project Proposal.

2019-09-13 Thread Gilles Sadowski
> > [...] > > Gilles, > > Take a look at the PR, I added comments to allow the PR to have 0 deps. > > Gary > > IMO, the package should be named "bloomfilter" (without "s"). Naming seems inconsistent in [Collections]: Some (package) names are singular, other plural. * Indent must be 4 spaces. *

Re: New Sub-project Proposal.

2019-09-13 Thread Gilles Sadowski
Hi. > > [...] > > > > Gilles, > > Take a look at the PR, I added comments to allow the PR to have 0 deps. How about creating a branch on the Apache repository, so that we can commit examples of what needs to be modified in order to comply with the project's style and requirements? Gilles

Re: New Sub-project Proposal.

2019-09-12 Thread Gary Gregory
On Thu, Sep 12, 2019 at 5:24 PM Gilles Sadowski wrote: > Le jeu. 12 sept. 2019 à 20:28, Gary Gregory a > écrit : > > > > On Thu, Sep 12, 2019 at 11:42 AM Gilles Sadowski > > wrote: > > > > > Le jeu. 12 sept. 2019 à 17:32, Claude Warren a > écrit : > > > > > > > > The base code depended on

Re: New Sub-project Proposal.

2019-09-12 Thread Gilles Sadowski
Le jeu. 12 sept. 2019 à 20:28, Gary Gregory a écrit : > > On Thu, Sep 12, 2019 at 11:42 AM Gilles Sadowski > wrote: > > > Le jeu. 12 sept. 2019 à 17:32, Claude Warren a écrit : > > > > > > The base code depended on commons-lang3 for building hashes. Is this > > > acceptable or should the hash

Re: New Sub-project Proposal.

2019-09-12 Thread Gary Gregory
On Thu, Sep 12, 2019 at 11:42 AM Gilles Sadowski wrote: > Le jeu. 12 sept. 2019 à 17:32, Claude Warren a écrit : > > > > The base code depended on commons-lang3 for building hashes. Is this > > acceptable or should the hash generation code from lang3 be cut and > pasted > > into the classes.

Re: New Sub-project Proposal.

2019-09-12 Thread Gilles Sadowski
Le jeu. 12 sept. 2019 à 17:15, Claude Warren a écrit : > > @Gilles > > Missed your suggestion about modularity. Can you point me to the original > message or paraphrase it here? https://markmail.org/message/4bibv2zsibmtyrsg Gilles >> [...]

Re: New Sub-project Proposal.

2019-09-12 Thread Gilles Sadowski
Le jeu. 12 sept. 2019 à 17:32, Claude Warren a écrit : > > The base code depended on commons-lang3 for building hashes. Is this > acceptable or should the hash generation code from lang3 be cut and pasted > into the classes. Not sure what the standard is in this project. There is no

Re: New Sub-project Proposal.

2019-09-12 Thread Gilles Sadowski
Le jeu. 12 sept. 2019 à 17:20, Gary Gregory a écrit : > > Let's talk about modules after the PR comes, I only see that as needed to > avoid bringing in dependencies for all users. IOW I would only see breaking > up Collections into Maven modules if either the PR is giant or it depends > on other

Re: New Sub-project Proposal.

2019-09-12 Thread Claude Warren
The base code depended on commons-lang3 for building hashes. Is this acceptable or should the hash generation code from lang3 be cut and pasted into the classes. Not sure what the standard is in this project. On Thu, Sep 12, 2019 at 4:14 PM Claude Warren wrote: > @Gilles > > Missed your

Re: New Sub-project Proposal.

2019-09-12 Thread Gary Gregory
Let's talk about modules after the PR comes, I only see that as needed to avoid bringing in dependencies for all users. IOW I would only see breaking up Collections into Maven modules if either the PR is giant or it depends on other artifacts. Gary On Thu, Sep 12, 2019, 11:15 Claude Warren

Re: New Sub-project Proposal.

2019-09-12 Thread Claude Warren
@Gilles Missed your suggestion about modularity. Can you point me to the original message or paraphrase it here? Claude On Thu, Sep 12, 2019 at 11:03 AM Gilles Sadowski wrote: > Le jeu. 12 sept. 2019 à 10:28, Stian Soiland-Reyes a > écrit : > > > > On Thu, 12 Sep 2019 08:06:59 +0100, Claude

Re: New Sub-project Proposal.

2019-09-12 Thread Claude Warren
I have no issues with contributing Span and SpanBuffer. Span is similar to commons-lang Range and it might be reasonable to migrate to Range for the Span part. The SpanBuffer (possibly renamed to RangeBuffer) is conceptually a byte buffer with long offset and length so that it can conceptually

Re: New Sub-project Proposal.

2019-09-12 Thread Gilles Sadowski
Le jeu. 12 sept. 2019 à 10:28, Stian Soiland-Reyes a écrit : > > On Thu, 12 Sep 2019 08:06:59 +0100, Claude Warren wrote: > > Actually the code I was thinking of is the multi-filter branch. It cleans > > up some names and simplifies a few things. The collections and storage > > packages might

Re: New Sub-project Proposal.

2019-09-12 Thread Stian Soiland-Reyes
On Thu, 12 Sep 2019 08:06:59 +0100, Claude Warren wrote: > Actually the code I was thinking of is the multi-filter branch. It cleans > up some names and simplifies a few things. The collections and storage > packages might be best added as examples rather than as mainline code. > > In this

Re: New Sub-project Proposal.

2019-09-12 Thread Claude Warren
Actually the code I was thinking of is the multi-filter branch. It cleans up some names and simplifies a few things. The collections and storage packages might be best added as examples rather than as mainline code. In this case we just provide the bloom filter implementation. If we want to

Re: New Sub-project Proposal.

2019-09-11 Thread Gary Gregory
So is the idea to provide wrappers on Sets or a Set implementation? Gary On Wed, Sep 11, 2019 at 3:54 PM Stian Soiland-Reyes wrote: > I certainly got thinking about streams for those methods using the fancy > iterators yes. Commons Collections is already on JDK8, so if that is > sufficient, go

Re: New Sub-project Proposal.

2019-09-11 Thread Gary Gregory
On Wed, Sep 11, 2019 at 11:06 AM Claude Warren wrote: > First it is important to remember that Bloom filters tell you where things > are NOT. Second it is important to understand that Bloom filters can give > false positives but never false negatives. Seems kind of pointless I know > but

Re: New Sub-project Proposal.

2019-09-11 Thread Stian Soiland-Reyes
I certainly got thinking about streams for those methods using the fancy iterators, yes. Commons Collections is already on JDK8, so if that is sufficient, go for it! We would need to do IP clearance to bring in the code formally to ASF. It should be easy if it is just you who made it under Apache

Re: New Sub-project Proposal.

2019-09-11 Thread Claude Warren
@Stian. You have correctly identified the code in my repository. The code could be refactored to use streams or we could bring the jena iterator extensions into commons. I had suggested that at one time but there were concerns about conflicts with existing code. Duplication with of

Re: New Sub-project Proposal.

2019-09-11 Thread Stian Soiland-Reyes
On Wed, 11 Sep 2019 18:12:24 +0200, Gilles Sadowski wrote: > > The long and short of this is that there is no good unencumbered open > > source library available at the current time. Myself and several others, > > in conversation here at ApacheCon, have expressed interest in creating such > > a

Re: New Sub-project Proposal.

2019-09-11 Thread Gilles Sadowski
Hi. Le mer. 11 sept. 2019 à 17:06, Claude Warren a écrit : > > [...] > > The long and short of this is that there is no good unencumbered open > source library available at the current time. Myself and several others, > in conversation here at ApacheCon, have expressed interest in creating such

Re: New Sub-project Proposal.

2019-09-11 Thread Claude Warren
As another note, we have had discussions here at ApacheCon about developing a method to exchange bloom filter hashing algorithms to make it easier for systems to publish interfaces where bloom filters are passed as the search parameters. Also, bloom filters are good for looking for "and"ed

Re: New Sub-project Proposal.

2019-09-11 Thread Claude Warren
First it is important to remember that Bloom filters tell you where things are NOT. Second it is important to understand that Bloom filters can give false positives but never false negatives. Seems kind of pointless I know but consider the case where you have 10K buckets that may contain the
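The "false positives but never false negatives" property can be seen in a minimal sketch. This is not the implementation proposed in this thread: the class name SketchBloomFilter, the FNV-1a hash and the double-hashing scheme for deriving several bit positions are all assumptions chosen to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Illustrative Bloom filter over strings; not the code under discussion. */
final class SketchBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SketchBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Sets numHashes bits for the item. Bits are never cleared. */
    void add(String item) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(item, i));
        }
    }

    /**
     * Returns false only if the item was definitely never added.
     * Returns true if it was added, or if other items happen to
     * have set the same bits (a false positive).
     */
    boolean mightContain(String item) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }

    /** Assumed scheme: i-th position is h1 + i * h2, reduced to [0, numBits). */
    private int index(String item, int i) {
        byte[] data = item.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // force odd so the step is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** 32-bit FNV-1a style hash with a caller-supplied starting value. */
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }
}
```

Because add() only ever sets bits, anything that was added always passes mightContain(); a "no" answer is therefore reliable, which is what makes the filter useful for ruling buckets out before doing an expensive lookup.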

Re: New Sub-project Proposal.

2019-09-11 Thread sebb
On Wed, 11 Sep 2019 at 12:36, Gary Gregory wrote: > > I would like to know more. I am curious since looking up whether an element > is in a set is done via a hash code. How do you do better than that? Wikipedia has a good explanation: https://en.wikipedia.org/wiki/Bloom_filter Basically

Re: New Sub-project Proposal.

2019-09-11 Thread Gary Gregory
I would like to know more. I am curious since looking up whether an element is in a set is done via a hash code. How do you do better than that? Gary On Tue, Sep 10, 2019, 16:51 Bruno P. Kinoshita wrote: > +1 Collections sounds like a good place for a bloom filter. > > Bruno > > On

Re: New Sub-project Proposal.

2019-09-10 Thread Bruno P. Kinoshita
+1 Collections sounds like a good place for a bloom filter. Bruno On Wednesday, 11 September 2019, 8:00:45 am NZST, Jochen Wiedmann wrote: Hi, Claude, having read what a bloom filter is, a subproject sounds unnecessary to me. I'd recommend that you contribute your code to Commons

Re: New Sub-project Proposal.

2019-09-10 Thread Jochen Wiedmann
Hi, Claude, having read what a bloom filter is, a subproject sounds unnecessary to me. I'd recommend that you contribute your code to Commons Collections, which seems to me to be a logical target. Jochen On Tue, Sep 10, 2019 at 8:45 PM Claude Warren wrote: > > Having spoken with several

New Sub-project Proposal.

2019-09-10 Thread Claude Warren
Having spoken with several people at ApacheCon, I would like to see a bloomfilter sub project. I have code that is already under Apache License that I am willing to contribute as the basis. The goal of the sub-project would be to produce a reference implementation that could be used by other