Re: [Collections] Suppliers, Iterables, and Producers

2024-05-15 Thread Claude Warren
I have updated Collections-854 [1] to reflect the naming that we have been
talking about and will start on the refactoring soon.  Please start
watching that ticket.

Claude

[1] https://issues.apache.org/jira/browse/COLLECTIONS-854

Re: [Collections] Suppliers, Iterables, and Producers

2024-05-13 Thread Claude Warren
A couple of messages back, I proposed some documentation updates and a
few name changes:

> In thinking about the term Producer, other terms could be used:
> Interrogator (sounds like you can add a query) or Extractor might work.  But
> it has also come to mind that there is a "compute" series of methods in the
> ConcurrentMap class.  Perhaps the term we want is not "forEach", but
> "process".  The current form of usage is something like:
> IndexProducer ip = 
> ip.forEachIndex(idx -> someIntPredicate)
> We could change the name from XProducer to XProcessor or XExtractor, and
> the method to processXs.  So the above code would look like:
> IndexExtractor ix = 
> ix.processIndexs(idx -> someIntPredicate)
> Another example:
> BitMapExtractor bx = .
> bx.processBitMaps(bitmap -> someBitMapPredicate)


So, unless there is an objection, I will open a ticket to:

   - rename XProducer to XExtractor
   - rename the XExtractor forEachX method to a processX method that takes a
   Predicate argument
   - add the documentation described previously.

Are we agreed?
Claude

Re: [Collections] Suppliers, Iterables, and Producers

2024-05-03 Thread Gary Gregory
LGTM. Maybe the current PR (LGTM) should be merged first. Alex, how does
that PR look to you?

Gary

Re: [Collections] Suppliers, Iterables, and Producers

2024-05-03 Thread Claude Warren
Gary and Alex,

Any thoughts on this?

Claude

Re: [Collections] Suppliers, Iterables, and Producers

2024-04-30 Thread Claude Warren
Good suggestions.

> short-circuit. We could make this distinction by including it in the name:
> forEachUntil(Predicate ...), forEachUnless, ...


We need the unit name in the method name.  All Bloom filters implement both
IndexProducer and BitmapProducer, and since both methods take Predicate
parameters, a shared method name would make them conflict.
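To make the conflict concrete, here is a minimal sketch with hypothetical interfaces IndexSource and BitMapSource (stand-ins, not the library's types) whose methods take an IntPredicate and a LongPredicate respectively. If both methods shared one name such as forEachUntil, a class implementing both would carry two overloads, and a call with an implicitly typed lambda like `x -> true` would be rejected by javac as ambiguous. Keeping the unit in the name sidesteps that. Non-negative indices are assumed.

```java
import java.util.function.IntPredicate;
import java.util.function.LongPredicate;

// Hypothetical sketch: a filter exposing its contents both as indices and as bitmap words.
class NameClashDemo {
    public static void main(String[] args) {
        TinyFilter f = new TinyFilter(new int[] {1, 3, 64});
        // Distinct method names let implicitly typed lambdas resolve unambiguously.
        // Had both methods been named forEachUntil(...), f.forEachUntil(x -> true)
        // would not compile ("reference to forEachUntil is ambiguous").
        System.out.println(f.forEachIndex(i -> i < 10));  // false: stops at index 64
        System.out.println(f.forEachBitMap(w -> w != 0)); // true: both words are non-zero
    }
}

interface IndexSource {
    /** Returns false if the predicate returned false for some index, true otherwise. */
    boolean forEachIndex(IntPredicate predicate);
}

interface BitMapSource {
    /** Returns false if the predicate returned false for some bitmap word, true otherwise. */
    boolean forEachBitMap(LongPredicate predicate);
}

// Hypothetical filter implementing both views over a fixed set of non-negative indices.
class TinyFilter implements IndexSource, BitMapSource {
    private final int[] indices;
    private final long[] words;

    TinyFilter(int[] indices) {
        this.indices = indices.clone();
        int max = 0;
        for (int i : indices) {
            max = Math.max(max, i);
        }
        this.words = new long[max / 64 + 1];
        for (int i : indices) {
            words[i / 64] |= 1L << (i % 64); // word i / 64, bit i % 64
        }
    }

    @Override
    public boolean forEachIndex(IntPredicate predicate) {
        for (int i : indices) {
            if (!predicate.test(i)) {
                return false; // short-circuit
            }
        }
        return true;
    }

    @Override
    public boolean forEachBitMap(LongPredicate predicate) {
        for (long w : words) {
            if (!predicate.test(w)) {
                return false; // short-circuit
            }
        }
        return true;
    }
}
```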


I have opened a ticket [1] with the list of tasks, which I think is now:

   - Be clear that producers are like interruptible iterators with
   predicate tests acting as a switch to short-circuit the iteration.
   - Rename classes:
      - CellConsumer to CellPredicate (?)
      - BitMap to BitMaps.
   - Rename methods:
      - Producer forEachX() to forEachUntil()
   - The semantic nomenclature:
      - Bitmaps are arrays of bits, not a BitMaps object.
      - Indexes are ints, not instances of a Collection object.
      - Cells are pairs of ints representing an index and a value.  They
      are not Pair<> objects.
      - Producers iterate over collections of the object (Bitmap, Index,
      Cell), applying a predicate to do work and stop the iteration early if
      necessary.  They are carriers/transporters of Bloom filter enabled bits.
      They allow us to query the contents of the Bloom filter in an
      implementation-agnostic way.
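As a rough illustration of the nomenclature in the list above: an index is a plain int, a bitmap is a long[] in which bit i % 64 of word i / 64 is set when index i is enabled, and a cell is an (index, value) pair of ints handed to a predicate. CellPredicate, Units, and forEachCell below are hypothetical names chosen for this sketch, not the library's API.

```java
class UnitsDemo {
    public static void main(String[] args) {
        int[] counts = new int[70]; // a tiny counting-filter-like array of cell values
        counts[2] = 1;
        counts[65] = 3;
        long[] bitmap = Units.toBitMap(counts);
        System.out.println(Units.isSet(bitmap, 65)); // true
        System.out.println(Units.isSet(bitmap, 3));  // false
        // Stop at the first cell whose value exceeds 2 (cell 65).
        System.out.println(Units.forEachCell(counts, (index, value) -> value <= 2)); // false
    }
}

/** Hypothetical predicate over a cell: an index paired with its value (e.g. a count). */
@FunctionalInterface
interface CellPredicate {
    boolean test(int index, int value);
}

final class Units {
    private Units() {
    }

    /** Bitmap view: bit i (word i / 64, bit i % 64) is set when cell i has a non-zero value. */
    static long[] toBitMap(int[] counts) {
        long[] words = new long[(counts.length + 63) / 64];
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] != 0) {
                words[i / 64] |= 1L << (i % 64);
            }
        }
        return words;
    }

    /** Tests whether index is enabled in the bitmap. */
    static boolean isSet(long[] bitmap, int index) {
        return (bitmap[index / 64] & (1L << (index % 64))) != 0;
    }

    /** Visits non-empty cells in index order; returns false as soon as the predicate does. */
    static boolean forEachCell(int[] counts, CellPredicate predicate) {
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] != 0 && !predicate.test(i, counts[i])) {
                return false;
            }
        }
        return true;
    }
}
```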


In thinking about the term Producer, other terms could be used: Interrogator
(sounds like you can add a query) or Extractor might work.  But it has also
come to mind that there is a "compute" series of methods in the
ConcurrentMap class.  Perhaps the term we want is not "forEach", but
"process".  The current form of usage is something like:

IndexProducer ip = 
ip.forEachIndex(idx -> someIntPredicate)

We could change the name from XProducer to XProcessor or XExtractor, and
the method to processXs.  So the above code would look like:

IndexExtractor ix = 
ix.processIndexs(idx -> someIntPredicate)

Another example:

BitMapExtractor bx = .
bx.processBitMaps(bitmap -> someBitMapPredicate)
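A minimal sketch of the proposed shape, assuming a hypothetical functional interface IndexExtractor whose single method takes an IntPredicate and returns false when the predicate cut the iteration short. The method is spelled processIndices here (the message writes processIndexs); the spelling, the fromArray factory, and the return convention are assumptions for illustration.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

class ExtractorDemo {
    public static void main(String[] args) {
        BitSet filterBits = new BitSet(); // stand-in for a Bloom filter's enabled bits
        filterBits.set(3);
        filterBits.set(17);

        IndexExtractor ix = IndexExtractor.fromArray(3, 42, 17);
        int[] visited = {0};
        // Containment check: stop at the first index the filter does not have.
        boolean containsAll = ix.processIndices(idx -> {
            visited[0]++;
            return filterBits.get(idx);
        });
        System.out.println(containsAll + " after " + visited[0] + " indices"); // false after 2 indices
    }
}

/** Hypothetical extractor: feeds each index to the predicate until it returns false. */
@FunctionalInterface
interface IndexExtractor {
    /** Returns true if every index was processed, false if the predicate stopped early. */
    boolean processIndices(IntPredicate predicate);

    /** Hypothetical factory over a fixed array of indices. */
    static IndexExtractor fromArray(int... indices) {
        int[] copy = indices.clone();
        return predicate -> {
            for (int i : copy) {
                if (!predicate.test(i)) {
                    return false; // short-circuit: the remaining indices are skipped
                }
            }
            return true;
        };
    }
}
```

Because the interface has a single abstract method, a lambda can serve as an extractor, which keeps callers agnostic about how the indices are actually stored.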

Claude

[1] https://issues.apache.org/jira/browse/COLLECTIONS-854

Re: [Collections] Suppliers, Iterables, and Producers

2024-04-30 Thread Gary D. Gregory
On 2024/04/30 14:33:47 Alex Herbert wrote:
> Here we should think "Collection", or generally more than 1. In the Java
> sense an Iterable is something you can walk through to the
> end, possibly removing elements as you go using the Iterator interface. We
> would not require supporting removal, and we want to control a
> short-circuit. We could make this distinction by including it in the name:
> forEachUntil(Predicate ...), forEachUnless, ...

I really like the idea of having the method name reflect the short-circuit
aspect of the operation!

We do not have to invent a new method name though, Stream uses 
"takeWhile(Predicate)" in Java 9: 
https://docs.oracle.com/javase/9/docs/api/java/util/stream/Stream.html#takeWhile-java.util.function.Predicate-

The question becomes whether this name makes the class too close to a Stream and
introduces some kind of confusion. If that's the case, then forEachUntil() is
good.
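One difference worth noting when comparing with Stream#takeWhile (Java 9+): takeWhile yields a new stream of the leading elements that pass the test, while a forEachUntil-style method does its work inside the predicate and reports whether it reached the end. A small sketch; forEachUntil here is a hypothetical static helper, not an existing API.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

class TakeWhileDemo {
    public static void main(String[] args) {
        int[] indices = {1, 4, 9, 16, 25};

        // Stream (Java 9+): a new stream of the leading elements that pass the test.
        int[] prefix = IntStream.of(indices).takeWhile(i -> i < 10).toArray();
        System.out.println(Arrays.toString(prefix)); // [1, 4, 9]

        // forEachUntil style: do the work in the predicate, learn whether it ran to the end.
        StringBuilder seen = new StringBuilder();
        boolean completed = forEachUntil(indices, i -> {
            seen.append(i).append(' ');
            return i < 10;
        });
        System.out.println(completed + ": " + seen.toString().trim()); // false: 1 4 9 16
    }

    /** Hypothetical helper: applies the predicate in order, stopping at the first false. */
    static boolean forEachUntil(int[] values, IntPredicate predicate) {
        for (int v : values) {
            if (!predicate.test(v)) {
                return false;
            }
        }
        return true;
    }
}
```

Note that the element which fails the test (16) is still seen by the predicate in both styles; takeWhile simply leaves it out of the resulting stream.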

> I already mentioned throwing runtime exceptions for the short-circuit
> functionality, and that it was ruled out on the basis of performance (given
> a lot of short-circuiting is expected) and convenience for the caller. I
> don't think we should go there. Design the API for the intended purpose,
> and not push it into a box that is easily recognisable.

Sounds good.

Re: [Collections] Suppliers, Iterables, and Producers

2024-04-30 Thread Alex Herbert
On Tue, 30 Apr 2024 at 14:45, Gary D. Gregory  wrote:

> Hi Claude,
>
> Thank you for the detailed reply :-) A few comments below.
>
> On 2024/04/30 06:29:38 Claude Warren wrote:
> > I will see if I can clarify the javadocs and make things clearer.
> >
> > What I think I specifically heard is:
> >
> >- Be clear that producers are fast fail iterators with predicate tests.
> >- Rename CellConsumer to CellPredicate (?)
>
> Agreed (as suggested by Albert)
>
> >- The semantic nomenclature:
> >   - Bitmaps are arrays of bits not a BitMap object.
> >   - Indexes are ints and not an instance of a Collection object.
> >   - Cells are pairs of ints representing an index and a value.  They
> >   are not Pair<> objects.
> >   - Producers iterate over collections of the object (Bitmap, Index,
> >   Cell) applying a predicate to do work and stop the iteration early if
> >   necessary.  They are carriers/transporters of Bloom filter enabled bits.
> >   They allow us to query the contents of the Bloom filter in an
> >   implementation agnostic way.
>
> As you say naming is hard. The above is a great example and a good
> exercise I've gone through at work and in other FOSS projects: "Producers
> iterate over collections of the object...". In general when I see or write
> a Javadoc of the form "Foo bars" or "Runners walk" or "Walkers run", you
> get the idea ;-) I know that either the class (or method) name is bad or
> the Javadoc/documentation is bad; not _wrong_, just bad in the sense that
> it's confusing (to me).
>
> I am not advocating for a specific change ATM but I want to discuss the
> option because it is possible the current name is not as good as it could
> be. It could end up as an acceptable compromise if we cannot use more Java
> friendly terms though.
>
> Whenever I see a class that implements a "forEach"-kind of method, I think
> "Iterable".
>

Here we should think "Collection", or generally more than 1. In the Java
sense an Iterable is something you can walk through to the
end, possibly removing elements as you go using the Iterator interface. We
would not require supporting removal, and we want to control a
short-circuit. We could make this distinction by including it in the name:
forEachUntil(Predicate ...), forEachUnless, ...


>
> Note the difference with "Iterator", and I had to look up the difference
> since the former implements "forEach" and the latter "forEachRemaining"!
> "Iterable" is also a factory of "Iterator"s.
>
> Should the Producers ever be implementations of Iterable or Iterator?
> Right now, the answer is no because of the short-circuit aspect of using a
> predicate. I'm not using the term fail-fast here because I don't think of
> the iteration being in error (please tell me if I'm wrong).
>
> If not iterable, then we should not use that name as part of the class
> name. Generally, the short-circuit aspect of Producers does not make them
> bad candidates for implementations of Iterable since it can throw
> (unchecked) exceptions. Different for call sites, granted, but I'm just
> mentioning it for fun.
>

I already mentioned throwing runtime exceptions for the short-circuit
functionality, and that it was ruled out on the basis of performance (given
a lot of short-circuiting is expected) and convenience for the caller. I
don't think we should go there. Design the API for the intended purpose,
and not push it into a box that is easily recognisable.
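To make the trade-off concrete, here is a minimal sketch contrasting the two styles over a plain long[]. The names (StopDemo, StopIteration and the helper methods) are hypothetical and not part of Commons Collections. The exception is created without a stack trace to keep its cost down, but the caller still has to write the try/catch.

```java
import java.util.function.LongConsumer;
import java.util.function.LongPredicate;

/** Hypothetical demo: consumer + exception versus predicate + boolean return. */
final class StopDemo {
    /** Hypothetical control-flow exception; suppression and stack trace disabled to reduce cost. */
    static final class StopIteration extends RuntimeException {
        StopIteration() {
            super(null, null, false, false);
        }
    }

    /** Consumer style: the caller can only stop early by throwing. */
    static void forEachWithConsumer(long[] values, LongConsumer consumer) {
        for (long v : values) {
            consumer.accept(v);
        }
    }

    /** Predicate style: returning false stops the loop; the result says whether all values were accepted. */
    static boolean forEachWithPredicate(long[] values, LongPredicate predicate) {
        for (long v : values) {
            if (!predicate.test(v)) {
                return false;
            }
        }
        return true;
    }

    /** True if no value is zero; stops at the first zero by throwing and catching. */
    static boolean allNonZeroViaException(long[] values) {
        try {
            forEachWithConsumer(values, v -> {
                if (v == 0) {
                    throw new StopIteration();
                }
            });
            return true;
        } catch (StopIteration e) {
            return false;
        }
    }

    /** Same check in the predicate style: no exception and no try/catch needed. */
    static boolean allNonZeroViaPredicate(long[] values) {
        return forEachWithPredicate(values, v -> v != 0);
    }
}
```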


>
> So maybe there's nothing to do. I just want to be clear about it. For
> example, I think of "factory" and "producer" as synonyms but in this case,
> this is not a traditional application of the factory pattern.
>
> As an aside I can see that Producers would not be Streams out of the box
> because Stream#filter(Predicate) filters but does not short-circuit
> iteration. While Stream#findAny() and #findFirst() don't fit the
> short-circuit bill, we could implement a #findLast() in a Stream
> implementation. What I do not know is if Streams otherwise fit the bill of
> Bloom filters.
>

In the case of small loops with few instructions per loop, the overhead of
Streams is significant. Unfortunately we don't have any performance tests
for this library, but I wouldn't change to Streams without knowing it does
not impact performance. Performance is a key feature of Bloom filters.
Otherwise you can achieve some of their functionality with conventional
collections.


>
> >
> > Does that basically cover the confusion?  If there are better terms, let's
> > hash them out now before I update the javadocs.
> >
> > As an aside, Cells and Bitmaps are referenced in the literature.  For the
> > most part the rest is made up out of whole cloth.  So we could change
> > "Producer" to something else but we would need a good name.
>
> We have a class called BitMap and methods that use "BitMap" in the same
> but I think I am more comfortable with the term reuse now.
>
> The question that remains is must it be 

Re: [Collections] Suppliers, Iterables, and Producers

2024-04-30 Thread Gary D. Gregory
Hi Claude,

Thank you for the detailed reply :-) A few comments below.

On 2024/04/30 06:29:38 Claude Warren wrote:
> I will see if I can clarify the javadocs and make things clearer.
> 
> What I think I specifically heard is:
> 
>- Be clear that producers are fail-fast iterators with predicate tests.
>- Rename CellConsumer to CellPredicate (?)

Agreed (as suggested by Albert)

>- The semantic nomenclature:
>   - Bitmaps are arrays of bits not a BitMap object.
>   - Indexes are ints and not an instance of a Collection object.
>   - Cells are pairs of ints representing an index and a value.  They
>   are not Pair<> objects.
>   - Producers iterate over collections of the object (Bitmap, Index,
>   Cell) applying a predicate to do work and stop the iteration early if
>   necessary.  They are carriers/transporters of Bloom filter enabled bits.
>   They allow us to query the contents of the Bloom filter in an
>   implementation agnostic way.

As you say naming is hard. The above is a great example and a good exercise 
I've gone through at work and in other FOSS projects: "Producers iterate over 
collections of the object...". In general when I see or write a Javadoc of the 
form "Foo bars" or "Runners walk" or "Walkers run", you get the idea ;-) I know 
that either the class (or method) name is bad or the Javadoc/documentation is 
bad; not _wrong_, just bad in the sense that it's confusing (to me). 

I am not advocating for a specific change ATM but I want to discuss the option 
because it is possible the current name is not as good as it could be. It could 
end up as an acceptable compromise if we cannot use more Java friendly terms 
though.

Whenever I see a class that implements a "forEach"-kind of method, I think 
"Iterable".

Note the difference with "Iterator", and I had to look up the difference since 
the former implements "forEach" and the latter "forEachRemaining"! "Iterable" 
is also a factory of "Iterator"s.
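For reference, a minimal sketch of the two JDK methods being compared: Iterable.forEach(Consumer) visits every element, while Iterator.forEachRemaining(Consumer) visits only what the iterator has not yet returned. ForEachDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical demo of Iterable.forEach versus Iterator.forEachRemaining. */
final class ForEachDemo {
    /** Collects every element via Iterable.forEach (a default method since Java 8). */
    static List<String> viaIterable(Iterable<String> iterable) {
        List<String> out = new ArrayList<>();
        iterable.forEach(out::add);
        return out;
    }

    /** Takes one element with next(), then collects the rest via Iterator.forEachRemaining. */
    static List<String> viaIterator(Iterable<String> iterable) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = iterable.iterator(); // the Iterable acts as a factory of Iterators
        if (it.hasNext()) {
            it.next(); // skip the first element
        }
        it.forEachRemaining(out::add);
        return out;
    }
}
```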

Should the Producers ever be implementations of Iterable or Iterator? Right 
now, the answer is no because of the short-circuit aspect of using a predicate. 
I'm not using the term fail-fast here because I don't think of the iteration 
being in error (please tell me if I'm wrong). 

If not iterable, then we should not use that name as part of the class name. 
Generally, the short-circuit aspect of Producers does not make them bad 
candidates for implementations of Iterable since it can throw (unchecked) 
exceptions. Different for call sites, granted, but I'm just mentioning it for fun.

So maybe there's nothing to do. I just want to be clear about it. For example, 
I think of "factory" and "producer" as synonyms but in this case, this is not a 
traditional application of the factory pattern.

As an aside I can see that Producers would not be Streams out of the box 
because Stream#filter(Predicate) filters but does not short-circuit iteration. 
While Stream#findAny() and #findFirst() don't fit the short-circuit bill, we 
could implement a #findLast() in a Stream implementation. What I do not know is 
if Streams otherwise fit the bill of Bloom filters.
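For comparison only, the JDK streams do have short-circuiting terminal operations: LongStream.allMatch, anyMatch and noneMatch stop pulling elements once the answer is known. This sketch (the class name is hypothetical) counts how many elements allMatch actually examines. It shows the short-circuit semantics only and says nothing about the per-element overhead of streams.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongPredicate;
import java.util.stream.LongStream;

/** Hypothetical demo of a short-circuiting stream terminal operation. */
final class StreamShortCircuitDemo {
    /**
     * Tests the values with allMatch, which stops at the first value the
     * predicate rejects. The visited counter records how many values
     * were actually examined (sequential stream).
     */
    static boolean allMatch(long[] values, LongPredicate predicate, AtomicInteger visited) {
        return LongStream.of(values)
                .peek(v -> visited.incrementAndGet())
                .allMatch(predicate);
    }
}
```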

> 
> Does that basically cover the confusion?   If there are better terms, let's
> hash them out now before I update the javadocs.
> 
> As an aside, Cells and Bitmaps are referenced in the literature.  For the
> most part the rest is made up out of whole cloth.  So we could change
> "Producer" to something else but we would need a good name.

We have a class called BitMap and methods that use "BitMap" in the same but I 
think I am more comfortable with the term reuse now.

The question that remains is must it be public? Since the Javadoc mentions it 
is about indices and bit positions, could all these methods be moved to the 
package-private IndexUtils? My concern is to reduce the public and protected 
API surface we will have to support and keep for the lifetime of the 4.x code 
base.

> 
> Semantically:
> 
>- A Hasher generates an IndexProducer once it knows what the range of
>the values are and how many values it should produce (as defined in the
>Shape).  That index producer can be used multiple times and will produce
>the same set of values in the same order.
>- A Bloom filter generates an IndexProducer that enumerates the enabled
>bits in the filter.
>- A CellProducer generates an IndexProducer that reports all the index
>values that it contains.
> 
> In implementing stable Bloom filters I had to create a RandomHasher that
> generates an IndexProducer that will generate values in the range and
> number specified by the Shape but that does not produce the same values
> every time (obviously).
> 
> We could change Producer to a term that means a representation: Ideogram,
> but then we have to introduce the term and explain what it means.  Producer
> starts at a common point.
> 
> All of this just goes to show that "Naming things is hard".  But then we
> all knew that 

Re: [Collections] Suppliers, Iterables, and Producers

2024-04-30 Thread Claude Warren
I will see if I can clarify the javadocs and make things clearer.

What I think I specifically heard is:

   - Be clear that producers are fail-fast iterators with predicate tests.
   - Rename CellConsumer to CellPredicate (?)
   - The semantic nomenclature:
      - Bitmaps are arrays of bits, not a BitMap object.
      - Indexes are ints and not an instance of a Collection object.
      - Cells are pairs of ints representing an index and a value.  They
      are not Pair<> objects.
      - Producers iterate over collections of the object (Bitmap, Index,
      Cell) applying a predicate to do work and stop the iteration early if
      necessary.  They are carriers/transporters of Bloom filter enabled bits.
      They allow us to query the contents of the Bloom filter in an
      implementation agnostic way.
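The predicate-driven forEach pattern described above can be sketched as follows. SimpleIndexProducer and fromIndexArray are hypothetical simplifications for illustration, not the package's actual IndexProducer API.

```java
import java.util.function.IntPredicate;

/** Hypothetical, simplified stand-in for an index producer (not the Commons Collections interface). */
@FunctionalInterface
interface SimpleIndexProducer {
    /**
     * Passes each index to the predicate in turn. Returns false as soon as the
     * predicate returns false (the remaining indices are skipped), or true if
     * every index was accepted.
     */
    boolean forEachIndex(IntPredicate predicate);

    /** Hypothetical factory: a producer over a fixed array of indices. */
    static SimpleIndexProducer fromIndexArray(int... indices) {
        return predicate -> {
            for (int index : indices) {
                if (!predicate.test(index)) {
                    return false;
                }
            }
            return true;
        };
    }
}
```

The boolean return value does double duty: it is the signal the predicate uses to stop the traversal and the answer the caller gets back.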

Does that basically cover the confusion?   If there are better terms, let's
hash them out now before I update the javadocs.

As an aside, Cells and Bitmaps are referenced in the literature.  For the
most part the rest is made up out of whole cloth.  So we could change
"Producer" to something else but we would need a good name.

Semantically:

   - A Hasher generates an IndexProducer once it knows what the range of
   the values are and how many values it should produce (as defined in the
   Shape).  That index producer can be used multiple times and will produce
   the same set of values in the same order.
   - A Bloom filter generates an IndexProducer that enumerates the enabled
   bits in the filter.
   - A CellProducer generates an IndexProducer that reports all the index
   values that it contains.
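A sketch of the "same values in the same order" property, assuming a hasher that derives indices from two 64-bit hash values by plain double hashing. The thread does not say which scheme the package's Hasher implementations use, and DoubleHashSketch and IndexSource are hypothetical names.

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch: derives k indices in [0, m) from two 64-bit hash values. */
final class DoubleHashSketch {
    /** Hypothetical producer contract: stop when the predicate returns false. */
    @FunctionalInterface
    interface IndexSource {
        boolean forEachIndex(IntPredicate predicate);
    }

    private final long h1;
    private final long h2;

    DoubleHashSketch(long h1, long h2) {
        this.h1 = h1;
        this.h2 = h2;
    }

    /**
     * Returns a producer of numberOfHashFunctions indices in [0, numberOfBits).
     * The indices depend only on h1, h2 and the shape parameters, so the same
     * producer yields the same values in the same order on every traversal.
     */
    IndexSource indices(int numberOfBits, int numberOfHashFunctions) {
        return predicate -> {
            for (int i = 0; i < numberOfHashFunctions; i++) {
                // Assumed scheme, plain double hashing: (h1 + i * h2) mod m, non-negative via floorMod
                int index = (int) Math.floorMod(h1 + i * h2, (long) numberOfBits);
                if (!predicate.test(index)) {
                    return false;
                }
            }
            return true;
        };
    }
}
```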

In implementing stable Bloom filters I had to create a RandomHasher that
generates an IndexProducer that will generate values in the range and
number specified by the Shape but that does not produce the same values
every time (obviously).
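By contrast, a random index source can honour the shape's range and count while drawing new values on every traversal. In this sketch the names are hypothetical, and java.util.Random stands in for whatever source of randomness a RandomHasher would actually use.

```java
import java.util.Random;
import java.util.function.IntPredicate;

/** Hypothetical sketch of a random index source. */
final class RandomIndexSketch {
    /** Hypothetical producer contract: stop when the predicate returns false. */
    @FunctionalInterface
    interface IndexSource {
        boolean forEachIndex(IntPredicate predicate);
    }

    /**
     * Each call to forEachIndex draws numberOfHashFunctions fresh random indices
     * in [0, numberOfBits): the count and range follow the shape, but the values
     * differ from one traversal to the next.
     */
    static IndexSource randomIndices(Random random, int numberOfBits, int numberOfHashFunctions) {
        return predicate -> {
            for (int i = 0; i < numberOfHashFunctions; i++) {
                if (!predicate.test(random.nextInt(numberOfBits))) {
                    return false;
                }
            }
            return true;
        };
    }
}
```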

We could change Producer to a term that means a representation: Ideogram,
but then we have to introduce the term and explain what it means.  Producer
starts at a common point.

All of this just goes to show that "Naming things is hard".  But then we
all knew that anyway.
Claude




On Sun, Apr 28, 2024 at 11:00 PM Gary Gregory 
wrote:

> Thank you for your thoughtful reply. See my comments below.
>
> On Sun, Apr 28, 2024 at 11:10 AM Alex Herbert 
> wrote:
> >
> > Hi Gary,
> >
> > I am in favour of using nomenclature and patterns that will be familiar to
> > a Java developer. But only if they match the familiar JDK use patterns. The
> > Bloom filter package has some atypical use patterns that have driven the
> > current API to where it is. I'll try and describe these below.
> >
> > On Sun, 28 Apr 2024 at 14:16, Gary Gregory wrote:
> >
> > > Hi Claude, Albert, and all,
> > >
> > > Since the introduction of lambdas in Java 8, Java has a well-defined
> > > terminology around the classic producer-consumer paradigm but (for
> > > reasons unknown to me) realized in the functional interfaces *Supplier
> > > and *Consumer. In addition, as of Java 5, we have the Iterable
> > > interface.
> > >
> > > In our new Bloom filter package we have new interfaces called
> > > *Producer (as opposed to *Supplier), where some of these new
> > > interfaces are formally annotated with @FunctionalInterface and some
> > > not (for example, BloomFilterProducer).
> > >
> > > My question is: Why call these "Producers" instead of "Suppliers"? Is
> > > the formal Bloom filter literature tied to the "Producer" terminology
> > > in a way that would make adapting to the Java term confusing? I know I
> > > brought up a similar topic recently, but I would like to revisit it
> > > now that I've started to read Claude's blog drafts. Even without
> > > making the current "Producers" formal suppliers by extending Supplier,
> > > would it be worth using the Java terminology?
> > >
> >
> > Claude is familiar with the literature and can comment on that. I would
> > defer to the literature if it is a common term.
> >
> > There is one notable distinction to JDK suppliers. Suppliers only supply 1
> > element and must be repeatedly called to generate more. The Producers in
> > the BloomFilter package will supply multiple values. They are invoked using
> > a forEach pattern with the intention of supplying all the elements to a
> > predicate, not a consumer. If any of those elements is rejected by the
> > predicate then the rest of the elements are not supplied. So this is a
> > fail-fast bulk supplier.
>
> Ah, this sounds like a special Iterator, fail-fast as you mention, and
> Java does not have that in Java 8 at least.
> The Producer class suffix still confuses me since this is neither a
> factory nor a traditional supplier. If the classes were called
> *Iterator and did not extend Iterator, then it would also be confusing.
> The question is whether it would be useful to extend Iterator or if
> the class would never be used as a traditional Iterator. I'll leave that to
> an SME ;-)
>
> >
> >
> > >
> > > My 

Re: [Collections] Suppliers, Iterables, and Producers

2024-04-28 Thread Gary Gregory
Thank you for your thoughtful reply. See my comments below.

On Sun, Apr 28, 2024 at 11:10 AM Alex Herbert  wrote:
>
> Hi Gary,
>
> I am in favour of using nomenclature and patterns that will be familiar to
> a Java developer. But only if they match the familiar JDK use patterns. The
> Bloom filter package has some atypical use patterns that have driven the
> current API to where it is. I'll try and describe these below.
>
> On Sun, 28 Apr 2024 at 14:16, Gary Gregory  wrote:
>
> > Hi Claude, Albert, and all,
> >
> > Since the introduction of lambdas in Java 8, Java has a well-defined
> > terminology around the classic producer-consumer paradigm but (for
> > reasons unknown to me) realized in the functional interfaces *Supplier
> > and *Consumer. In addition, as of Java 5, we have the Iterable
> > interface.
> >
> > In our new Bloom filter package we have new interfaces called
> > *Producer (as opposed to *Supplier), where some of these new
> > interfaces are formally annotated with @FunctionalInterface and some
> > not (for example, BloomFilterProducer).
> >
> > My question is: Why call these "Producers" instead of "Suppliers"? Is
> > the formal Bloom filter literature tied to the "Producer" terminology
> > in a way that would make adapting to the Java term confusing? I know I
> > brought up a similar topic recently, but I would like to revisit it
> > now that I've started to read Claude's blog drafts. Even without
> > making the current "Producers" formal suppliers by extending Supplier,
> > would it be worth using the Java terminology?
> >
>
> Claude is familiar with the literature and can comment on that. I would
> defer to the literature if it is a common term.
>
> There is one notable distinction to JDK suppliers. Suppliers only supply 1
> element and must be repeatedly called to generate more. The Producers in
> the BloomFilter package will supply multiple values. They are invoked using
> a forEach pattern with the intention of supplying all the elements to a
> predicate, not a consumer. If any of those elements is rejected by the
> predicate then the rest of the elements are not supplied. So this is a
> fail-fast bulk supplier.

Ah, this sounds like a special Iterator, fail-fast as you mention, and
Java does not have that in Java 8 at least.
The Producer class suffix still confuses me since this is neither a
factory nor a traditional supplier. If the classes were called
*Iterator and did not extend Iterator, then it would also be confusing.
The question is whether it would be useful to extend Iterator or if
the class would never be used as a traditional Iterator. I'll leave that to
an SME ;-)

>
>
> >
> > My second observation is that some might neither be "Producers" nor
> > "Suppliers" but instead be extensions of Iterable. For example,
> > BitMapProducer is not a factory for instances of BitMap; the BitMap
> > does not appear in the signatures of BitMapProducer methods. From a
> > strict Java POV, this is (slightly) perplexing.
> >
>
> Iterable was suggested in an earlier API, particularly for the IndexProducer.
> IIRC it was rejected on the basis of simplifying the code for the caller in
> the fail-fast case. Otherwise every user of the iterator must implement
> fail-fast loops over the elements. There may have been other reasons so it
> could be worth a check in the mailing list archives. It would require going
> back a few years but it was discussed on the dev list.
>
> The term BitMap refers to a long that holds 64-consecutive indices as
> either present or absent. You can consider the sequential bitmaps
> containing all indices from [0, n) as the serialized state of a Bloom
> filter with n bits. This is essentially a BitSet as you can see from the
> SimpleBloomFilter implementation. This originally wrapped a BitSet; it was
> converted to directly implement the required read/write bit functionality
> on the grounds of performance (no memory reallocation; no index checks).
>
> We do not have a BitMap class since we use a long primitive.

Yes, we do; it's right here: org.apache.commons.collections4.bloomfilter.BitMap

This makes it hard for a non-expert to grok IMO. If we use terms in
class names and discussions that are... what? Mismatched or misnamed.

> A rename would
> be to LongProducer causing a name clash with the JDK. Renaming to something
> else is possible but I believe BitMap is a term from the literature.
>
>
> >
> > Instead (forgetting the class name issue for now), we could have:
> >
> > @FunctionalInterface
> > public interface BitMapProducer extends Iterable {...}
> >
> > Which would let implementations define:
> >
> > Iterator iterator();
> >
> > Instead of:
> >
> > boolean forEachBitMap(LongPredicate predicate);
> >
>
> The BitMapProducer is not iterating LongPredicates. It is iterating longs
> to be accepted by a single LongPredicate. The boolean return allows
> signalling to stop the forEach loop. There is no primitive specialisation
> of Iterator for long. There is a Spliterator.OfLong 

Re: [Collections] Suppliers, Iterables, and Producers

2024-04-28 Thread Brett Okken
> There is no primitive specialisation
> of Iterator for long

https://docs.oracle.com/javase/8/docs/api/java/util/PrimitiveIterator.OfLong.html
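A minimal sketch of driving PrimitiveIterator.OfLong directly. nextLong() avoids boxing, but the fail-fast loop has to be written by each caller. PrimitiveIteratorDemo and its methods are hypothetical names.

```java
import java.util.PrimitiveIterator;
import java.util.function.LongPredicate;
import java.util.stream.LongStream;

/** Hypothetical demo of a caller-written fail-fast loop over PrimitiveIterator.OfLong. */
final class PrimitiveIteratorDemo {
    /** Stops at the first value the predicate rejects; later values stay unread in the iterator. */
    static boolean allAccepted(PrimitiveIterator.OfLong it, LongPredicate predicate) {
        while (it.hasNext()) {
            if (!predicate.test(it.nextLong())) {
                return false;
            }
        }
        return true;
    }

    /** LongStream.iterator() returns a PrimitiveIterator.OfLong. */
    static PrimitiveIterator.OfLong iteratorOf(long... bitMaps) {
        return LongStream.of(bitMaps).iterator();
    }
}
```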



On Sun, Apr 28, 2024 at 10:00 AM Alex Herbert 
wrote:

> Hi Gary,
>
> I am in favour of using nomenclature and patterns that will be familiar to
> a Java developer. But only if they match the familiar JDK use patterns. The
> Bloom filter package has some atypical use patterns that have driven the
> current API to where it is. I'll try and describe these below.
>
> On Sun, 28 Apr 2024 at 14:16, Gary Gregory  wrote:
>
> > Hi Claude, Albert, and all,
> >
> > Since the introduction of lambdas in Java 8, Java has a well-defined
> > terminology around the classic producer-consumer paradigm but (for
> > reasons unknown to me) realized in the functional interfaces *Supplier
> > and *Consumer. In addition, as of Java 5, we have the Iterable
> > interface.
> >
> > In our new Bloom filter package we have new interfaces called
> > *Producer (as opposed to *Supplier), where some of these new
> > interfaces are formally annotated with @FunctionalInterface and some
> > not (for example, BloomFilterProducer).
> >
> > My question is: Why call these "Producers" instead of "Suppliers"? Is
> > the formal Bloom filter literature tied to the "Producer" terminology
> > in a way that would make adapting to the Java term confusing? I know I
> > brought up a similar topic recently, but I would like to revisit it
> > now that I've started to read Claude's blog drafts. Even without
> > making the current "Producers" formal suppliers by extending Supplier,
> > would it be worth using the Java terminology?
> >
>
> Claude is familiar with the literature and can comment on that. I would
> defer to the literature if it is a common term.
>
> There is one notable distinction to JDK suppliers. Suppliers only supply 1
> element and must be repeatedly called to generate more. The Producers in
> the BloomFilter package will supply multiple values. They are invoked using
> a forEach pattern with the intention of supplying all the elements to a
> predicate, not a consumer. If any of those elements is rejected by the
> predicate then the rest of the elements are not supplied. So this is a
> fail-fast bulk supplier.
>
>
> >
> > My second observation is that some might neither be "Producers" nor
> > "Suppliers" but instead be extensions of Iterable. For example,
> > BitMapProducer is not a factory for instances of BitMap; the BitMap
> > does not appear in the signatures of BitMapProducer methods. From a
> > strict Java POV, this is (slightly) perplexing.
> >
>
> Iterable was suggested in an earlier API, particularly for the IndexProducer.
> IIRC it was rejected on the basis of simplifying the code for the caller in
> the fail-fast case. Otherwise every user of the iterator must implement
> fail-fast loops over the elements. There may have been other reasons so it
> could be worth a check in the mailing list archives. It would require going
> back a few years but it was discussed on the dev list.
>
> The term BitMap refers to a long that holds 64-consecutive indices as
> either present or absent. You can consider the sequential bitmaps
> containing all indices from [0, n) as the serialized state of a Bloom
> filter with n bits. This is essentially a BitSet as you can see from the
> SimpleBloomFilter implementation. This originally wrapped a BitSet; it was
> converted to directly implement the required read/write bit functionality
> on the grounds of performance (no memory reallocation; no index checks).
>
> We do not have a BitMap class since we use a long primitive. A rename would
> be to LongProducer causing a name clash with the JDK. Renaming to something
> else is possible but I believe BitMap is a term from the literature.
>
>
> >
> > Instead (forgetting the class name issue for now), we could have:
> >
> > @FunctionalInterface
> > public interface BitMapProducer extends Iterable {...}
> >
> > Which would let implementations define:
> >
> > Iterator iterator();
> >
> > Instead of:
> >
> > boolean forEachBitMap(LongPredicate predicate);
> >
>
> The BitMapProducer is not iterating LongPredicates. It is iterating longs
> to be accepted by a single LongPredicate. The boolean return allows
> signalling to stop the forEach loop. There is no primitive specialisation
> of Iterator for long. There is a Spliterator.OfLong but that bundles some
> other API that we do not wish to support, namely parallel streaming via
> split and the ability to advance element by element (tryAdvance). Currently
> we only implement the equivalent of the forEachRemaining pattern from
> Spliterator. That accepts a consumer and so fail-fast would be done via
> raising a runtime exception. Given that fail-fast is a key feature of a
> Bloom filter then we do not want this to be implemented via exceptions.
>
> The primary use case for fail-fast is to stop as soon as a bit index is
> found, or not found (case dependent). Consider a Bloom 

Re: [Collections] Suppliers, Iterables, and Producers

2024-04-28 Thread Alex Herbert
Hi Gary,

I am in favour of using nomenclature and patterns that will be familiar to
a Java developer. But only if they match the familiar JDK use patterns. The
Bloom filter package has some atypical use patterns that have driven the
current API to where it is. I'll try and describe these below.

On Sun, 28 Apr 2024 at 14:16, Gary Gregory  wrote:

> Hi Claude, Albert, and all,
>
> Since the introduction of lambdas in Java 8, Java has a well-defined
> terminology around the classic producer-consumer paradigm but (for
> reasons unknown to me) realized in the functional interfaces *Supplier
> and *Consumer. In addition, as of Java 5, we have the Iterable
> interface.
>
> In our new Bloom filter package we have new interfaces called
> *Producer (as opposed to *Supplier), where some of these new
> interfaces are formally annotated with @FunctionalInterface and some
> not (for example, BloomFilterProducer).
>
> My question is: Why call these "Producers" instead of "Suppliers"? Is
> the formal Bloom filter literature tied to the "Producer" terminology
> in a way that would make adapting to the Java term confusing? I know I
> brought up a similar topic recently, but I would like to revisit it
> now that I've started to read Claude's blog drafts. Even without
> making the current "Producers" formal suppliers by extending Supplier,
> would it be worth using the Java terminology?
>

Claude is familiar with the literature and can comment on that. I would
defer to the literature if it is a common term.

There is one notable distinction to JDK suppliers. Suppliers only supply 1
element and must be repeatedly called to generate more. The Producers in
the BloomFilter package will supply multiple values. They are invoked using
a forEach pattern with the intention of supplying all the elements to a
predicate, not a consumer. If any of those elements is rejected by the
predicate then the rest of the elements are not supplied. So this is a
fail-fast bulk supplier.
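A minimal sketch of the distinction, using the JDK's IntSupplier for the one-value-per-call style and a hypothetical forEachInRange helper for the bulk fail-fast style. SupplierVersusBulk is also a hypothetical name.

```java
import java.util.function.IntPredicate;
import java.util.function.IntSupplier;

/** Hypothetical demo: single-value supplier versus bulk fail-fast producer. */
final class SupplierVersusBulk {
    /** Supplier style: one value per call; the caller decides how many times to call. */
    static IntSupplier counterFrom(int start) {
        int[] next = {start};
        return () -> next[0]++;
    }

    /**
     * Bulk fail-fast style (hypothetical helper): pushes every value in
     * [start, end) to a single predicate and stops at the first rejection.
     */
    static boolean forEachInRange(int start, int end, IntPredicate predicate) {
        for (int i = start; i < end; i++) {
            if (!predicate.test(i)) {
                return false;
            }
        }
        return true;
    }
}
```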


>
> My second observation is that some might neither be "Producers" nor
> "Suppliers" but instead be extensions of Iterable. For example,
> BitMapProducer is not a factory for instances of BitMap; the BitMap
> does not appear in the signatures of BitMapProducer methods. From a
> strict Java POV, this is (slightly) perplexing.
>

Iterable was suggested in an earlier API, particularly for the IndexProducer.
IIRC it was rejected on the basis of simplifying the code for the caller in
the fail-fast case. Otherwise every user of the iterator must implement
fail-fast loops over the elements. There may have been other reasons so it
could be worth a check in the mailing list archives. It would require going
back a few years but it was discussed on the dev list.

The term BitMap refers to a long that holds 64-consecutive indices as
either present or absent. You can consider the sequential bitmaps
containing all indices from [0, n) as the serialized state of a Bloom
filter with n bits. This is essentially a BitSet as you can see from the
SimpleBloomFilter implementation. This originally wrapped a BitSet; it was
converted to directly implement the required read/write bit functionality
on the grounds of performance (no memory reallocation; no index checks).
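A sketch of the index arithmetic implied by "a long that holds 64 consecutive indices". LongBitMapSketch is a hypothetical helper written for illustration. It is not the library's BitMap class, whose methods may be named and implemented differently.

```java
/**
 * Hypothetical helper (not the Commons Collections BitMap class) showing how
 * bit indices in [0, n) map onto an array of 64-bit longs.
 */
final class LongBitMapSketch {
    /** Number of longs needed to hold n bits: ceil(n / 64), computed in long to avoid int overflow. */
    static int numberOfLongs(int numberOfBits) {
        return (int) ((numberOfBits + 63L) >>> 6);
    }

    /** Which long holds the given bit index (index / 64). */
    static int longIndex(int bitIndex) {
        return bitIndex >>> 6;
    }

    /** Mask for the bit within its long; Java uses only the low 6 bits of the shift count (index % 64). */
    static long bitMask(int bitIndex) {
        return 1L << bitIndex;
    }

    static void set(long[] bitMaps, int bitIndex) {
        bitMaps[longIndex(bitIndex)] |= bitMask(bitIndex);
    }

    static boolean contains(long[] bitMaps, int bitIndex) {
        return (bitMaps[longIndex(bitIndex)] & bitMask(bitIndex)) != 0;
    }
}
```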

We do not have a BitMap class since we use a long primitive. A rename would
be to LongProducer causing a name clash with the JDK. Renaming to something
else is possible but I believe BitMap is a term from the literature.


>
> Instead (forgetting the class name issue for now), we could have:
>
> @FunctionalInterface
> public interface BitMapProducer extends Iterable {...}
>
> Which would let implementations define:
>
> Iterator iterator();
>
> Instead of:
>
> boolean forEachBitMap(LongPredicate predicate);
>

The BitMapProducer is not iterating LongPredicates. It is iterating longs
to be accepted by a single LongPredicate. The boolean return allows
signalling to stop the forEach loop. There is no primitive specialisation
of Iterator for long. There is a Spliterator.OfLong but that bundles some
other API that we do not wish to support, namely parallel streaming via
split and the ability to advance element by element (tryAdvance). Currently
we only implement the equivalent of the forEachRemaining pattern from
Spliterator. That accepts a consumer and so fail-fast would be done via
raising a runtime exception. Given that fail-fast is a key feature of a
Bloom filter then we do not want this to be implemented via exceptions.

The primary use case for fail-fast is to stop as soon as a bit index is
found, or not found (case dependent). Consider a Bloom filter that has 20
indices per hashed item. You have populated the filter with items, each has
20 random indices. You then check if a new item is not contained in the
filter by creating indices for the new item with your hash function and
checking each index against those already in the filter. If your new
element has an index not in the filter, then you have not seen this element
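A minimal sketch of that membership check over a filter stored as longs, assuming the candidate item's indices have already been computed. ContainsSketch and its methods are hypothetical names. The first index that is not set ends the traversal.

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch of a fail-fast Bloom filter membership check. */
final class ContainsSketch {
    /** Hypothetical predicate-driven traversal over precomputed candidate indices. */
    static boolean forEachIndex(int[] indices, IntPredicate predicate) {
        for (int index : indices) {
            if (!predicate.test(index)) {
                return false; // short-circuit: remaining indices are never examined
            }
        }
        return true;
    }

    /**
     * Returns true only if every candidate index is set in the filter bits.
     * False means the item was definitely never added; true means it may have
     * been added (Bloom filters allow false positives).
     */
    static boolean mightContain(long[] filterBits, int[] candidateIndices) {
        return forEachIndex(candidateIndices,
                i -> (filterBits[i >>> 6] & (1L << i)) != 0);
    }
}
```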

[Collections] Suppliers, Iterables, and Producers

2024-04-28 Thread Gary Gregory
Hi Claude, Albert, and all,

Since the introduction of lambdas in Java 8, Java has a well-defined
terminology around the classic producer-consumer paradigm but (for
reasons unknown to me) realized in the functional interfaces *Supplier
and *Consumer. In addition, as of Java 5, we have the Iterable
interface.

In our new Bloom filter package we have new interfaces called
*Producer (as opposed to *Supplier), where some of these new
interfaces are formally annotated with @FunctionalInterface and some
not (for example, BloomFilterProducer).

My question is: Why call these "Producers" instead of "Suppliers"? Is
the formal Bloom filter literature tied to the "Producer" terminology
in a way that would make adapting to the Java term confusing? I know I
brought up a similar topic recently, but I would like to revisit it
now that I've started to read Claude's blog drafts. Even without
making the current "Producers" formal suppliers by extending Supplier,
would it be worth using the Java terminology?

My second observation is that some might neither be "Producers" nor
"Suppliers" but instead be extensions of Iterable. For example,
BitMapProducer is not a factory for instances of BitMap; the BitMap
does not appear in the signatures of BitMapProducer methods. From a
strict Java POV, this is (slightly) perplexing.

Instead (forgetting the class name issue for now), we could have:

@FunctionalInterface
public interface BitMapProducer extends Iterable {...}

Which would let implementations define:

Iterator iterator();

Instead of:

boolean forEachBitMap(LongPredicate predicate);

Same comment for IndexProducer.
Same comment for BloomFilterProducer.
Is this too much Java-ness?

CellConsumer looks like a Predicate, not a traditional Java *Consumer.
We have a specialization called LongBiPredicate so I propose we rename
and extract CellConsumer as IntBiPredicate.
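A sketch of what the proposed extraction could look like. IntBiPredicate and CellSketch are hypothetical here (the JDK's java.util.function has BiPredicate<T, U> but no int specialisation of it), and cells are modelled as two parallel int arrays rather than Pair objects.

```java
/** Hypothetical primitive two-argument predicate, following the proposed name. */
@FunctionalInterface
interface IntBiPredicate {
    boolean test(int index, int value);
}

/** Hypothetical cell traversal using the predicate above. */
final class CellSketch {
    /**
     * Cells are (index, value) pairs held in parallel arrays; the traversal
     * stops at the first cell the predicate rejects.
     */
    static boolean forEachCell(int[] indices, int[] values, IntBiPredicate predicate) {
        for (int i = 0; i < indices.length; i++) {
            if (!predicate.test(indices[i], values[i])) {
                return false;
            }
        }
        return true;
    }
}
```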

TY!
Gary

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org