Re: Proposal to enhance Stream.collect

2019-03-04 Thread August Nagro
I think documentation could help this (and maybe removing the two added `of` helpers so an explicit collector must be created), but you're right about the cost. If the consensus is not to change Collector, then maybe Tagir's suggestion for internal-only change can be implemented? On Mon, Mar 4,

Re: Proposal to enhance Stream.collect

2019-03-04 Thread Brian Goetz
So, unfortunately I think this particular API evolution idiom is kind of questionable. There is a new method, sizedSupplier, with a default that delegates to the unsized supplier. But, that leaves implementors in a position where they don’t really know which one to implement, and the use of

Re: Proposal to enhance Stream.collect

2019-03-03 Thread August Nagro
Tagir, I slapped my forehead when I saw that StringBuilder's initialCapacity is the number of chars, and not added elements! The CollectorOp class I added for ReferencePipeline.collect(Collector) is so a concurrent + unordered collector may be pre-sized. I thought it was needed for completeness

Re: Proposal to enhance Stream.collect

2019-03-03 Thread Tagir Valeev
Hello August. I'm not supporting the proposed spec change, but I have few comments about implementation (in case OpenJDK reviewers would support it). 1. joining(): Setting initial StringBuilder size to the number of characters which is equivalent to the number of stream elements would be a good

Re: Proposal to enhance Stream.collect

2019-03-03 Thread August Nagro
Hi everyone, My implementation is at https://github.com/AugustNagro/presize-collectors-bench and I have a webrev here: http://august.nagro.us/presized-collectors/webrev/. The changes that I made were: - Add `default IntFunction sizedSupplier()` to Collector - Add 2 new Collector.of helper

Re: Proposal to enhance Stream.collect

2019-02-27 Thread Tagir Valeev
Hello! I wouldn't use presized HashSet, because you never know how many duplicates are expected. What if the input stream has a million of elements, but only two of them are distinct? Do you really want to allocate a hash table for million elements in advance? For toMap() without custom supplier

Re: Proposal to enhance Stream.collect

2019-02-27 Thread August Nagro
Tagir, Great to see the validating benchmarks. > I think it's the best solution given the fact that very few collectors may benefit from the exact known size, and this benefit usually disappears when collectors are composed (e.g. using groupingBy: downstream toList() would not win anything if it

Re: Proposal to enhance Stream.collect

2019-02-26 Thread Tagir Valeev
Hello! > A less intrusive API direction might be a version of Collector whose > supplier function took a size-estimate argument; this might even help in > parallel since it allows for intermediate results to start with a better > initial size guess. (And this could be implemented as a default

Re: Proposal to enhance Stream.collect

2019-02-24 Thread August Nagro
Thanks for the feedback. I really like Brian's suggestion for a default Collector.supplier(sizeEstimate), since exposing the full stream characteristics is overkill. I don't think the use of this improvement is limited to pre-sizing ArrayLists, though. Some other Collectors that come to mind are

Re: Proposal to enhance Stream.collect

2019-02-24 Thread Brian Goetz
We did consider this problem when designing the Collector API; for example, it would have been nice if we could have a `toArray()` collector that had all the optimizations of `Stream::toArray`. When we looked into it, we found a number of concerning details that caused us to turn back (many

Re: Proposal to enhance Stream.collect

2019-02-24 Thread Rob Spoor
If you already know the size and are not going to parallelize your stream, you can simply use Collectors.toCollection: Stream.of(1,2,3) .map(i -> i+1) .collect(Collectors.toCollection(() -> new ArrayList<>(3))); It could perhaps be a bit easier if Collectors.toList would be

Proposal to enhance Stream.collect

2019-02-23 Thread August Nagro
Calling Stream.collect(Collector) is a popular terminal stream operation. But because the collect methods provide no detail of the stream's characteristics, collectors are not as efficient as they could be. For example, consider a non-parallel, sized stream that is to be collected as a List. This