Re: RFR: JDK-8277095 : Empty streams create too many objects [v2]
On Tue, 16 Nov 2021 20:53:26 GMT, kabutz wrote: >> This is a draft proposal for how we could improve stream performance for the >> case where the streams are empty. Empty collections are common-place. If we >> iterate over them with an Iterator, we would have to create one small >> Iterator object (which could often be eliminated) and if it is empty we are >> done. However, with Streams we first have to build up the entire pipeline, >> until we realize that there is no work to do. With this example, we change >> Collection#stream() to first check if the collection is empty, and if it is, >> we simply return an EmptyStream. We also have EmptyIntStream, >> EmptyLongStream and EmptyDoubleStream. We have taken great care for these to >> have the same characteristics and behaviour as the streams returned by >> Stream.empty(), IntStream.empty(), etc. >> >> Some of the JDK tests fail with this, due to ClassCastExceptions (our >> EmptyStream is not an AbstractPipeline) and AssertionError, since we can >> call some methods repeatedly on the stream without it failing. On the plus >> side, creating a complex stream on an empty stream gives us upwards of 50x >> increase in performance due to a much smaller object allocation rate. This >> PR includes the code for the change, unit tests and also a JMH benchmark to >> demonstrate the improvement. > > kabutz has updated the pull request incrementally with one additional commit > since the last revision: > > Refactored empty stream implementations to reduce duplicate code and > improved unordered() > Added StreamSupport.empty for primitive spliterators and use that in > Arrays.stream() satisfy the bot with a comment - we are on day 20 now :) @kabutz i hope that was ok? - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects [v2]
On Tue, 16 Nov 2021 20:53:26 GMT, kabutz wrote: >> This is a draft proposal for how we could improve stream performance for the >> case where the streams are empty. Empty collections are common-place. If we >> iterate over them with an Iterator, we would have to create one small >> Iterator object (which could often be eliminated) and if it is empty we are >> done. However, with Streams we first have to build up the entire pipeline, >> until we realize that there is no work to do. With this example, we change >> Collection#stream() to first check if the collection is empty, and if it is, >> we simply return an EmptyStream. We also have EmptyIntStream, >> EmptyLongStream and EmptyDoubleStream. We have taken great care for these to >> have the same characteristics and behaviour as the streams returned by >> Stream.empty(), IntStream.empty(), etc. >> >> Some of the JDK tests fail with this, due to ClassCastExceptions (our >> EmptyStream is not an AbstractPipeline) and AssertionError, since we can >> call some methods repeatedly on the stream without it failing. On the plus >> side, creating a complex stream on an empty stream gives us upwards of 50x >> increase in performance due to a much smaller object allocation rate. This >> PR includes the code for the change, unit tests and also a JMH benchmark to >> demonstrate the improvement. > > kabutz has updated the pull request incrementally with one additional commit > since the last revision: > > Refactored empty stream implementations to reduce duplicate code and > improved unordered() > Added StreamSupport.empty for primitive spliterators and use that in > Arrays.stream() Still wondering whether this can ever join the JDK - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects [v2]
On Tue, 16 Nov 2021 20:53:26 GMT, kabutz wrote: >> This is a draft proposal for how we could improve stream performance for the >> case where the streams are empty. Empty collections are common-place. If we >> iterate over them with an Iterator, we would have to create one small >> Iterator object (which could often be eliminated) and if it is empty we are >> done. However, with Streams we first have to build up the entire pipeline, >> until we realize that there is no work to do. With this example, we change >> Collection#stream() to first check if the collection is empty, and if it is, >> we simply return an EmptyStream. We also have EmptyIntStream, >> EmptyLongStream and EmptyDoubleStream. We have taken great care for these to >> have the same characteristics and behaviour as the streams returned by >> Stream.empty(), IntStream.empty(), etc. >> >> Some of the JDK tests fail with this, due to ClassCastExceptions (our >> EmptyStream is not an AbstractPipeline) and AssertionError, since we can >> call some methods repeatedly on the stream without it failing. On the plus >> side, creating a complex stream on an empty stream gives us upwards of 50x >> increase in performance due to a much smaller object allocation rate. This >> PR includes the code for the change, unit tests and also a JMH benchmark to >> demonstrate the improvement. > > kabutz has updated the pull request incrementally with one additional commit > since the last revision: > > Refactored empty stream implementations to reduce duplicate code and > improved unordered() > Added StreamSupport.empty for primitive spliterators and use that in > Arrays.stream() Another comment - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects [v2]
On Tue, 16 Nov 2021 20:53:26 GMT, kabutz wrote: >> This is a draft proposal for how we could improve stream performance for the >> case where the streams are empty. Empty collections are common-place. If we >> iterate over them with an Iterator, we would have to create one small >> Iterator object (which could often be eliminated) and if it is empty we are >> done. However, with Streams we first have to build up the entire pipeline, >> until we realize that there is no work to do. With this example, we change >> Collection#stream() to first check if the collection is empty, and if it is, >> we simply return an EmptyStream. We also have EmptyIntStream, >> EmptyLongStream and EmptyDoubleStream. We have taken great care for these to >> have the same characteristics and behaviour as the streams returned by >> Stream.empty(), IntStream.empty(), etc. >> >> Some of the JDK tests fail with this, due to ClassCastExceptions (our >> EmptyStream is not an AbstractPipeline) and AssertionError, since we can >> call some methods repeatedly on the stream without it failing. On the plus >> side, creating a complex stream on an empty stream gives us upwards of 50x >> increase in performance due to a much smaller object allocation rate. This >> PR includes the code for the change, unit tests and also a JMH benchmark to >> demonstrate the improvement. > > kabutz has updated the pull request incrementally with one additional commit > since the last revision: > > Refactored empty stream implementations to reduce duplicate code and > improved unordered() > Added StreamSupport.empty for primitive spliterators and use that in > Arrays.stream() Add another comment to keep this active. - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects [v2]
> This is a draft proposal for how we could improve stream performance for the > case where the streams are empty. Empty collections are common-place. If we > iterate over them with an Iterator, we would have to create one small > Iterator object (which could often be eliminated) and if it is empty we are > done. However, with Streams we first have to build up the entire pipeline, > until we realize that there is no work to do. With this example, we change > Collection#stream() to first check if the collection is empty, and if it is, > we simply return an EmptyStream. We also have EmptyIntStream, EmptyLongStream > and EmptyDoubleStream. We have taken great care for these to have the same > characteristics and behaviour as the streams returned by Stream.empty(), > IntStream.empty(), etc. > > Some of the JDK tests fail with this, due to ClassCastExceptions (our > EmptyStream is not an AbstractPipeline) and AssertionError, since we can call > some methods repeatedly on the stream without it failing. On the plus side, > creating a complex stream on an empty stream gives us upwards of 50x increase > in performance due to a much smaller object allocation rate. This PR includes > the code for the change, unit tests and also a JMH benchmark to demonstrate > the improvement. kabutz has updated the pull request incrementally with one additional commit since the last revision: Refactored empty stream implementations to reduce duplicate code and improved unordered() Added StreamSupport.empty for primitive spliterators and use that in Arrays.stream() - Changes: - all: https://git.openjdk.java.net/jdk/pull/6275/files - new: https://git.openjdk.java.net/jdk/pull/6275/files/6e60ac6c..5813 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6275&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6275&range=00-01 Stats: 501 lines in 8 files changed: 217 ins; 166 del; 118 mod Patch: https://git.openjdk.java.net/jdk/pull/6275.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6275/head:pull/6275 PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects
On Fri, 5 Nov 2021 12:53:46 GMT, kabutz wrote: > This is a draft proposal for how we could improve stream performance for the > case where the streams are empty. Empty collections are common-place. If we > iterate over them with an Iterator, we would have to create one small > Iterator object (which could often be eliminated) and if it is empty we are > done. However, with Streams we first have to build up the entire pipeline, > until we realize that there is no work to do. With this example, we change > Collection#stream() to first check if the collection is empty, and if it is, > we simply return an EmptyStream. We also have EmptyIntStream, EmptyLongStream > and EmptyDoubleStream. We have taken great care for these to have the same > characteristics and behaviour as the streams returned by Stream.empty(), > IntStream.empty(), etc. > > Some of the JDK tests fail with this, due to ClassCastExceptions (our > EmptyStream is not an AbstractPipeline) and AssertionError, since we can call > some methods repeatedly on the stream without it failing. On the plus side, > creating a complex stream on an empty stream gives us upwards of 50x increase > in performance due to a much smaller object allocation rate. This PR includes > the code for the change, unit tests and also a JMH benchmark to demonstrate > the improvement. > > 1. I have a `PrimitiveIterator` that short circuits `#next` and > > `#forEachRemaining` as well. > > Oh that's a good suggestion - thanks. After looking at this, I decided I'd rather not short-circuit the methods, since they have checking code that I would have to duplicate. - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects
On Mon, 15 Nov 2021 18:23:16 GMT, Philippe Marschall wrote: > > ``` > > 3. I made many methods just return `this` after checking for operated on > > and closed: > > ``` > > Reading the Javadoc again I'm not sure this is allowed. The method Javadoc > doesn't clearly say it but the package Javadoc for intermediate operations > says a new stream is returned. Yes, I also returned "this" initially to avoid object allocation, but fortunately escape analysis helps get rid of them if they are indeed unnecessary. My object allocation for the new empty streams is minimal. The one thing that is still hindering me is the lazy creation of streams. $ jshell | Welcome to JShell -- Version 11.0.12 | For an introduction type: /help intro jshell> var list = new ArrayList() list ==> [] jshell> var stream = list.stream() stream ==> java.util.stream.ReferencePipeline$Head@cb5822 jshell> Collections.addAll(list, 1,2,3) $3 ==> true jshell> stream.forEach(System.out::print) 123 jshell> Does anyone use this "feature"? I hope not, but unfortunately it's the current behavior and we probably have to support it :-( - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects
On Mon, 15 Nov 2021 18:08:22 GMT, Philippe Marschall wrote: > I have a similar project at > [empty-streams](https://github.com/marschall/empty-streams). A couple of > notes: > > 1. I found the need for streams to be stateful. I had need for the following > state: > >1. closed >2. ordered >3. parallel >4. sorted I found the need for ALL the Spliterator characteristics, plus some others that I interleaved between the Spliterator bits. That way I could fit them into a single int. Parallel was too tricky though, so in that case I return a new stream built with the standard StreamSupport(spliterator, true). >5. closeHandler Similarly when someone calls closeHandler(), I also return a new stream built with StreamSupport. >6. comparator (on EmptyStream) Yes, we need this for TreeSet and ConcurrentSkipListSet. I was hoping to avoid it, but there is no way around if we want to be completely correct. > A shared instance can not be used because of `#close`. Agreed. I fixed that. > 2. I have a `PrimitiveIterator` that short circuits `#next` and > `#forEachRemaining` as well. Oh that's a good suggestion - thanks. > 3. I made many methods just return `this` after checking for operated on and > closed: > >1. `#filter` `#map`, `#flatMap`, `#mapMulti`, `#distinct`, `#peek`, > `#limit`, `#skip`, `#dropWhile`, `#takeWhile`. I now always return a new EmptyStream. Fortunately escape analysis gets rid of a lot of objects. >2. These do a state change state as well `#sequential`, `#parallel`, > `#unordered`, `#sorted`, `#onClose`. unordered() only does a state change if it currently ORDERED, otherwise it is correct to return this. The logic is convoluted and weird though. I have a more thorough test case that I need to incorporate into my EmptyStreamTest and which tests all the JDK collections, including the wrapper and immutable collections. > 4. I made my `EmptyBaseStream` implement `BaseStream` and make > `EmptyIntLongDoubleStream` extend from this class as `IntLongDoubleStream` > all extend `BaseStream`. This allowed me to move the following methods up in > the hierarchy `#isParallel` , `#onClose`, `#sequential`, `#parallel`, > `#unordered`. Interesting idea. I'm not sure if that would work for me. I will have a look if I can also do this, but I doubt it. A lot of the methods, especially the primitive ones, are repeats of each other. It might be really nice to put that into a common superclass. > 5. Is there any reason why you make `#parallel` "bail out" to `StreamSupport` > rather than just do a state change? I tried to keep the characteristics identical to what we had before and parallel seemed particularly challenging to get right. I might attempt this again at a later stage, but for now I want to keep it like it is. I don't think parallel() is the most common use case, except with very large collections perhaps. Even though I thought I was careful with the characteristics, I still have a few edge cases I need to iron out. - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects
On Mon, 15 Nov 2021 18:08:22 GMT, Philippe Marschall wrote: > 3. I made many methods just return `this` after checking for operated on > and closed: Reading the Javadoc again I'm not sure this is allowed. The method Javadoc doesn't clearly say it but the package Javadoc for intermediate operations says a new stream is returned. - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects
On Fri, 5 Nov 2021 12:53:46 GMT, kabutz wrote: > This is a draft proposal for how we could improve stream performance for the > case where the streams are empty. Empty collections are common-place. If we > iterate over them with an Iterator, we would have to create one small > Iterator object (which could often be eliminated) and if it is empty we are > done. However, with Streams we first have to build up the entire pipeline, > until we realize that there is no work to do. With this example, we change > Collection#stream() to first check if the collection is empty, and if it is, > we simply return an EmptyStream. We also have EmptyIntStream, EmptyLongStream > and EmptyDoubleStream. We have taken great care for these to have the same > characteristics and behaviour as the streams returned by Stream.empty(), > IntStream.empty(), etc. > > Some of the JDK tests fail with this, due to ClassCastExceptions (our > EmptyStream is not an AbstractPipeline) and AssertionError, since we can call > some methods repeatedly on the stream without it failing. On the plus side, > creating a complex stream on an empty stream gives us upwards of 50x increase > in performance due to a much smaller object allocation rate. This PR includes > the code for the change, unit tests and also a JMH benchmark to demonstrate > the improvement. I have a similar project at [empty-streams](https://github.com/marschall/empty-streams). A couple of notes: 1. I found the need for streams to be stateful. I had need for the following state: 1. closed 2. ordered 3. parallel 4. sorted 5. closeHandler 5. comparator (on EmptyStream) A shared instance can not be used because of `#close`. 2. I have a `PrimitiveIterator` that short circuits `#next` and `#forEachRemaining` as well. 3. I made many methods just return `this` after checking for operated on and closed: 1. `#filter` `#map`, `#flatMap`, `#mapMulti`, `#distinct`, `#peek`, `#limit`, `#skip`, `#dropWhile`, `#takeWhile`. 2. These do a state change state as well `#sequential`, `#parallel`, `#unordered`, `#sorted`, `#onClose`. 4. I made my `EmptyBaseStream` implement `BaseStream` and make `EmptyIntLongDoubleStream` extend from this class as `IntLongDoubleStream` all extend `BaseStream`. This allowed me to move the following methods up in the hierarchy `#isParallel` , `#onClose`, `#sequential`, `#parallel`, `#unordered`. 5. Is there any reason why you make `#parallel` "bail out" to `StreamSupport` rather than just do a state change? - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects
On Sun, 7 Nov 2021 06:53:12 GMT, kabutz wrote: >>> The net effect of this change might depend on your workload. If you call >>> stream() on empty collections that have cheap isEmpty(), this change will >>> likely improve performance and reduce waste. However, this same change >>> might do the opposite if some of your collections aren't empty or have >>> costly isEmpty(). It would be good to have benchmarks for different >>> workloads. >> >> Yes, I also thought about the cost of isEmpty() on concurrent collections. >> There are four concurrent collections that have a linear time cost size() >> method: CLQ, CLD, LTQ and CHM. However, in each of these cases, the >> isEmpty() method has constant time cost. There might be collections defined >> outside the JDK where this could be the case. >> >> However, I will extend the benchmark to include a few of those cases too, as >> well as different sizes and collection sizes. >> >> Thank you so much for your input. > >> wouldn't this make streams no longer lazy if the collection is empty? >> >> ```java >> List list = new ArrayList<>(); >> Stream stream = list.stream(); >> >> list.addAll(List.of("one", "two", "three")); >> >> stream.forEach(System.out::println); // prints one two three >> ``` > > I did not consider this case, thank you for bringing it up. I have always > found this behaviour a bit strange and have never used it "in the real > world". It is also not consistent between collections. Here is an example > with four collections: ArrayList, CopyOnWriteArrayList, ConcurrentSkipListSet > and ArrayBlockingQueue: > > > import java.util.ArrayList; > import java.util.Arrays; > import java.util.Collection; > import java.util.List; > import java.util.Objects; > import java.util.concurrent.ArrayBlockingQueue; > import java.util.concurrent.ConcurrentSkipListSet; > import java.util.concurrent.CopyOnWriteArrayList; > import java.util.function.Supplier; > import java.util.stream.IntStream; > > public class LazyStreamDemo { > public static void main(String... args) { > List>> suppliers = > List.of(ArrayList::new, // fast-fail > CopyOnWriteArrayList::new, // snapshot > ConcurrentSkipListSet::new, // weakly-consistent > () -> new ArrayBlockingQueue<>(10) // > weakly-consistent > ); > for (Supplier> supplier : suppliers) { > Collection c = supplier.get(); > System.out.println(c.getClass()); > IntStream stream = c.stream() > .sorted() > .filter(Objects::nonNull) > .mapToInt(String::length) > .sorted(); > > c.addAll(List.of("one", "two", "three", "four", "five")); > System.out.println("stream = " + > Arrays.toString(stream.toArray())); > } > } > } > > > The output is: > > > class java.util.ArrayList > stream = [3, 3, 4, 4, 5] > class java.util.concurrent.CopyOnWriteArrayList > stream = [] > class java.util.concurrent.ConcurrentSkipListSet > stream = [] > class java.util.concurrent.ArrayBlockingQueue > stream = [3, 3, 4, 4, 5] > > > At least with the EmptyStream we would have consistent output of always [] @kabutz I agree that i wouldn't consider it clean code to use a stream like i put into the example. I only brought it up because it might break existing code, since i think streams are expected to be lazy. Interesting to see that they are in fact not lazy in all situations - i put that into my notes. - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects
On Wed, 10 Nov 2021 07:45:27 GMT, Anthony Vanelverdinghe wrote: >> @kabutz I agree that i wouldn't consider it clean code to use a stream like >> i put into the example. I only brought it up because it might break existing >> code, since i think streams are expected to be lazy. Interesting to see that >> they are in fact not lazy in all situations - i put that into my notes. > > Edit: actually I think the implementation of `Collection::stream` could > simply be changed to: > > > default Stream stream() { > var spliterator = spliterator(); > if(spliterator.hasCharacteristics(Spliterator.IMMUTABLE | > Spliterator.CONCURRENT) && isEmpty()) { > return Stream.empty(); > } > return StreamSupport.stream(spliterator, false); > } > > > Note that the spliterators of methods such as `Collections::emptyList`, > `List::of`, `List::copyOf`, `Set::of`, ... currently don't have the > `IMMUTABLE` characteristic though, so they should be updated. > > --- > > The Javadoc for > [java.util.stream](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/stream/package-summary.html) > is clear though: > >> Traversal of the pipeline source does not begin until the terminal operation >> of the pipeline is executed. > > and further on says: > >> Spliterators for mutable data sources have an additional challenge; timing >> of binding to the data, since the data could change between the time the >> spliterator is created and the time the stream pipeline is executed. >> Ideally, a spliterator for a stream would report a characteristic of >> IMMUTABLE or CONCURRENT; if not it should be late-binding. > > which explains why the collections in `java.util.concurrent` (whose > spliterators have one of those characteristics) need not be late-binding. Thanks @anthonyvdotbe I believe that would satisfy the lazy binding, and then we should increase the use of the IMMUTABLE characteristic where it makes sense. - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects
On Sun, 7 Nov 2021 07:51:06 GMT, Michael Bien wrote: >>> wouldn't this make streams no longer lazy if the collection is empty? >>> >>> ```java >>> List list = new ArrayList<>(); >>> Stream stream = list.stream(); >>> >>> list.addAll(List.of("one", "two", "three")); >>> >>> stream.forEach(System.out::println); // prints one two three >>> ``` >> >> I did not consider this case, thank you for bringing it up. I have always >> found this behaviour a bit strange and have never used it "in the real >> world". It is also not consistent between collections. Here is an example >> with four collections: ArrayList, CopyOnWriteArrayList, >> ConcurrentSkipListSet and ArrayBlockingQueue: >> >> >> import java.util.ArrayList; >> import java.util.Arrays; >> import java.util.Collection; >> import java.util.List; >> import java.util.Objects; >> import java.util.concurrent.ArrayBlockingQueue; >> import java.util.concurrent.ConcurrentSkipListSet; >> import java.util.concurrent.CopyOnWriteArrayList; >> import java.util.function.Supplier; >> import java.util.stream.IntStream; >> >> public class LazyStreamDemo { >> public static void main(String... args) { >> List>> suppliers = >> List.of(ArrayList::new, // fast-fail >> CopyOnWriteArrayList::new, // snapshot >> ConcurrentSkipListSet::new, // weakly-consistent >> () -> new ArrayBlockingQueue<>(10) // >> weakly-consistent >> ); >> for (Supplier> supplier : suppliers) { >> Collection c = supplier.get(); >> System.out.println(c.getClass()); >> IntStream stream = c.stream() >> .sorted() >> .filter(Objects::nonNull) >> .mapToInt(String::length) >> .sorted(); >> >> c.addAll(List.of("one", "two", "three", "four", "five")); >> System.out.println("stream = " + >> Arrays.toString(stream.toArray())); >> } >> } >> } >> >> >> The output is: >> >> >> class java.util.ArrayList >> stream = [3, 3, 4, 4, 5] >> class java.util.concurrent.CopyOnWriteArrayList >> stream = [] >> class java.util.concurrent.ConcurrentSkipListSet >> stream = [] >> class java.util.concurrent.ArrayBlockingQueue >> stream = [3, 3, 4, 4, 5] >> >> >> At least with the EmptyStream we would have consistent output of always [] > > @kabutz I agree that i wouldn't consider it clean code to use a stream like i > put into the example. I only brought it up because it might break existing > code, since i think streams are expected to be lazy. Interesting to see that > they are in fact not lazy in all situations - i put that into my notes. Edit: actually I think the implementation of `Collection::stream` could simply be changed to: default Stream stream() { var spliterator = spliterator(); if(spliterator.hasCharacteristics(Spliterator.IMMUTABLE | Spliterator.CONCURRENT) && isEmpty()) { return Stream.empty(); } return StreamSupport.stream(spliterator, false); } Note that the spliterators of methods such as `Collections::emptyList`, `List::of`, `List::copyOf`, `Set::of`, ... currently don't have the `IMMUTABLE` characteristic though, so they should be updated. --- The Javadoc for [java.util.stream](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/stream/package-summary.html) is clear though: > Traversal of the pipeline source does not begin until the terminal operation > of the pipeline is executed. and further on says: > Spliterators for mutable data sources have an additional challenge; timing of > binding to the data, since the data could change between the time the > spliterator is created and the time the stream pipeline is executed. Ideally, > a spliterator for a stream would report a characteristic of IMMUTABLE or > CONCURRENT; if not it should be late-binding. which explains why the collections in `java.util.concurrent` (whose spliterators have one of those characteristics) need not be late-binding. - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects
On Sun, 7 Nov 2021 06:26:22 GMT, kabutz wrote: >> (immutable collections could override stream() instead, since they don't >> have that problem) > >> The net effect of this change might depend on your workload. If you call >> stream() on empty collections that have cheap isEmpty(), this change will >> likely improve performance and reduce waste. However, this same change might >> do the opposite if some of your collections aren't empty or have costly >> isEmpty(). It would be good to have benchmarks for different workloads. > > Yes, I also thought about the cost of isEmpty() on concurrent collections. > There are four concurrent collections that have a linear time cost size() > method: CLQ, CLD, LTQ and CHM. However, in each of these cases, the isEmpty() > method has constant time cost. There might be collections defined outside the > JDK where this could be the case. > > However, I will extend the benchmark to include a few of those cases too, as > well as different sizes and collection sizes. > > Thank you so much for your input. > wouldn't this make streams no longer lazy if the collection is empty? > > ```java > List list = new ArrayList<>(); > Stream stream = list.stream(); > > list.addAll(List.of("one", "two", "three")); > > stream.forEach(System.out::println); // prints one two three > ``` I did not consider this case, thank you for bringing it up. I have always found this behaviour a bit strange and have never used it "in the real world". It is also not consistent between collections. Here is an example with four collections: ArrayList, CopyOnWriteArrayList, ConcurrentSkipListSet and ArrayBlockingQueue: import java.util.ArrayList; import java.util.Arrays; import java.util.Collection; import java.util.List; import java.util.Objects; import java.util.concurrent.ArrayBlockingQueue; import java.util.concurrent.ConcurrentSkipListSet; import java.util.concurrent.CopyOnWriteArrayList; import java.util.function.Supplier; import java.util.stream.IntStream; public class LazyStreamDemo { public static void main(String... args) { List>> suppliers = List.of(ArrayList::new, // fast-fail CopyOnWriteArrayList::new, // snapshot ConcurrentSkipListSet::new, // weakly-consistent () -> new ArrayBlockingQueue<>(10) // weakly-consistent ); for (Supplier> supplier : suppliers) { Collection c = supplier.get(); System.out.println(c.getClass()); IntStream stream = c.stream() .sorted() .filter(Objects::nonNull) .mapToInt(String::length) .sorted(); c.addAll(List.of("one", "two", "three", "four", "five")); System.out.println("stream = " + Arrays.toString(stream.toArray())); } } } The output is: class java.util.ArrayList stream = [3, 3, 4, 4, 5] class java.util.concurrent.CopyOnWriteArrayList stream = [] class java.util.concurrent.ConcurrentSkipListSet stream = [] class java.util.concurrent.ArrayBlockingQueue stream = [3, 3, 4, 4, 5] At least with the EmptyStream we would have consistent output of always [] - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects
On Sun, 7 Nov 2021 04:26:13 GMT, Michael Bien wrote: >> wouldn't this make streams no longer lazy if the collection is empty? >> >> List list = new ArrayList<>(); >> Stream stream = list.stream(); >> >> list.addAll(List.of("one", "two", "three")); >> >> stream.forEach(System.out::println); // prints one two three > > (immutable collections could override stream() instead, since they don't have > that problem) > The net effect of this change might depend on your workload. If you call > stream() on empty collections that have cheap isEmpty(), this change will > likely improve performance and reduce waste. However, this same change might > do the opposite if some of your collections aren't empty or have costly > isEmpty(). It would be good to have benchmarks for different workloads. Yes, I also thought about the cost of isEmpty() on concurrent collections. There are four concurrent collections that have a linear time cost size() method: CLQ, CLD, LTQ and CHM. However, in each of these cases, the isEmpty() method has constant time cost. There might be collections defined outside the JDK where this could be the case. However, I will extend the benchmark to include a few of those cases too, as well as different sizes and collection sizes. Thank you so much for your input. - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects
On Sat, 6 Nov 2021 22:03:26 GMT, Pavel Rappo wrote: >> This is a draft proposal for how we could improve stream performance for the >> case where the streams are empty. Empty collections are common-place. If we >> iterate over them with an Iterator, we would have to create one small >> Iterator object (which could often be eliminated) and if it is empty we are >> done. However, with Streams we first have to build up the entire pipeline, >> until we realize that there is no work to do. With this example, we change >> Collection#stream() to first check if the collection is empty, and if it is, >> we simply return an EmptyStream. We also have EmptyIntStream, >> EmptyLongStream and EmptyDoubleStream. We have taken great care for these to >> have the same characteristics and behaviour as the streams returned by >> Stream.empty(), IntStream.empty(), etc. >> >> Some of the JDK tests fail with this, due to ClassCastExceptions (our >> EmptyStream is not an AbstractPipeline) and AssertionError, since we can >> call some methods repeatedly on the stream without it failing. On the plus >> side, creating a complex stream on an empty stream gives us upwards of 50x >> increase in performance due to a much smaller object allocation rate. This >> PR includes the code for the change, unit tests and also a JMH benchmark to >> demonstrate the improvement. > > src/java.base/share/classes/java/util/Collection.java line 743: > >> 741: */ >> 742: default Stream stream() { >> 743: if (isEmpty()) return Stream.empty(); > > The net effect of this change might depend on your workload. If you call > stream() on empty collections that have cheap isEmpty(), this change will > likely improve performance and reduce waste. However, this same change might > do the opposite if some of your collections aren't empty or have costly > isEmpty(). It would be good to have benchmarks for different workloads. wouldn't this make streams no longer lazy if the collection is empty? List list = new ArrayList<>(); Stream stream = list.stream(); list.addAll(List.of("one", "two", "three")); stream.forEach(System.out::println); // prints one two three - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects
On Sun, 7 Nov 2021 03:53:29 GMT, Michael Bien wrote: >> src/java.base/share/classes/java/util/Collection.java line 743: >> >>> 741: */ >>> 742: default Stream stream() { >>> 743: if (isEmpty()) return Stream.empty(); >> >> The net effect of this change might depend on your workload. If you call >> stream() on empty collections that have cheap isEmpty(), this change will >> likely improve performance and reduce waste. However, this same change might >> do the opposite if some of your collections aren't empty or have costly >> isEmpty(). It would be good to have benchmarks for different workloads. > > wouldn't this make streams no longer lazy if the collection is empty? > > List list = new ArrayList<>(); > Stream stream = list.stream(); > > list.addAll(List.of("one", "two", "three")); > > stream.forEach(System.out::println); // prints one two three (immutable collections could override stream() instead, since they don't have that problem) - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects
On Sat, 13 Nov 2021 16:59:10 GMT, liach wrote: >> This is a draft proposal for how we could improve stream performance for the >> case where the streams are empty. Empty collections are common-place. If we >> iterate over them with an Iterator, we would have to create one small >> Iterator object (which could often be eliminated) and if it is empty we are >> done. However, with Streams we first have to build up the entire pipeline, >> until we realize that there is no work to do. With this example, we change >> Collection#stream() to first check if the collection is empty, and if it is, >> we simply return an EmptyStream. We also have EmptyIntStream, >> EmptyLongStream and EmptyDoubleStream. We have taken great care for these to >> have the same characteristics and behaviour as the streams returned by >> Stream.empty(), IntStream.empty(), etc. >> >> Some of the JDK tests fail with this, due to ClassCastExceptions (our >> EmptyStream is not an AbstractPipeline) and AssertionError, since we can >> call some methods repeatedly on the stream without it failing. On the plus >> side, creating a complex stream on an empty stream gives us upwards of 50x >> increase in performance due to a much smaller object allocation rate. This >> PR includes the code for the change, unit tests and also a JMH benchmark to >> demonstrate the improvement. > > src/java.base/share/classes/java/util/Arrays.java line 5448: > >> 5446: public static Stream stream(T[] array, int startInclusive, >> int endExclusive) { >> 5447: var spliterator = spliterator(array, startInclusive, >> endExclusive); >> 5448: if (startInclusive == endExclusive) return Stream.empty(); > > Can't we just add the `if` statement to before the `spliterator` is computed? Thanks @liach for the suggestion. The spliterator serves the purpose of checking the arguments. For example, if array is null, the method should throw a NPE. Similarly, startInclusive and endExclusive have to be in the range of the array length. If we swap around the lines, someone could call stream(null, -100, -100) and it would happily return an empty stream. Furthermore, the empty streams can inherit characteristics of a spliterator. In the code you pointed out, I don't do that, but will change that. It makes it much easier to keep the empty streams consistent with their previous behavior. One might argue that it is pointless to call distinct() sorted() and unordered() on an empty stream and expect it to change, but it was easy enough to implement so that we can keep consistency FWIW. - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects
On Fri, 5 Nov 2021 12:53:46 GMT, kabutz wrote: > This is a draft proposal for how we could improve stream performance for the > case where the streams are empty. Empty collections are common-place. If we > iterate over them with an Iterator, we would have to create one small > Iterator object (which could often be eliminated) and if it is empty we are > done. However, with Streams we first have to build up the entire pipeline, > until we realize that there is no work to do. With this example, we change > Collection#stream() to first check if the collection is empty, and if it is, > we simply return an EmptyStream. We also have EmptyIntStream, EmptyLongStream > and EmptyDoubleStream. We have taken great care for these to have the same > characteristics and behaviour as the streams returned by Stream.empty(), > IntStream.empty(), etc. > > Some of the JDK tests fail with this, due to ClassCastExceptions (our > EmptyStream is not an AbstractPipeline) and AssertionError, since we can call > some methods repeatedly on the stream without it failing. On the plus side, > creating a complex stream on an empty stream gives us upwards of 50x increase > in performance due to a much smaller object allocation rate. This PR includes > the code for the change, unit tests and also a JMH benchmark to demonstrate > the improvement. src/java.base/share/classes/java/util/Arrays.java line 5448: > 5446: public static Stream stream(T[] array, int startInclusive, > int endExclusive) { > 5447: var spliterator = spliterator(array, startInclusive, > endExclusive); > 5448: if (startInclusive == endExclusive) return Stream.empty(); Can't we just add the `if` statement to before the `spliterator` is computed? - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects
On Fri, 5 Nov 2021 12:53:46 GMT, kabutz wrote: > This is a draft proposal for how we could improve stream performance for the > case where the streams are empty. Empty collections are common-place. If we > iterate over them with an Iterator, we would have to create one small > Iterator object (which could often be eliminated) and if it is empty we are > done. However, with Streams we first have to build up the entire pipeline, > until we realize that there is no work to do. With this example, we change > Collection#stream() to first check if the collection is empty, and if it is, > we simply return an EmptyStream. We also have EmptyIntStream, EmptyLongStream > and EmptyDoubleStream. We have taken great care for these to have the same > characteristics and behaviour as the streams returned by Stream.empty(), > IntStream.empty(), etc. > > Some of the JDK tests fail with this, due to ClassCastExceptions (our > EmptyStream is not an AbstractPipeline) and AssertionError, since we can call > some methods repeatedly on the stream without it failing. On the plus side, > creating a complex stream on an empty stream gives us upwards of 50x increase > in performance due to a much smaller object allocation rate. This PR includes > the code for the change, unit tests and also a JMH benchmark to demonstrate > the improvement. I've added far more JMH tests to check for stream size of [0, 1, 10, 100], then different types of streams, from minimal to complex, then five different collections ArrayList, ConcurrentLinkedQueue, ConcurrentSkipListSet, CopyOnWriteArrayList, ConcurrentHashMap.newKeySet() and lastly with Stream.empty() that has the new behavior and StreamSupport.stream(...) that was the old behavior. With all this multiplication of test options, it will take a couple of days to run on my server. Will let you know if anything surprising pops up. Thanks for your excellent suggestions. It seemed that we cannot get away from making some objects. I've got a new version in the works that makes EmptyStream instances each time, with their own state. The performance is still good. For a simple stream pipeline, it is roughly twice as fast for an empty stream. There is no noticeable slow-down for non-empty streams. `EmptyStreams` now follow the same behavior as we would get if we created the stream with `StreamSupport.stream(collection.spliterator(), false)`. For example, we cannot call `filter()` / `map()` etc. twice on the same stream. We can call `unordered()` twice on the same stream, but only if it was not `ORDERED` to begin with. Streams are not created lazily yet in my updated version, but I'm hoping it is a step in the right direction. I'm running an extensive JMH suite overnight to compare object allocation rates and throughput for the streams. Here are the maximum throughput numbers for the JMH benchmark for various lengths, collections, type of stream decorations and whether it is the old style stream creation or the new EmptyStream. We also see the speedup with the new system. There are some strange effects where we see small differences once we get past 0 length, which is most likely to be explained because of noise in the benchmark. The speedup in the benchmark for empty streams seems to be from about 2x for minimal stream decoration through to about 9x for the more complex kind. a_length, b_typeOfCollection, c_typeOfStreamDecoration, d_streamCreation, max op/ms, speedup 0, ArrayList, minimal, old, 20972.400 0, ArrayList, minimal, new, 44866.020, 2.13x 0, ArrayList, basic, old, 19219.077 0, ArrayList, basic, new, 47574.102, 2.47x 0, ArrayList, complex, old, 9425.247 0, ArrayList, complex, new, 44850.655, 4.75x 0, ArrayList, crossover, old, 4749.495 0, ArrayList, crossover, new, 44550.110, 9.37x 0, ConcurrentLinkedQueue, minimal, old, 20146.717 0, ConcurrentLinkedQueue, minimal, new, 36154.586, 1.79x 0, ConcurrentLinkedQueue, basic, old, 18107.648 0, ConcurrentLinkedQueue, basic, new, 36094.319, 1.99x 0, ConcurrentLinkedQueue, complex, old, 9033.936 0, ConcurrentLinkedQueue, complex, new, 36089.520, 3.99x 0, ConcurrentLinkedQueue, crossover, old, 4555.928 0, ConcurrentLinkedQueue, crossover, new, 35932.132, 7.88x 0, ConcurrentSkipListSet, minimal, old, 20391.479 0, ConcurrentSkipListSet, minimal, new, 40259.694, 1.97x 0, ConcurrentSkipListSet, basic, old, 18616.301 0, ConcurrentSkipListSet, basic, new, 40914.132, 2.19x 0, ConcurrentSkipListSet, complex, old, 9853.569 0, ConcurrentSkipListSet, complex, new, 40150.606, 4.07x 0, ConcurrentSkipListSet, crossover, old, 4632.691 0, ConcurrentSkipListSet, crossover, new, 40151.534, 8.66x 0, CopyOnWriteArrayList, minimal, old, 19952.257 0, CopyOnWriteArrayList, minimal, new, 36884.796, 1.84x 0, CopyOnWriteArrayList, basic, old, 18228.390 0, CopyOnWriteArrayList, basic, new, 36993.146, 2.02x 0, CopyOnWriteArrayList, complex, old, 9635.404 0, CopyOnWriteArrayList, complex, new, 36862.714, 3.82x 0, CopyOnWriteArrayList,
Re: RFR: JDK-8277095 : Empty streams create too many objects
On Fri, 5 Nov 2021 12:53:46 GMT, kabutz wrote: > This is a draft proposal for how we could improve stream performance for the > case where the streams are empty. Empty collections are common-place. If we > iterate over them with an Iterator, we would have to create one small > Iterator object (which could often be eliminated) and if it is empty we are > done. However, with Streams we first have to build up the entire pipeline, > until we realize that there is no work to do. With this example, we change > Collection#stream() to first check if the collection is empty, and if it is, > we simply return an EmptyStream. We also have EmptyIntStream, EmptyLongStream > and EmptyDoubleStream. We have taken great care for these to have the same > characteristics and behaviour as the streams returned by Stream.empty(), > IntStream.empty(), etc. > > Some of the JDK tests fail with this, due to ClassCastExceptions (our > EmptyStream is not an AbstractPipeline) and AssertionError, since we can call > some methods repeatedly on the stream without it failing. On the plus side, > creating a complex stream on an empty stream gives us upwards of 50x increase > in performance due to a much smaller object allocation rate. This PR includes > the code for the change, unit tests and also a JMH benchmark to demonstrate > the improvement. Looks a good idea to reduce the pipeline overhead ! - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects
On Sat, 6 Nov 2021 17:23:34 GMT, Pavel Rappo wrote: > Streams are closeable, and a terminal operation may be invoked on a given > stream only once. Thus, shouldn't the third line in both of the examples > below throw `IllegalStateException`? > > ``` > Stream empty = Stream.empty(); > System.out.println(empty.count()); > System.out.println(empty.count()); > > Stream empty = Stream.empty(); > empty.close(); > System.out.println(empty.count()); > ``` That would be fairly easy to solve by having two instances of the EmptyStream. The terminal operations would return the terminal operation that throws IllegalStateExceptions. - PR: https://git.openjdk.java.net/jdk/pull/6275
Re: RFR: JDK-8277095 : Empty streams create too many objects
On Fri, 5 Nov 2021 12:53:46 GMT, kabutz wrote: > This is a draft proposal for how we could improve stream performance for the > case where the streams are empty. Empty collections are common-place. If we > iterate over them with an Iterator, we would have to create one small > Iterator object (which could often be eliminated) and if it is empty we are > done. However, with Streams we first have to build up the entire pipeline, > until we realize that there is no work to do. With this example, we change > Collection#stream() to first check if the collection is empty, and if it is, > we simply return an EmptyStream. We also have EmptyIntStream, EmptyLongStream > and EmptyDoubleStream. We have taken great care for these to have the same > characteristics and behaviour as the streams returned by Stream.empty(), > IntStream.empty(), etc. > > Some of the JDK tests fail with this, due to ClassCastExceptions (our > EmptyStream is not an AbstractPipeline) and AssertionError, since we can call > some methods repeatedly on the stream without it failing. On the plus side, > creating a complex stream on an empty stream gives us upwards of 50x increase > in performance due to a much smaller object allocation rate. This PR includes > the code for the change, unit tests and also a JMH benchmark to demonstrate > the improvement. Streams are closeable, and a terminal operation may be invoked on a given stream only once. Thus, shouldn't the third line in both of the examples below throw `IllegalStateException`? Stream empty = Stream.empty(); System.out.println(empty.count()); System.out.println(empty.count()); Stream empty = Stream.empty(); empty.close(); System.out.println(empty.count()); I don't think that we can remove all the state from an empty stream, but we can likely make it smaller. src/java.base/share/classes/java/util/Collection.java line 743: > 741: */ > 742: default Stream stream() { > 743: if (isEmpty()) return Stream.empty(); The net effect of this change might depend on your workload. If you call stream() on empty collections that have cheap isEmpty(), this change will likely improve performance and reduce waste. However, this same change might do the opposite if some of your collections aren't empty or have costly isEmpty(). It would be good to have benchmarks for different workloads. - PR: https://git.openjdk.java.net/jdk/pull/6275