Re: Next Druid release version scheme
I'd say yes, in a way that's similar to today. Today we treat increments of the version after the 0 as potentially allowing breaking changes. We also try to avoid them whenever feasible, because we know they're painful for users. I'm not suggesting we immediately get any more, or less, eager about making breaking changes as part of dropping the "0.". Over time, though, I'd like to see us get less eager about making breaking changes.

On Wed, Jul 6, 2022 at 9:47 AM Julian Hyde wrote:

> Would 24.0 and 25.0 each be regarded as major versions for the purposes of
> semantic versioning?
>
> If so, under the rules of semantic versioning, we *can* make breaking API
> changes but that doesn’t mean that we *should*. (For an example of a
> project that followed the letter of semantic versioning but still
> undermined the trust of their users by making too many API changes, look no
> further than Guava.)
>
> Julian
Re: Next Druid release version scheme
+1 to what Julian said.

On Wed, Jul 6, 2022 at 9:47 AM Julian Hyde wrote:

> Would 24.0 and 25.0 each be regarded as major versions for the purposes of
> semantic versioning?
>
> If so, under the rules of semantic versioning, we *can* make breaking API
> changes but that doesn’t mean that we *should*. (For an example of a
> project that followed the letter of semantic versioning but still
> undermined the trust of their users by making too many API changes, look no
> further than Guava.)
>
> Julian
Re: Next Druid release version scheme
Would 24.0 and 25.0 each be regarded as major versions for the purposes of semantic versioning?

If so, under the rules of semantic versioning, we *can* make breaking API changes but that doesn’t mean that we *should*. (For an example of a project that followed the letter of semantic versioning but still undermined the trust of their users by making too many API changes, look no further than Guava.)

Julian

On Jul 6, 2022, at 1:53 AM, Gian Merlino wrote:

> My proposal for the next release is that we merely drop the leading "0."
> and don't change anything else about our dev process. We'd start the next
> release at 24.0, and then likely do 25.0 shortly after. Same as today, just
> no leading "0.".
>
> Separately, I'd like to craft a better versioning story around extension
> API, query API, etc. But I don't think we need to connect these two things.
> The dropping of the leading "0." is mainly about reflecting the reality
> that the project is way more stable than a random member of the public
> would expect for a "0." release. The better versioning story is an effort
> that is independent from that.
Re: Next Druid release version scheme
My proposal for the next release is that we merely drop the leading "0." and don't change anything else about our dev process. We'd start the next release at 24.0, and then likely do 25.0 shortly after. Same as today, just no leading "0.".

Separately, I'd like to craft a better versioning story around extension API, query API, etc. But I don't think we need to connect these two things. The dropping of the leading "0." is mainly about reflecting the reality that the project is way more stable than a random member of the public would expect for a "0." release. The better versioning story is an effort that is independent from that.

On Tue, Jun 7, 2022 at 11:50 AM Xavier Léauté wrote:

> Extension API: do extensions written for version X run as expected with
> version Y?
>
> One thing I'd like to see us do before we declare to 1.0 and provide
> backwards compatibility for extensions APIs is to remove some of the crufty
> Hadoop 2.x and Guava 16 dependency constraints we have (or at least isolate
> them so extensions and core are not constrained by old versions). Removing
> those will likely be a breaking change for extensions.
>
> I'm also fine declaring 1.0, but that might mean we can't deprecate things
> until 2.0, and then remove those in 3.0 depending on what our backwards
> compatibility guarantees are. What I'd like us to avoid is to be further
> entrenched and bogged down in moving away from those dependencies by
> declaring a stable API.
>
> Xavier
>
> On Mon, Jun 6, 2022 at 2:45 PM rahul gidwani wrote:
>
> > Hi Gian, this is great.
> >
> > For me what is most important is (2) and (4)
> > Does my current extension work with new releases?
> > Can I do a rolling upgrade of druid to the next version?
> >
> > The more things that are versioned the better, but (2) and (4) have been
> > the things that have been most important to me in the past.
> >
> > Anyone in the community have any thoughts on this?
> > Thank you
> > rahul
> >
> > On Fri, May 27, 2022 at 11:22 AM Gian Merlino wrote:
> >
> > > Yeah, I'd say the next one after 24.0 would be 25.0. The idea is really
> > > just to remove the leading zero and thereby communicate the accurate
> > > state of the project: it has been stable and production-ready for a
> > > long time. Some people see the leading zero and interpret that as a
> > > sign of an immature or non-production-ready system. So I think this
> > > change is worth doing and beneficial.
> > >
> > > I do think we can do better at communicating compatibility, but IMO
> > > semantic versioning for the whole system isn't the best way to do it.
> > > Semantic versioning is good for libraries, where people need one kind
> > > of assurance: that they can update to the latest version of the library
> > > without needing to make changes in their program. But Druid is
> > > infrastructure software with many varied senses of compatibility, such
> > > as:
> > >
> > > 1) Query API: do user queries written for version X return compatible
> > >    responses when run against version Y?
> > > 2) Extension API: do extensions written for version X run as expected
> > >    with version Y?
> > > 3) Storage format: can servers at version X read segments written by
> > >    servers at version Y?
> > > 4) Intracluster protocol: can a server at version X communicate
> > >    properly with a server at version Y?
> > > 5) Server configuration: do server configurations (runtime properties,
> > >    jvm configs) written for version X work as expected for version Y?
> > > 6) Ecosystem: does version Y drop support for older versions of
> > >    ZooKeeper, Kafka, Hadoop, etc, which were supported by version X?
> > >
> > > In practice we do find good reasons to make such changes in one or more
> > > of these areas in many of our releases. We try to maximize
> > > compatibility between releases, but it is balanced against the effort
> > > to improve the system while keeping the code maintainable. So if we
> > > considered all of these areas in semantic versioning, we'd be
> > > incrementing the major version often anyway. The effect would be
> > > similar to having a "meaningless" version number but with more steps.
> > >
> > > IMO a better approach would be to introduce more kinds of version
> > > numbers. In my experience the two most important kinds of compatibility
> > > to most users are "Query API" and "Extension API". So if we had a
> > > "Query API version" or "Extension API version" then we could
> > > semantically version the Query and Extension API versions, separately
> > > from the main Druid version. (Each Druid release would have an
> > > associated Extension API version, and a list of supported Query API
> > > versions that users could choose between on a per-query basis.)
> > >
> > > Rahul, I wonder what you think about this idea? What kinds of
> > > compatibility are most important to you?
> > >
> > > On Fri, May 27, 2022 at 9:39 AM rahul gidwani wrote:
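
The per-API versioning idea in the May 27 message (one Extension API version per Druid release, plus a list of Query API versions selectable per query) could be sketched roughly as below. This is only an illustration under stated assumptions, not Druid code: the class and function names, and every version number in the example, are hypothetical. It assumes the usual semantic-versioning rule that a consumer built against `major.minor` keeps working on any provider with the same major and an equal or higher minor.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SemVer:
    """A semantically versioned API level (hypothetical helper)."""
    major: int
    minor: int = 0

    def satisfies(self, required: "SemVer") -> bool:
        # Same major means no breaking change; higher minor only adds features.
        return self.major == required.major and self.minor >= required.minor


@dataclass(frozen=True)
class DruidRelease:
    """Hypothetical release descriptor; not an actual Druid class."""
    version: str                   # main release number, e.g. "24.0"
    extension_api: SemVer          # exactly one Extension API version per release
    query_apis: Tuple[SemVer, ...]  # Query API versions a user may pick per query


def extension_runs_on(built_against: SemVer, release: DruidRelease) -> bool:
    """Would an extension built against `built_against` load on `release`?"""
    return release.extension_api.satisfies(built_against)


def query_api_supported(requested: SemVer, release: DruidRelease) -> bool:
    """Does `release` offer a Query API compatible with `requested`?"""
    return any(api.satisfies(requested) for api in release.query_apis)


# Invented numbers, purely for illustration.
release_24 = DruidRelease("24.0", SemVer(1, 2), (SemVer(1, 0), SemVer(2, 1)))
release_25 = DruidRelease("25.0", SemVer(2, 0), (SemVer(2, 3),))

print(extension_runs_on(SemVer(1, 1), release_24))    # True
print(extension_runs_on(SemVer(1, 1), release_25))    # False
print(query_api_supported(SemVer(1, 0), release_25))  # False
```

Under this sketch the main release number ("24.0", "25.0") carries no compatibility meaning by itself; only the per-API versions do, which matches the separation between the two efforts described in the thread.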
Re: [DISCUSS] Removing code related to `FireHose`
I am in favor of immediately removing FiniteFirehoseFactory and marking EventReceiverFirehoseFactory deprecated. Then, later on we can remove InputRowParser and EventReceiverFirehoseFactory.

On Fri, Jun 24, 2022 at 4:41 AM Abhishek Agarwal wrote:

> I didn’t include them (RealtimeIndexTask and
> AppenderatorDriverRealtimeIndexTask) in my previous email because they have
> not been marked deprecated yet. We should mark them deprecated officially
> in the next release and remove them in the release after that.
>
> So looks like the classes that we can definitely remove are implementations
> of `FiniteFirehoseFactory` and mark the `Firehose` interface deprecated.
>
> On Fri, 24 Jun 2022 at 4:36 AM, Clint Wylie wrote:
>
> > If we remove RealtimeIndexTask and AppenderatorDriverRealtimeIndexTask
> > then we can remove EventReceiverFirehoseFactory. The former was
> > primarily used by tranquility which has been sunset, the latter I'm
> > not sure was ever used for anything. I'm personally in favor of
> > removing both of them since push based ingestion is very fragile in my
> > experience, but I think some of the oldest integration tests use
> > RealtimeIndexTask and so would need to be removed/updated/rewritten to
> > use something else as appropriate.
> >
> > I don't think we can completely remove InputRowParser until we drop
> > Hadoop support (or modify Hadoop ingestion to use
> > InputSource/InputFormat?), since it still relies on using the older
> > spec. As far as I know, Thrift is the only data format that has not
> > been fully migrated to use InputFormat, though there is an old PR that
> > is mostly done here https://github.com/apache/druid/pull/11360.
> >
> > On Thu, Jun 23, 2022 at 5:11 AM Abhishek Agarwal wrote:
> >
> > > Hello,
> > > The `FiniteFirehoseFactory` and `InputRowParser` classes were
> > > deprecated in 0.17.0 (https://github.com/apache/druid/pull/8823) in
> > > favour of `InputSource`. 0.17.0 was released more than 2 years ago in
> > > Jan 2020.
> > >
> > > I think it is about time that we remove this code entirely. Removing
> > > `InputRowParser` may not be as trivial as
> > > `EventReceiverFirehoseFactory` depends on it. I didn't find any
> > > alternatives for `EventReceiverFirehoseFactory` and it is not marked
> > > deprecated as well.
> > >
> > > But we can still remove `FiniteFirehoseFactory` and the implementations
> > > safely as there are alternatives available.
> > >
> > > Thoughts/Suggestions?