Re: Next Druid release version scheme

2022-07-06 Thread Gian Merlino
I'd say yes, in a way that's similar to today. Today we treat increments of
the version number after the leading "0." as potentially allowing breaking
changes. We also
try to avoid them whenever feasible, because we know they're painful for
users. I'm not suggesting we immediately get any more, or less, eager about
making breaking changes as part of dropping the "0.". Over time, though,
I'd like to see us get less eager about making breaking changes.

On Wed, Jul 6, 2022 at 9:47 AM Julian Hyde  wrote:

> Would 24.0 and 25.0 each be regarded as major versions for the purposes of
> semantic versioning?
>
> If so, under the rules of semantic versioning, we *can* make breaking API
> changes but that doesn’t mean that we *should*. (For an example of a
> project that followed the letter of semantic versioning but still
> undermined the trust of their users by making too many API changes, look no
> further than Guava.)
>
> Julian
>
>
> On Jul 6, 2022, at 1:53 AM, Gian Merlino  wrote:
>
> My proposal for the next release is that we merely drop the leading "0."
> and don't change anything else about our dev process. We'd start the next
> release at 24.0, and then likely do 25.0 shortly after. Same as today, just
> no leading "0.".
>
> Separately, I'd like to craft a better versioning story around extension
> API, query API, etc. But I don't think we need to connect these two things.
> The dropping of the leading "0." is mainly about reflecting the reality
> that the project is way more stable than a random member of the public
> would expect for a "0." release. The better versioning story is an effort
> that is independent from that.
>
> On Tue, Jun 7, 2022 at 11:50 AM Xavier Léauté wrote:
>
> Extension API: do extensions written for version X run as expected with
> version Y?
>
> One thing I'd like to see us do before we declare 1.0 and provide
> backwards compatibility for extension APIs is to remove some of the crufty
> Hadoop 2.x and Guava 16 dependency constraints we have (or at least
> isolate them so extensions and core are not constrained by old versions).
> Removing those will likely be a breaking change for extensions.
>
> I'm also fine declaring 1.0, but that might mean we can't deprecate things
> until 2.0, and then remove them in 3.0, depending on what our backwards
> compatibility guarantees are. What I'd like us to avoid is becoming
> further entrenched in those dependencies, and bogged down in moving away
> from them, by declaring a stable API.
>
> Xavier
>
> On Mon, Jun 6, 2022 at 2:45 PM rahul gidwani 
> wrote:
>
> Hi Gian, this is great.
>
> For me, what is most important is (2) and (4):
> Does my current extension work with new releases?
> Can I do a rolling upgrade of Druid to the next version?
>
> The more things that are versioned the better, but (2) and (4) have been
> the things that have been most important to me in the past.
>
> Anyone in the community have any thoughts on this?
> Thank you
> rahul
>
>
>
> On Fri, May 27, 2022 at 11:22 AM Gian Merlino  wrote:
>
> Yeah, I'd say the next one after 24.0 would be 25.0. The idea is really
> just to remove the leading zero and thereby communicate the accurate state
> of the project: it has been stable and production-ready for a long time.
> Some people see the leading zero and interpret that as a sign of an
> immature or non-production-ready system. So I think this change is worth
> doing and beneficial.
>
> I do think we can do better at communicating compatibility, but IMO
> semantic versioning for the whole system isn't the best way to do it.
> Semantic versioning is good for libraries, where people need one kind of
> assurance: that they can update to the latest version of the library
> without needing to make changes in their program. But Druid is
> infrastructure software with many varied senses of compatibility, such as:
>
> 1) Query API: do user queries written for version X return compatible
> responses when run against version Y?
> 2) Extension API: do extensions written for version X run as expected with
> version Y?
> 3) Storage format: can servers at version X read segments written by
> servers at version Y?
> 4) Intracluster protocol: can a server at version X communicate properly
> with a server at version Y?
> 5) Server configuration: do server configurations (runtime properties, jvm
> configs) written for version X work as expected for version Y?
> 6) Ecosystem: does version Y drop support for older versions of ZooKeeper,
> Kafka, Hadoop, etc, which were supported by version X?
>
> In practice we do find good reasons to make such changes in one or more of
> these areas in many of our releases. We try to maximize compatibility
> between releases, but it is balanced against the effort to improve the
> system while keeping the code maintainable. So if we considered all of
> these areas in semantic versioning, we'd be incrementing the major version
> often anyway. The effect would be similar to having a "meaningless"
> version number but with more steps.

Re: Next Druid release version scheme

2022-07-06 Thread rahul gidwani
+1 to what Julian said.


On Wed, Jul 6, 2022 at 9:47 AM Julian Hyde  wrote:

> Would 24.0 and 25.0 each be regarded as major versions for the purposes of
> semantic versioning?
>
> If so, under the rules of semantic versioning, we *can* make breaking API
> changes but that doesn’t mean that we *should*. (For an example of a
> project that followed the letter of semantic versioning but still
> undermined the trust of their users by making too many API changes, look no
> further than Guava.)
>
> Julian
>
>
> On Jul 6, 2022, at 1:53 AM, Gian Merlino  wrote:
>
> My proposal for the next release is that we merely drop the leading "0."
> and don't change anything else about our dev process. We'd start the next
> release at 24.0, and then likely do 25.0 shortly after. Same as today, just
> no leading "0.".
>
> Separately, I'd like to craft a better versioning story around extension
> API, query API, etc. But I don't think we need to connect these two things.
> The dropping of the leading "0." is mainly about reflecting the reality
> that the project is way more stable than a random member of the public
> would expect for a "0." release. The better versioning story is an effort
> that is independent from that.
>
> On Tue, Jun 7, 2022 at 11:50 AM Xavier Léauté wrote:
>
> Extension API: do extensions written for version X run as expected with
> version Y?
>
> One thing I'd like to see us do before we declare 1.0 and provide
> backwards compatibility for extension APIs is to remove some of the crufty
> Hadoop 2.x and Guava 16 dependency constraints we have (or at least
> isolate them so extensions and core are not constrained by old versions).
> Removing those will likely be a breaking change for extensions.
>
> I'm also fine declaring 1.0, but that might mean we can't deprecate things
> until 2.0, and then remove them in 3.0, depending on what our backwards
> compatibility guarantees are. What I'd like us to avoid is becoming
> further entrenched in those dependencies, and bogged down in moving away
> from them, by declaring a stable API.
>
> Xavier
>
> On Mon, Jun 6, 2022 at 2:45 PM rahul gidwani 
> wrote:
>
> Hi Gian, this is great.
>
> For me, what is most important is (2) and (4):
> Does my current extension work with new releases?
> Can I do a rolling upgrade of Druid to the next version?
>
> The more things that are versioned the better, but (2) and (4) have been
> the things that have been most important to me in the past.
>
> Anyone in the community have any thoughts on this?
> Thank you
> rahul
>
>
>
> On Fri, May 27, 2022 at 11:22 AM Gian Merlino  wrote:
>
> Yeah, I'd say the next one after 24.0 would be 25.0. The idea is really
> just to remove the leading zero and thereby communicate the accurate state
> of the project: it has been stable and production-ready for a long time.
> Some people see the leading zero and interpret that as a sign of an
> immature or non-production-ready system. So I think this change is worth
> doing and beneficial.
>
> I do think we can do better at communicating compatibility, but IMO
> semantic versioning for the whole system isn't the best way to do it.
> Semantic versioning is good for libraries, where people need one kind of
> assurance: that they can update to the latest version of the library
> without needing to make changes in their program. But Druid is
> infrastructure software with many varied senses of compatibility, such as:
>
> 1) Query API: do user queries written for version X return compatible
> responses when run against version Y?
> 2) Extension API: do extensions written for version X run as expected with
> version Y?
> 3) Storage format: can servers at version X read segments written by
> servers at version Y?
> 4) Intracluster protocol: can a server at version X communicate properly
> with a server at version Y?
> 5) Server configuration: do server configurations (runtime properties, jvm
> configs) written for version X work as expected for version Y?
> 6) Ecosystem: does version Y drop support for older versions of ZooKeeper,
> Kafka, Hadoop, etc, which were supported by version X?
>
> In practice we do find good reasons to make such changes in one or more of
> these areas in many of our releases. We try to maximize compatibility
> between releases, but it is balanced against the effort to improve the
> system while keeping the code maintainable. So if we considered all of
> these areas in semantic versioning, we'd be incrementing the major version
> often anyway. The effect would be similar to having a "meaningless"
> version number but with more steps.
>
> IMO a better approach would be to introduce more kinds of version numbers.
> In my experience the two most important kinds of compatibility to most
> users are "Query API" and "Extension API". So if we had a "Query API
> version" or "Extension API version" then we could semantically version the
> Query and Extension API versions, separately from the main Druid version.

Re: Next Druid release version scheme

2022-07-06 Thread Julian Hyde
Would 24.0 and 25.0 each be regarded as major versions for the purposes of
semantic versioning?

If so, under the rules of semantic versioning, we *can* make breaking API
changes but that doesn’t mean that we *should*. (For an example of a
project that followed the letter of semantic versioning but still
undermined the trust of their users by making too many API changes, look no
further than Guava.)

Julian
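[For readers less familiar with semantic versioning, the rule behind Julian's question can be summarized with a small illustrative sketch in Java. This helper is purely hypothetical and not part of Druid; it only encodes the standard MAJOR.MINOR.PATCH contract.]

public record SemanticVersion(int major, int minor, int patch)
{
  /**
   * Under semantic versioning (MAJOR.MINOR.PATCH), breaking API changes are
   * permitted only when the MAJOR component increases; MINOR and PATCH
   * releases must remain backwards compatible.
   */
  public boolean breakingChangesAllowedWhenUpgradingTo(final SemanticVersion next)
  {
    return next.major() > major();
  }
}

[Under this reading, 24.0 -> 25.0 would permit breaking changes, which is exactly why Julian cautions that "can" does not mean "should".]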


On Jul 6, 2022, at 1:53 AM, Gian Merlino  wrote:

My proposal for the next release is that we merely drop the leading "0."
and don't change anything else about our dev process. We'd start the next
release at 24.0, and then likely do 25.0 shortly after. Same as today, just
no leading "0.".

Separately, I'd like to craft a better versioning story around extension
API, query API, etc. But I don't think we need to connect these two things.
The dropping of the leading "0." is mainly about reflecting the reality
that the project is way more stable than a random member of the public
would expect for a "0." release. The better versioning story is an effort
that is independent from that.

On Tue, Jun 7, 2022 at 11:50 AM Xavier Léauté 
wrote:

Extension API: do extensions written for version X run as expected with
version Y?

One thing I'd like to see us do before we declare 1.0 and provide
backwards compatibility for extension APIs is to remove some of the crufty
Hadoop 2.x and Guava 16 dependency constraints we have (or at least
isolate them so extensions and core are not constrained by old versions).
Removing those will likely be a breaking change for extensions.

I'm also fine declaring 1.0, but that might mean we can't deprecate things
until 2.0, and then remove them in 3.0, depending on what our backwards
compatibility guarantees are. What I'd like us to avoid is becoming
further entrenched in those dependencies, and bogged down in moving away
from them, by declaring a stable API.

Xavier

On Mon, Jun 6, 2022 at 2:45 PM rahul gidwani 
wrote:

Hi Gian, this is great.

For me, what is most important is (2) and (4):
Does my current extension work with new releases?
Can I do a rolling upgrade of Druid to the next version?

The more things that are versioned the better, but (2) and (4) have been
the things that have been most important to me in the past.

Anyone in the community have any thoughts on this?
Thank you
rahul



On Fri, May 27, 2022 at 11:22 AM Gian Merlino  wrote:

Yeah, I'd say the next one after 24.0 would be 25.0. The idea is really
just to remove the leading zero and thereby communicate the accurate state
of the project: it has been stable and production-ready for a long time.
Some people see the leading zero and interpret that as a sign of an
immature or non-production-ready system. So I think this change is worth
doing and beneficial.

I do think we can do better at communicating compatibility, but IMO
semantic versioning for the whole system isn't the best way to do it.
Semantic versioning is good for libraries, where people need one kind of
assurance: that they can update to the latest version of the library
without needing to make changes in their program. But Druid is
infrastructure software with many varied senses of compatibility, such as:

1) Query API: do user queries written for version X return compatible
responses when run against version Y?
2) Extension API: do extensions written for version X run as expected with
version Y?
3) Storage format: can servers at version X read segments written by
servers at version Y?
4) Intracluster protocol: can a server at version X communicate properly
with a server at version Y?
5) Server configuration: do server configurations (runtime properties, jvm
configs) written for version X work as expected for version Y?
6) Ecosystem: does version Y drop support for older versions of ZooKeeper,
Kafka, Hadoop, etc, which were supported by version X?

In practice we do find good reasons to make such changes in one or more of
these areas in many of our releases. We try to maximize compatibility
between releases, but it is balanced against the effort to improve the
system while keeping the code maintainable. So if we considered all of
these areas in semantic versioning, we'd be incrementing the major version
often anyway. The effect would be similar to having a "meaningless"
version number but with more steps.

IMO a better approach would be to introduce more kinds of version numbers.
In my experience the two most important kinds of compatibility to most
users are "Query API" and "Extension API". So if we had a "Query API
version" or "Extension API version" then we could semantically version the
Query and Extension API versions, separately from the main Druid version.
(Each Druid release would have an associated Extension API version, and a
list of supported Query API versions that users could choose between on a
per-query basis.)

Rahul, I wonder what you think about this idea? What kinds of compatibility
are most important to you?

Re: Next Druid release version scheme

2022-07-06 Thread Gian Merlino
My proposal for the next release is that we merely drop the leading "0."
and don't change anything else about our dev process. We'd start the next
release at 24.0, and then likely do 25.0 shortly after. Same as today, just
no leading "0.".

Separately, I'd like to craft a better versioning story around extension
API, query API, etc. But I don't think we need to connect these two things.
The dropping of the leading "0." is mainly about reflecting the reality
that the project is way more stable than a random member of the public
would expect for a "0." release. The better versioning story is an effort
that is independent from that.

On Tue, Jun 7, 2022 at 11:50 AM Xavier Léauté 
wrote:

> > Extension API: do extensions written for version X run as expected with
> > version Y?
>
> One thing I'd like to see us do before we declare 1.0 and provide
> backwards compatibility for extension APIs is to remove some of the crufty
> Hadoop 2.x and Guava 16 dependency constraints we have (or at least
> isolate them so extensions and core are not constrained by old versions).
> Removing those will likely be a breaking change for extensions.
>
> I'm also fine declaring 1.0, but that might mean we can't deprecate things
> until 2.0, and then remove them in 3.0, depending on what our backwards
> compatibility guarantees are. What I'd like us to avoid is becoming
> further entrenched in those dependencies, and bogged down in moving away
> from them, by declaring a stable API.
>
> Xavier
>
> On Mon, Jun 6, 2022 at 2:45 PM rahul gidwani 
> wrote:
>
> > Hi Gian, this is great.
> >
> > For me, what is most important is (2) and (4):
> > Does my current extension work with new releases?
> > Can I do a rolling upgrade of Druid to the next version?
> >
> > The more things that are versioned the better, but (2) and (4) have been
> > the things that have been most important to me in the past.
> >
> > Anyone in the community have any thoughts on this?
> > Thank you
> > rahul
> >
> >
> >
> > On Fri, May 27, 2022 at 11:22 AM Gian Merlino  wrote:
> >
> > > Yeah, I'd say the next one after 24.0 would be 25.0. The idea is really
> > > just to remove the leading zero and thereby communicate the accurate
> > > state of the project: it has been stable and production-ready for a long
> > > time. Some people see the leading zero and interpret that as a sign of
> > > an immature or non-production-ready system. So I think this change is
> > > worth doing and beneficial.
> > >
> > > I do think we can do better at communicating compatibility, but IMO
> > > semantic versioning for the whole system isn't the best way to do it.
> > > Semantic versioning is good for libraries, where people need one kind of
> > > assurance: that they can update to the latest version of the library
> > > without needing to make changes in their program. But Druid is
> > > infrastructure software with many varied senses of compatibility, such as:
> > >
> > > 1) Query API: do user queries written for version X return compatible
> > > responses when run against version Y?
> > > 2) Extension API: do extensions written for version X run as expected
> > > with version Y?
> > > 3) Storage format: can servers at version X read segments written by
> > > servers at version Y?
> > > 4) Intracluster protocol: can a server at version X communicate properly
> > > with a server at version Y?
> > > 5) Server configuration: do server configurations (runtime properties,
> > > jvm configs) written for version X work as expected for version Y?
> > > 6) Ecosystem: does version Y drop support for older versions of
> > > ZooKeeper, Kafka, Hadoop, etc, which were supported by version X?
> > >
> > > In practice we do find good reasons to make such changes in one or more
> > > of these areas in many of our releases. We try to maximize compatibility
> > > between releases, but it is balanced against the effort to improve the
> > > system while keeping the code maintainable. So if we considered all of
> > > these areas in semantic versioning, we'd be incrementing the major
> > > version often anyway. The effect would be similar to having a
> > > "meaningless" version number but with more steps.
> > >
> > > IMO a better approach would be to introduce more kinds of version
> > > numbers. In my experience the two most important kinds of compatibility
> > > to most users are "Query API" and "Extension API". So if we had a
> > > "Query API version" or "Extension API version" then we could
> > > semantically version the Query and Extension API versions, separately
> > > from the main Druid version. (Each Druid release would have an
> > > associated Extension API version, and a list of supported Query API
> > > versions that users could choose between on a per-query basis.)
> > >
> > > Rahul, I wonder what you think about this idea? What kinds of
> > > compatibility are most important to you?
> > >
> > > On Fri, May 27, 2022 at 9:39 AM rahul gidwani wrote:
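[To make the per-query idea above concrete, here is a minimal Java sketch of what selecting a Query API version on a per-query basis might look like. The "queryApiVersion" context key and the VersionedQueryContext helper are hypothetical; no such mechanism exists in Druid today.]

import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch only: pins a single query to a specific Query API
 * version via an imagined "queryApiVersion" query-context key. This simply
 * illustrates "a list of supported Query API versions that users could
 * choose between on a per-query basis".
 */
public final class VersionedQueryContext
{
  private VersionedQueryContext() {}

  public static Map<String, Object> withQueryApiVersion(
      final Map<String, Object> baseContext,
      final int queryApiVersion
  )
  {
    // Copy the caller's context and add the hypothetical version pin,
    // leaving the original map untouched.
    final Map<String, Object> context = new HashMap<>(baseContext);
    context.put("queryApiVersion", queryApiVersion);
    return context;
  }
}

[A release advertising Query API versions 1 and 2 could then keep serving version-1 responses to existing clients while new clients opt into version 2, independently of the main Druid release number.]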

Re: [DISCUSS] Removing code related to `FireHose`

2022-07-06 Thread Gian Merlino
I am in favor of immediately removing FiniteFirehoseFactory and marking
EventReceiverFirehoseFactory deprecated. Then, later on we can remove
InputRowParser and EventReceiverFirehoseFactory.
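[For context, the deprecation step described here would use the standard Java mechanism. The sketch below is illustrative only; the real EventReceiverFirehoseFactory has a different declaration and body, and the Javadoc wording is not taken from the Druid source.]

/**
 * Sketch of the proposed deprecation step, not the actual Druid class.
 * The @Deprecated annotation plus the @deprecated Javadoc tag make
 * compilers and IDEs warn extension authors ahead of the eventual removal.
 *
 * @deprecated planned for removal in a future release; migrate to
 *             InputSource/InputFormat based ingestion where possible.
 */
@Deprecated
public class EventReceiverFirehoseFactory
{
  // Existing behavior stays unchanged until the class is removed in a
  // later release.
}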

On Fri, Jun 24, 2022 at 4:41 AM Abhishek Agarwal 
wrote:

> I didn’t include them (RealtimeIndexTask and
> AppenderatorDriverRealtimeIndexTask) in my previous email because they have
> not been marked deprecated yet. We should mark them deprecated officially
> in the next release and remove them in the release after that.
>
> So it looks like we can definitely remove the implementations of
> `FiniteFirehoseFactory` and mark the `Firehose` interface deprecated.
>
> On Fri, 24 Jun 2022 at 4:36 AM, Clint Wylie  wrote:
>
> > If we remove RealtimeIndexTask and AppenderatorDriverRealtimeIndexTask
> > then we can remove EventReceiverFirehoseFactory. The former was
> > primarily used by Tranquility, which has been sunset; the latter I'm
> > not sure was ever used for anything. I'm personally in favor of
> > removing both of them since push based ingestion is very fragile in my
> > experience, but I think some of the oldest integration tests use
> > RealtimeIndexTask and so would need to be removed/updated/rewritten to
> > use something else as appropriate.
> >
> > I don't think we can completely remove InputRowParser until we drop
> > Hadoop support (or modify Hadoop ingestion to use
> > InputSource/InputFormat?), since it still relies on using the older
> > spec. As far as I know, Thrift is the only data format that has not
> > been fully migrated to use InputFormat, though there is an old PR that
> > is mostly done here: https://github.com/apache/druid/pull/11360.
> >
> > On Thu, Jun 23, 2022 at 5:11 AM Abhishek Agarwal
> >  wrote:
> > >
> > > Hello,
> > > The `FiniteFirehoseFactory` and `InputRowParser` classes were deprecated
> > > in 0.17.0 (https://github.com/apache/druid/pull/8823) in favour of
> > > `InputSource`. 0.17.0 was released more than 2 years ago in Jan 2020.
> > >
> > > I think it is about time that we remove this code entirely. Removing
> > > `InputRowParser` may not be as trivial, since `EventReceiverFirehoseFactory`
> > > depends on it. I didn't find any alternatives for
> > > `EventReceiverFirehoseFactory`, and it is not marked deprecated either.
> > >
> > > But we can still remove `FiniteFirehoseFactory` and the implementations
> > > safely as there are alternatives available.
> > >
> > > Thoughts/Suggestions?
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > For additional commands, e-mail: dev-h...@druid.apache.org
> >
> >
>