Multi-stage queries

2022-02-25 Thread Gian Merlino
Hey Druids, I recently posted a proposal on GitHub about adding multi-stage distributed queries to Druid: https://github.com/apache/druid/issues/12262 I think it'll be a powerful advancement in what Druid is capable of, and I'm interested in what people think. It's also going to be a lot of work

Re: [GitHub] [druid] cryptoe commented on a diff in pull request #12339: Make AWS WebIdentityToken actually working and usable from inside EKS.

2022-04-04 Thread Gian Merlino
I thought these emails were supposed to go to comm...@druid.apache.org? I do see a bunch on that list from today, so maybe this was a weird gitbox snafu. On Sun, Apr 3, 2022 at 10:53 PM GitBox wrote: > > cryptoe commented on code in PR #12339: > URL:

Re: 0.23

2022-03-24 Thread Gian Merlino
I agree it's a good time to do a release. Most of the release-manager steps involve having commit privileges, but nevertheless, you might find it interesting to read about the process: https://github.com/apache/druid/blob/master/distribution/asf-release-process-guide.md You've actually already

Re: Apache Druid Slack

2022-01-21 Thread Gian Merlino
It sounds like a good idea to me. It's not ideal that the current Slack workspace is hard for new people to join. On Thu, Jan 20, 2022 at 10:15 AM Vadim Ogievetsky wrote: > I think that the PMC should create a new Slack channel for Apache Druid and > shift the community towards using it away

Re: CVEs in contrib extensions

2023-09-05 Thread Gian Merlino
I think it would be OK to have a policy that contrib extension dependencies are not proactively screened for CVEs. If we adopt such a policy, we do need to make it clear to people that they should do their own screening of any contrib extensions they use. However, we can't extend that policy to

Druid Summit 2023 — call for speakers!

2023-09-11 Thread Gian Merlino
Hey Druids, I am excited to write to you about this year's Druid Summit ( https://druidsummit.org/), an event being held virtually on December 5–6, 2023. The call for speakers is open here: https://docs.google.com/forms/d/e/1FAIpQLSfoBZNh_IpSCT59fsYdTSSK92hYa7Rxf_7Fu0yBRCbK8ZwJdg/viewform A

Re: New Committer : Adarsh Sanjeev

2023-08-23 Thread Gian Merlino
Congratulations!! On Mon, Aug 21, 2023 at 8:14 AM Karan Kumar wrote: > Hello everyone, > > The Project Management Committee (PMC) for Apache Druid has invited > Adarsh to become a committer and we are pleased to announce that > Adarsh has accepted. > > Adarsh has been a consistent contributor

Re: New Committer : Soumyava Das

2023-08-23 Thread Gian Merlino
Congratulations!! On Mon, Aug 21, 2023 at 9:13 AM Karan Kumar wrote: > Hello everyone, > > The Project Management Committee (PMC) for Apache Druid has invited > Soumyava to become a committer and we are pleased to announce that > Soumyava has accepted. > > Soumyava has been a consistent

Re: [DISCUSS] Druid 0.23 release

2022-05-26 Thread Gian Merlino
I'm supportive of changing the versioning to something without the leading zero in the next release where this is practical. If it's the one after 0.23.0, then I would go with 24.0. IMO, going with 1.0 would send a message that this is the first mature release. But that isn't the case: we have

Re: Next Druid release version scheme

2022-05-27 Thread Gian Merlino
Yeah, I'd say the next one after 24.0 would be 25.0. The idea is really just to remove the leading zero and thereby communicate the accurate state of the project: it has been stable and production-ready for a long time. Some people see the leading zero and interpret that as a sign of an immature
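The renumbering discussed in this thread amounts to dropping the leading "0." from the old version string. A minimal sketch of that mapping follows; the function name `drop_leading_zero` and the input "0.24.0" (what the release after 0.23.0 would have been called under the old scheme) are illustrative assumptions, not part of any Druid tooling.

```python
def drop_leading_zero(version: str) -> str:
    """Map a legacy '0.N[.P]' version string to 'N[.P]'.

    Hypothetical helper for illustration only; versions that do not
    start with '0.' are returned unchanged.
    """
    prefix = "0."
    if version.startswith(prefix) and len(version) > len(prefix):
        return version[len(prefix):]
    return version


if __name__ == "__main__":
    # The release after 0.23.0 becomes 24.0 instead of 0.24.0.
    print(drop_leading_zero("0.24.0"))
    # Versions already in the new scheme pass through untouched.
    print(drop_leading_zero("25.0"))
```

Nothing else about the numbering changes under this sketch: the major number simply keeps incrementing (24.0, then 25.0).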

Re: Vulnerability Report [Misconfigured DMARC Record Flag]

2022-06-21 Thread Gian Merlino
Hey Zeus, You should have received a response to this report from the Apache Security Team (secur...@apache.org). In the future, please note that security reports should be sent to secur...@apache.org, not the dev list. On Tue, Jun 21, 2022 at 1:04 PM Cyber Zeus wrote: > Hi team > kindly

Re: [DISCUSS] Removing code related to `FireHose`

2022-07-06 Thread Gian Merlino
I am in favor of immediately removing FiniteFirehoseFactory and marking EventReceiverFirehoseFactory deprecated. Then, later on we can remove InputRowParser and EventReceiverFirehoseFactory. On Fri, Jun 24, 2022 at 4:41 AM Abhishek Agarwal wrote: > I didn’t include them (RealtimeIndexTask and >

Re: Next Druid release version scheme

2022-07-06 Thread Gian Merlino
releases? > > Can I do a rolling upgrade of druid to the next version? > > > > The more things that are versioned the better, but (2) and (4) have been > > the things that have been most important to me in the past. > > > > Anyone in the community have any thou

Re: Next Druid release version scheme

2022-07-06 Thread Gian Merlino
API changes, look no > further than Guava.) > > Julian > > > On Jul 6, 2022, at 1:53 AM, Gian Merlino wrote: > > My proposal for the next release is that we merely drop the leading "0." > and don't change anything else about our dev process. We'd start the next >

Re: EJB interceptor binding API is not available

2022-06-04 Thread Gian Merlino
Hi Maithri, I haven't encountered something like this before so I'm not sure what's causing it. Is it reproducible? If you could provide some steps for someone else to see the same thing you're seeing — maybe it relies on a particular Java version, or particular Druid version, or something — then

New PMC member: Abhishek Agarwal

2022-06-07 Thread Gian Merlino
Hey Druids, The Druid PMC has invited Abhishek Agarwal (asf id abhishek, github id abhishekagarwal87) to become a PMC member, and we are pleased to announce that he has accepted. Abhishek has authored dozens of commits, participated in nearly 200 code reviews, and is release manager for the

Re: Limitations of automated unused segment kill logic (Issue #10876 and PR #10877)

2022-05-05 Thread Gian Merlino
I just took a look, and it looks like a few other people did too. Sorry it took so long! I do think that "review for a review" is a good way to go, I think! Thanks for volunteering. On Mon, May 2, 2022 at 12:12 PM Lucas Capistrant wrote: > Hi all, > > I'm writing in regards to my enhancement

Re: Intermediate segment persistence

2022-09-06 Thread Gian Merlino
Hey Pramod, If it's a minor change I recommend raising a PR. Generally raising an issue first is a good idea for bigger changes, where it is helpful to have some discussion prior to the code showing up. But for smaller changes, we can go directly to the code. You can post the PR here too, or in

Druid Summit on the road

2022-09-06 Thread Gian Merlino
Hey Druids, I am excited to write to you about upcoming events in this year's edition of Druid Summit, which is being conducted as a series of more local in-person events. I hope it gives you a chance to meet people near you in the Druid community. Attendance is free of charge. I personally will

Re: [DISCUSS] Release 24.0.1

2022-10-18 Thread Gian Merlino
Thank you for volunteering! On Mon, Oct 17, 2022 at 7:00 AM Kashif Faraz wrote: > Hi Abhishek > > If you haven't started with the release process already, I would like to > volunteer to perform this release so that we can expedite it. > Please let me know if that works for you. > > Regards >

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-08 Thread Gian Merlino
It's always good to deprecate things for some time prior to removing them, so we don't need to (nor should we) remove Hadoop 2 support right now. My vote is that in this upcoming release, we should deprecate it. The main problem in my eyes is the one Abhishek brought up: the dependency management

Re: [Discuss] S3 buckets or IT tests

2023-02-22 Thread Gian Merlino
I think the ticket you're referring to is https://issues.apache.org/jira/browse/INFRA-23952. It would definitely be valuable to run S3 integration tests as part of the automated test suite in GitHub Actions. If Infra is willing to provide a bucket for this purpose then we would certainly be

Re: About maintaining the Helm's Chart of Apache Druid

2023-02-28 Thread Gian Merlino
Not as far as I do. I think we're stuck since nobody has volunteered to do one of the two necessary things: 1) shepherd this code through the IP clearance process, or 2) analyze its provenance enough to determine that IP clearance isn't necessary. If anyone is willing to do one of the above it would be

CI requiring approval for external contributors

2023-03-28 Thread Gian Merlino
Recently, ASF GitHub repos had their defaults for GitHub Actions changed to "always require approval for external contributors". In Slack, Karan pointed out that Airflow has recently submitted a ticket to have that changed back: https://issues.apache.org/jira/browse/INFRA-24200. IMO, we should do

Re: Question regarding new development

2023-03-28 Thread Gian Merlino
Looks like the conversation is now in https://github.com/apache/druid/issues/13948. On Sat, Mar 18, 2023 at 8:00 AM Sergiu Ungureanu wrote: > Hi Team, > > Yesterday I raised a question in #dev channel in slack > > https://apachedruidworkspace.slack.com/archives/C030CMF6B70/p1679085073683509 > >

Re: moving druid-core, extendedset, druid-hll into druid-processing

2023-02-06 Thread Gian Merlino
I support this. I don't feel like the separation between core and processing is buying us very much. On Mon, Jan 23, 2023 at 5:12 PM Clint Wylie wrote: > Hi all, > > I want to discuss moving druid-core, extendedset, and druid-hll into > druid-processing to simplify our code structure and

Re: group-by v1

2023-07-17 Thread Gian Merlino
+1 to removing it. The only benefit I am aware of is the same one that you mentioned. But I don't think this needs to block removing the old v1 algo. On Wed, Jul 12, 2023 at 4:07 AM Clint Wylie wrote: > Is anyone opposed to removing group-by v1? I think it would allow us > to simplify quite a

Re: About maintaining the Helm's Chart of Apache Druid

2023-07-17 Thread Gian Merlino
remove the > code. > > On Wed, Mar 1, 2023 at 7:14 AM Gian Merlino wrote: > > > Not as far as I _know_, I mean. > > > > On 2023/03/01 01:43:43 Gian Merlino wrote: > > > Not as far as I do. I think we're stuck since nobody has volunteered to > > do one of the two

Re: [DISCUSS] Druid 28 dropping support for Hadoop 2

2023-07-19 Thread Gian Merlino
already, and the next release (28) is meant to not have it. Does anyone have some spare cycles to do (2)? Gian On 2023/06/28 06:42:08 Gian Merlino wrote: > I'd like to propose dropping support for Hadoop 2 in Druid 28. Not the very > next release (which I assume will be Druid 27) but the one

Re: request to join dev group

2023-07-06 Thread Gian Merlino
Hi Tanya, Welcome! You can subscribe by sending an email to dev-subscr...@druid.apache.org. Gian On 2023/07/04 06:41:02 Tanya Mary wrote: > request to join dev group > - To unsubscribe, e-mail:

Roadmap event: call for speakers

2023-05-30 Thread Gian Merlino
Hi Druids, We are looking to put on a virtual event called "Druid.NEXT" in June highlighting things that people in the community are working on. This is a call for speakers for that event! Date is TBD, but likely late June. The event will be on the shorter side, about meetup-length (an hour or

Re: Requirements for relaxing restrictions on github actions usage

2023-06-02 Thread Gian Merlino
+1, allowing CI to run without an explicit button push by committers will help encourage new contributors. The requirements seem OK. I looked through our repo and I don't see any external actions (they are all in "github" or "actions"). We do have ".github/workflows/labeler.yml" that fires on

Re: [DISCUSS] Druid 28 dropping support for Hadoop 2

2023-06-29 Thread Gian Merlino
ran Kumar > wrote: > > > In favour of dropping hadoop 2 support . Another point is the lack of > > security and vulnerability fixes in hadoop2. > > > > > > > > On Wed, Jun 28, 2023 at 12:17 PM Clint Wylie wrote: > > > > > obvious

[DISCUSS] Druid 28 dropping support for Hadoop 2

2023-06-28 Thread Gian Merlino
I'd like to propose dropping support for Hadoop 2 in Druid 28. Not the very next release (which I assume will be Druid 27) but the one after that, likely late 2023 timeframe. In 2021, we had a discussion about moving away from Hadoop 2:

Re: Error message: "Error: Resource limit exceeded

2023-05-15 Thread Gian Merlino
Hi Alaka, There's a bit of text cut off in the error message. The full one is something like: "Time ordering is not supported for a Scan query with %,d segments per time chunk and a row limit of %,d. " + "Try reducing your query limit below maxRowsQueuedForOrdering
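The `%,d` placeholders in the quoted message are Java format specifiers that print an integer with grouping separators. As a rough illustration of how the first sentence renders once the values are filled in, here is a sketch in Python, where `{:,d}` plays the same role (Python always groups with commas, while Java's `%,d` follows the locale). The function name `render` and the numbers used are assumptions for illustration, and only the first sentence of the message is reproduced.

```python
# First sentence of the quoted error message, with Java's "%,d"
# rewritten as Python's "{:,d}" (integer with thousands separators).
TEMPLATE = (
    "Time ordering is not supported for a Scan query with {:,d} segments "
    "per time chunk and a row limit of {:,d}."
)


def render(segments_per_time_chunk: int, row_limit: int) -> str:
    """Fill in the two numeric placeholders (hypothetical helper)."""
    return TEMPLATE.format(segments_per_time_chunk, row_limit)


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the formatting.
    print(render(12, 250000))
```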

Re: [VOTE] Release Apache Druid 29.0.0 [RC1]

2024-02-16 Thread Gian Merlino
Thanks for managing this release! My vote is -0, let me explain why. I am concerned about usability issues with the new arrayIngestMode feature. There are various issues when mixing MVD strings and string arrays in the same column: as soon as arrays show up in a column, various "classic

Re: [VOTE] Release Apache Druid 29.0.0 [RC1]

2024-02-16 Thread Gian Merlino
re people to change their arrayIngestMode. Gian On 2024/02/16 22:24:23 Gian Merlino wrote: > I just learned that arrayIngestMode is not actually new, just > https://github.com/apache/druid/pull/15588 is. However this will still make > it more likely that people accidentally break their tab

Re: [VOTE] Release Apache Druid 29.0.0 [RC1]

2024-02-16 Thread Gian Merlino
ait for 30, given the impact that can happen if people end up with mixed types without planning for it. On Fri, Feb 16, 2024 at 2:16 PM Gian Merlino wrote: > Thanks for managing this release! > > My vote is -0, let me explain why. I am concerned about usability issues > with the new a

Re: on removing 'auto' strategy from native search query

2023-11-20 Thread Gian Merlino
We don't have usage data, but my sense is that the search query is not commonly used, and among people that use the search query, it's not common to rely on "druid.query.search.searchStrategy: auto". So I think it would be ok to remove the feature and have "auto" be an alias for "useIndexes",
