Re: Slow download of segments from deep storage
I believe that today, if you use the (experimental) HTTP-based load queues, they will parallelize segment downloads. Adding similar functionality to the ZK-based load queues would definitely be useful, though, since at this time nobody seems to be actively driving the effort to enable HTTP-based load queues by default. On Wed, Jan 30, 2019 at 7:20 PM Samarth Jain wrote: > We noticed that it takes a long time for the historicals to download > segments from deep storage (in our case S3). [...]
Re: Slow download of segments from deep storage
I *think* the HTTP coordination already enables this. On Wed, Jan 30, 2019 at 4:20 PM Samarth Jain wrote: > We noticed that it takes a long time for the historicals to download > segments from deep storage (in our case S3). [...]
Slow download of segments from deep storage
We noticed that it takes a long time for the historicals to download segments from deep storage (in our case S3). Looking closer at the code in ZKCoordinator, I noticed that segments are downloaded one at a time: the download happens in the single-threaded executor service used by the PathChildrenCache. According to the commentary on https://github.com/apache/incubator-druid/issues/4421 and https://github.com/apache/incubator-druid/issues/3202, the executor service used in PathChildrenCache can only be single-threaded. My proposal is to add a multi-threaded ExecutorService that acts on these events and performs the downloads. The role of the single-threaded ExecutorService in PathChildrenCache would then simply be to delegate each download task to this new executor service. Does that sound feasible? IMO, if this turns out to be functionally correct, it should significantly reduce the time it takes historicals to download all of their assigned segments. I would be more than happy to contribute this enhancement to the community. Thanks, Samarth
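A minimal sketch of the delegation idea above, under stated assumptions: it is plain Java (the language Druid is written in) and uses no Curator or Druid classes. A single-thread executor stands in for the PathChildrenCache callback thread, and the names `SegmentLoadDelegationSketch`, `SegmentDownloader`, `DelegatingSegmentLoader`, `onLoadEvent`, and `numLoadingThreads` are hypothetical, chosen only for illustration. The callback thread does nothing but hand each download to a fixed-size pool, so it is free to take the next event while several downloads run in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch only; not Druid's ZKCoordinator code. */
public class SegmentLoadDelegationSketch {

  /** Hypothetical stand-in for the code that pulls one segment from deep storage. */
  interface SegmentDownloader {
    void download(String segmentId) throws Exception;
  }

  /**
   * Events arrive on a single-threaded executor (standing in for the
   * PathChildrenCache callback thread), which only hands each download to a
   * fixed-size pool and returns immediately.
   */
  static final class DelegatingSegmentLoader implements AutoCloseable {
    private final ExecutorService callbackExecutor = Executors.newSingleThreadExecutor();
    private final ExecutorService downloadExecutor;
    private final SegmentDownloader downloader;

    DelegatingSegmentLoader(int numLoadingThreads, SegmentDownloader downloader) {
      this.downloadExecutor = Executors.newFixedThreadPool(numLoadingThreads);
      this.downloader = downloader;
    }

    /** Handles one "load segment" event; the future completes when the download finishes. */
    CompletableFuture<Void> onLoadEvent(String segmentId) {
      CompletableFuture<Void> done = new CompletableFuture<>();
      // The callback thread only submits; it never blocks on the download itself.
      callbackExecutor.execute(() -> downloadExecutor.execute(() -> {
        try {
          downloader.download(segmentId);
          done.complete(null);
        } catch (Exception e) {
          done.completeExceptionally(e);
        }
      }));
      return done;
    }

    @Override
    public void close() {
      try {
        // Drain the callback thread first so nothing is submitted after the pool shuts down.
        callbackExecutor.shutdown();
        callbackExecutor.awaitTermination(10, TimeUnit.SECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        downloadExecutor.shutdown();
      }
    }
  }

  /** Loads numSegments fake segments; returns {completedDownloads, peakConcurrentDownloads}. */
  public static int[] runDemo(int numLoadingThreads, int numSegments) {
    AtomicInteger inFlight = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    AtomicInteger completed = new AtomicInteger();
    SegmentDownloader slowDownloader = segmentId -> {
      peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
      try {
        Thread.sleep(100); // simulate a slow fetch from deep storage
      } finally {
        inFlight.decrementAndGet();
      }
      completed.incrementAndGet();
    };

    try (DelegatingSegmentLoader loader = new DelegatingSegmentLoader(numLoadingThreads, slowDownloader)) {
      List<CompletableFuture<Void>> futures = new ArrayList<>();
      for (int i = 0; i < numSegments; i++) {
        futures.add(loader.onLoadEvent("segment-" + i));
      }
      // Wait for every download before the loader is closed.
      CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
    }
    return new int[] {completed.get(), peak.get()};
  }

  public static void main(String[] args) {
    long start = System.nanoTime();
    int[] result = runDemo(4, 8);
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println("completed=" + result[0] + ", peak concurrent downloads=" + result[1]
        + ", elapsed ~" + elapsedMs + " ms");
  }
}
```

With 4 loading threads and 8 simulated 100 ms downloads, the run should take roughly two rounds of downloads rather than eight, though timing varies by machine. One caveat this sketch ignores: once work leaves the single callback thread, events for the same segment (for example, a load followed by a drop) can complete out of order, so a real change would probably need some per-segment coordination.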
Re: Off list major development
I think it'd also be nice to tweak a couple of parts of the KIP template (Motivation; Public Interfaces; Proposed Changes; Compatibility, Deprecation, and Migration Plan; Test Plan; Rejected Alternatives). A couple of people have suggested adding a "Rationale" section; how about adding that and folding "Rejected Alternatives" into it? I'd also drop "Test Plan", since IMO that discussion can be deferred to the PR itself, once there is code ready. Finally, I'd add "Future work", detailing where this change might lead us. Concretely, the template I am suggesting would look something like this: 1) Motivation: A description of the problem. 2) Proposed changes: Should usually be the longest section. Should include any changes that are proposed to user-facing interfaces (configuration parameters, JSON query/ingest specs, SQL language, emitted metrics, and so on). 3) Rationale: A discussion of why this particular solution is the best one. One good way to approach this is to discuss other solutions that you considered and decided against. This should also include a discussion of any specific benefits or drawbacks you are aware of. 4) Operational impact: Is anything going to be deprecated or removed by this change? Is there a migration path that cluster operators need to be aware of? Will there be any effect on the ability to do a rolling upgrade, or to do a rolling _downgrade_ if an operator wants to switch back to a previous version? 5) Future work: A discussion of things that you believe are out of scope for the particular proposal but would be nice follow-ups. It helps show where a particular change could be leading us. Listing something here is not a commitment that the proposal author will actually work on it. It is okay if this section is empty. On Wed, Jan 30, 2019 at 3:14 PM Jihoon Son wrote: > Thanks Eyal and Jon for starting the discussion about making a template! > > The KIP template looks good, but I would like to add one more. [...]
Re: Off list major development
Thanks Eyal and Jon for starting the discussion about making a template! The KIP template looks good, but I would like to add one more section. The current template is: - Motivation - Public Interfaces - Proposed Changes - Compatibility, Deprecation, and Migration Plan - Test Plan - Rejected Alternatives It includes almost everything required for proposals, but I think it's missing an explanation of why the author chose the proposed changes. So, I think it would be great if we could add 'Rationale' or 'Expected benefits and drawbacks'. People might already cover this in 'Motivation' or 'Proposed Changes', but it would be good to have an explicit section for it. Best, Jihoon On Wed, Jan 30, 2019 at 11:22 AM Jonathan Wei wrote: > Hi all, > > An issue has been opened by a community member suggesting that we create a > template for proposals: > https://github.com/apache/incubator-druid/issues/6949 [...]
Re: Off list major development
Hi all, An issue has been opened by a community member suggesting that we create a template for proposals: https://github.com/apache/incubator-druid/issues/6949 Having a template sounds convenient, and based on the discussion in this thread, I'm suggesting we adopt something based on the Kafka proposal format. I'm planning on creating such a template if there are no objections or alternative suggestions, so please take a look if you have thoughts on this. Thanks, Jon On Tue, Jan 15, 2019 at 12:07 PM Jihoon Son wrote: > Good point. [...]
Re: Off list major development
Hi, I have created an issue together with @jon-wei, if anyone wants to chime in: https://github.com/apache/incubator-druid/issues/6949 (Create a proposal template #6949). On Tue, Jan 15, 2019 at 12:07 PM Jihoon Son wrote: > Good point. > If some authors raise PRs without noticing the need for a proposal, we > shouldn't ask them to close their PRs only because of the absence of the > proposal. > > "Design review" without a proposal for simple PRs would be good if we can > determine well what PRs need and what don't. > But, how do we know? Even for the same PR, someone may think it needs a > proposal but another may not. > > If someone doesn't notice the need for a proposal and raises a PR without it, > I'm fine with that. > However, we should still encourage writing a proposal before writing code > because we can avoid unnecessary effort. > > I think this kind of issue usually happens for first-time contributors and > they will be better once they get used to Druid development. > And I believe someday even first-time contributors would follow this policy once > it gets settled down well in the community as the Kafka community does. > > Jihoon > > On Tue, Jan 15, 2019 at 4:31 AM Roman Leventov > wrote: > > > In such small PRs, authors likely won't be aware that they need to > create a > > proposal in the first place. The first reviewer just adds the "Design > > Review" tag. It's also absolutely not about considering designs and > gauging > > the proposal, it's just verifying that a configuration / parameter / HTTP > > endpoint name is reasonable and aligned with the rest of Druid. So I > think > > that a separate proposal issue for such PRs is unnecessary bureaucracy.
> > > > On Tue, 15 Jan 2019 at 07:45, Jihoon Son wrote: > > > > > Roman, > > > > > > > Jihoon in > > > > > > > > > https://lists.apache.org/thread.html/e007fbf362c2a870a2d88d04431789289807e00fd91d087559a01d1f@%3Cdev.druid.apache.org%3E > > > and later Gian in this thread suggested that _every_ piece of work that > > > should be labelled as "Design Review" according to the current rules > > should > > > be accompanied by an issue. I don't agree with this, there are some PRs > > as > > > small as a few dozen lines of code, that add some configuration > > > parameter and therefore should be labelled "Design Review". I don't > > think a > > > separate proposal issue is needed for them, and even for a little > larger > > > PRs too. > > > > > > What I'm concerned with is how people feel if their design is not > > accepted > > > even though they wrote code. Of course, as Clint said, sometimes code > > helps > > > better understanding of the proposal. But, I believe this is the case > > when > > > the proposal is quite complicated and not easy to understand without > > code. > > > Also the authors should be aware that they might rewrite the entire > > code > > > if the design should be changed. > > > > > > If writing code is simple, I don't see why the authors don't wait until > > the > > > review for their proposal is finished. > > > > > > Jihoon > > > > > > On Fri, Jan 11, 2019 at 9:51 AM Fangjin Yang wrote: > > > > > > > I agree with Gian: as an Apache committer, your responsibility is for > > the > > > > betterment of the project. I agree it is in the best interest of the > > > > project to stop thinking about what orgs people belong to. We are > all a > > > > part of the Apache Software Foundation, regardless of what our roles > > and > > > > titles are outside of it.
> > > > > > > > On Fri, Jan 11, 2019 at 2:22 AM Roman Leventov < > leventov...@gmail.com> > > > > wrote: > > > > > > > > > It's not that people from one org could abuse the project and push > > some > > > > > change, but that they have a similar perspective (bubble effect) and > > some > > > > > important aspects of a large feature could escape their attention. > > > > > > > > > > I suggest it be not a rigid rule, but a recommendation for > authors > > > of > > > > > large proposals to try to attract reviewers from other orgs. > > > > > > > > > > On Fri, 11 Jan 2019 at 02:51, Julian Hyde > wrote: > > > > > > > > > > > I agree with Gian. > > > > > > > > > > > > As an Apache committer, you only have one affiliation: you are > > > working > > > > in > > > > > > the best interests of the project. > > > > > > > > > > > > Obviously, in the real world there are other pressures. But we do > > our > > > > > best > > > > > > to compensate for them. > > > > > > > > > > > > Also, as a community we try to design our process so as to > avoid > > > > undue > > > > > > influences. For instance, when I advocate for logging cases > early, > > I > > > am > > > > > > trying to mitigate the effect of product managers and VPs of > > > > engineering, > > > > > > who like to have their say in meeting rooms rather than on public > > > > mailing > > > > > > lists. That’s just one example; if we see other influences at > play, > > > > let’s > > > > > > evolve our process to try to level the playing field.
FOSDEM 2019
Is anyone planning to be at FOSDEM this year? If enough of us are attending, a quick impromptu Druid gathering might be fun.
Re: The etiquette of poking people on GitHub and the policy when people stop responding
On Tue, 29 Jan 2019 at 01:30, Fangjin Yang wrote: > I disagree with Roman's suggestions. If a PR has enough votes, we should > trust the committers approving the PR and move forward. > There is a specific committer who merges a PR. If the merge happens before it has been made clear that somebody who left comments earlier has no further comments, the whole situation looks to me more like a disregard of that person's opinion. Trusting the other committers doesn't make the situation look much better, IMO.
Re: The etiquette of poking people on GitHub and the policy when people stop responding
On Tue, 29 Jan 2019 at 00:28, Gian Merlino wrote: > It's a totally different situation if nobody else has reviewed a patch yet. > In that case a reviewer reviewing things with longer cycles isn't blocking > anything. > There is a "Development Blocker" tag for such situations. What do you think about recommending a "poking period" of 3 working days for PRs tagged "Development Blocker", and a week for other PRs?