Re: Slow download of segments from deep storage

2019-01-30 Thread Gian Merlino
I believe today, if you use the (experimental) HTTP-based load queues, they
will parallelize segment downloads. Adding similar functionality for the
ZK-based load queues would definitely be useful though, since at this time
nobody seems to be actively driving a migration to HTTP-based load queues
being enabled by default.


Re: Slow download of segments from deep storage

2019-01-30 Thread Charles Allen
I *think* the HTTP coordination already enables this


Slow download of segments from deep storage

2019-01-30 Thread Samarth Jain
We noticed that it takes a long time for the historicals to download
segments from deep storage (in our case S3). Looking closer at the code in
ZKCoordinator, I noticed that segment downloads happen in a single-threaded
fashion: the download runs on the single-threaded ExecutorService used by
the PathChildrenCache. According to the commentary on
https://github.com/apache/incubator-druid/issues/4421 and
https://github.com/apache/incubator-druid/issues/3202, the executor service
used in PathChildrenCache can only be single-threaded.

My proposal is to use a multi-threaded ExecutorService that acts on the
events and performs the downloads. The single-threaded ExecutorService in
PathChildrenCache would then simply delegate each download task to this
new executor service.
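
For illustration, the delegation described above might look roughly like
the following sketch. It is not Druid code: the class DelegatingLoadHandler,
its methods, and the string segment ids are hypothetical, a plain
single-threaded executor stands in for the one used by PathChildrenCache,
and the "download" is only a placeholder.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DelegatingLoadHandler {
    // Hypothetical stand-in for the single-threaded executor that delivers cache events.
    private final ExecutorService eventThread = Executors.newSingleThreadExecutor();
    // Hypothetical pool that performs the actual downloads in parallel.
    private final ExecutorService downloadPool;
    // Thread-safe record of "downloaded" segment ids.
    private final List<String> loaded = new CopyOnWriteArrayList<>();

    public DelegatingLoadHandler(int numDownloadThreads) {
        this.downloadPool = Executors.newFixedThreadPool(numDownloadThreads);
    }

    // The event thread does no I/O itself; it only hands the work to the pool.
    public void onLoadEvent(String segmentId) {
        eventThread.execute(() -> downloadPool.execute(() -> download(segmentId)));
    }

    // Placeholder for pulling a segment from deep storage (assumption, no real I/O).
    private void download(String segmentId) {
        loaded.add(segmentId);
    }

    // Drain the event thread first so every hand-off reaches the pool, then drain the pool.
    public List<String> awaitAll(long timeoutSeconds) throws InterruptedException {
        eventThread.shutdown();
        eventThread.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        downloadPool.shutdown();
        downloadPool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        return loaded;
    }

    public static void main(String[] args) throws InterruptedException {
        DelegatingLoadHandler handler = new DelegatingLoadHandler(4);
        for (int i = 0; i < 8; i++) {
            handler.onLoadEvent("segment-" + i);
        }
        System.out.println("loaded " + handler.awaitAll(10).size() + " segments");
    }
}
```

One thing such a change would probably need to address: once downloads run
on a pool, they can complete in a different order than the events arrived,
so load and drop requests for the same segment would need explicit
ordering.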

Does that sound feasible? IMO, if this turns out to be functionally correct,
it should significantly reduce the time it takes historicals to download
all of their assigned segments.

I would be more than happy to contribute this enhancement to the community.

Thanks,
Samarth


Re: Off list major development

2019-01-30 Thread Gian Merlino
I think it'd also be nice to tweak a couple of parts of the KIP template
(Motivation; Public Interfaces; Proposed Changes; Compatibility,
Deprecation, and Migration Plan; Test Plan; Rejected Alternatives). A
couple of people have suggested adding a "Rationale" section; how about
adding that and removing "Rejected Alternatives", rolling the two together?
And dropping "Test Plan", since IMO that discussion can be deferred to the
PR itself, when there is code ready. Finally, adding "Future work",
detailing where this change might lead us.

So in particular the template I am suggesting would be something like this.

1) Motivation: A description of the problem.
2) Proposed changes: Should usually be the longest section. Should include
any changes that are proposed to user-facing interfaces (configuration
parameters, JSON query/ingest specs, SQL language, emitted metrics, and so
on).
3) Rationale: A discussion of why this particular solution is the best one.
One good way to approach this is to discuss other alternative solutions
that you considered and decided against. This should also include a
discussion of any specific benefits or drawbacks you are aware of.
4) Operational impact: Is anything going to be deprecated or removed by
this change? Is there a migration path that cluster operators need to be
aware of? Will there be any effect on the ability to do a rolling upgrade,
or to do a rolling _downgrade_ if an operator wants to switch back to a
previous version?
5) Future work: A discussion of things that you believe are out of scope
for the particular proposal but would be nice follow-ups. It helps show
where a particular change could be leading us. There isn't any commitment
that the proposal author will actually work on this stuff. It is okay if
this section is empty.


Re: Off list major development

2019-01-30 Thread Jihoon Son
Thanks Eyal and Jon for starting the discussion about making a template!

The KIP template looks good, but I would like to add one more section.
The current template is:

- Motivation
- Public Interfaces
- Proposed Changes
- Compatibility, Deprecation, and Migration Plan
- Test Plan
- Rejected Alternatives

It includes almost everything required for proposals, but I think it's
missing why the author chose the proposed changes.
So, I think it would be great if we can add 'Rationale' or 'Expected
benefits and drawbacks'.
People might include it by themselves in 'Motivation' or 'Proposed
Changes', but it would be good if there's an explicit section to describe
it.

Best,
Jihoon


Re: Off list major development

2019-01-30 Thread Jonathan Wei
Hi all,

An issue has been opened by a community member suggesting that we create a
template for proposals:
https://github.com/apache/incubator-druid/issues/6949

Having a template sounds convenient, and based on the discussion in this
thread, I'm suggesting we adopt something based on the Kafka proposal
format.

I'm planning on creating such a template if there are no objections or
alternative suggestions, so please take a look if you have thoughts on this.

Thanks,
Jon


Re: Off list major development

2019-01-30 Thread Eyal Yurman
Hi, I have created an Issue together with @jon-wei, if anyone wants to
chime in:
https://github.com/apache/incubator-druid/issues/6949 (Create a proposal
template #6949)

On Tue, Jan 15, 2019 at 12:07 PM Jihoon Son  wrote:

> Good point.
> If some authors raise PRs without noticing the need for a proposal, we
> shouldn't ask them to close their PRs only because of the absence of the
> proposal.
>
> "Design review" without a proposal for simple PRs would be good if we can
> determine well which PRs need one and which don't.
> But how do we know? Even for the same PR, one person may think it needs a
> proposal while another may not.
>
> If someone doesn't notice the need for a proposal and raises a PR without
> it, I'm fine with that.
> However, we should still encourage writing a proposal before writing code
> because we can avoid unnecessary effort.
>
> I think this kind of issue usually happens with first-time contributors,
> and they will get better once they get used to Druid development.
> And I believe someday even first-time contributors will follow this policy
> once it is well settled in the community, as in the Kafka community.
>
> Jihoon
>
> On Tue, Jan 15, 2019 at 4:31 AM Roman Leventov wrote:
>
> > In such small PRs, authors likely won't be aware that they need to
> > create a proposal in the first place. The first reviewer just adds the
> > "Design Review" tag. It's also absolutely not about considering designs
> > and gauging the proposal, it's just verifying that a configuration /
> > parameter / HTTP endpoint name is reasonable and aligned with the rest
> > of Druid. So I think that a separate proposal issue for such PRs is
> > unnecessary bureaucracy.
> >
> > On Tue, 15 Jan 2019 at 07:45, Jihoon Son  wrote:
> >
> > > Roman,
> > >
> > > > Jihoon in
> > > > https://lists.apache.org/thread.html/e007fbf362c2a870a2d88d04431789289807e00fd91d087559a01d1f@%3Cdev.druid.apache.org%3E
> > > > and later Gian in this thread suggested that _every_ piece of work
> > > > that should be labelled as "Design Review" according to the current
> > > > rules should be accompanied by an issue. I don't agree with this,
> > > > there are some PRs as small as a few dozen lines of code that add
> > > > some configuration parameter and therefore should be labelled
> > > > "Design Review". I don't think a separate proposal issue is needed
> > > > for them, or even for somewhat larger PRs.
> > >
> > > What I'm concerned with is how people feel if their design is not
> > > accepted even though they wrote code. Of course, as Clint said,
> > > sometimes code helps in understanding the proposal better. But I
> > > believe this is the case when the proposal is quite complicated and
> > > not easy to understand without code. Also, the authors should be
> > > aware that they might have to rewrite the entire code if the design
> > > changes.
> > >
> > > If writing code is simple, I don't see why the authors don't wait
> > > until the review for their proposal is finished.
> > >
> > > Jihoon
> > >
> > > On Fri, Jan 11, 2019 at 9:51 AM Fangjin Yang  wrote:
> > >
> > > > I agree with Gian, as an Apache committer, your responsibility is for
> > > > the betterment of the project. I agree it is in the best interest of
> > > > the project to stop thinking about what orgs people belong to. We are
> > > > all a part of the Apache Software Foundation, regardless of what our
> > > > roles and titles are outside of it.
> > > >
> > > > On Fri, Jan 11, 2019 at 2:22 AM Roman Leventov <leventov...@gmail.com>
> > > > wrote:
> > > >
> > > > > It's not that people from one org could abuse the project and push
> > > > > some change, but that they have a similar perspective (bubble
> > > > > effect) and some important aspects of a large feature could escape
> > > > > their attention.
> > > > >
> > > > > I suggest it to be not a rigid rule, but a recommendation for
> > > > > authors of large proposals to try to attract reviewers from other
> > > > > orgs.
> > > > >
> > > > > On Fri, 11 Jan 2019 at 02:51, Julian Hyde wrote:
> > > > >
> > > > > > I agree with Gian.
> > > > > >
> > > > > > As an Apache committer, you only have one affiliation: you are
> > > > > > working in the best interests of the project.
> > > > > >
> > > > > > Obviously, in the real world there are other pressures. But we do
> > > > > > our best to compensate for them.
> > > > > >
> > > > > > Also, as a community we try to design our process so as to avoid
> > > > > > undue influences. For instance, when I advocate for logging cases
> > > > > > early, I am trying to mitigate the effect of product managers and
> > > > > > VPs of engineering, who like to have their say in meeting rooms
> > > > > > rather than on public mailing lists. That’s just one example; if
> > > > > > we see other influences at play, let’s evolve our process to try
> > > > > > to level the playing field.

FOSDEM 2019

2019-01-30 Thread Dylan Wylie
Is anyone planning to be at FOSDEM this year? If enough of us are attending,
a quick impromptu Druid gathering might be fun.


Re: The etiquette of pocking people on Github and the policy when people stop responding

2019-01-30 Thread Roman Leventov
On Tue, 29 Jan 2019 at 01:30, Fangjin Yang  wrote:

> I disagree with Roman's suggestions. If a PR has enough votes, we should
> trust the committers approving the PR and move forward.
>

There is a specific committer who merges a PR. If the merge happens before
it is clear that somebody who left comments earlier has no further
comments, the whole situation looks to me like disregard for that person's
opinion. Trust in the other committers doesn't make the situation look
much better, IMO.


Re: The etiquette of pocking people on Github and the policy when people stop responding

2019-01-30 Thread Roman Leventov
On Tue, 29 Jan 2019 at 00:28, Gian Merlino  wrote:

> It's a totally different situation if nobody else has reviewed a patch yet.
> In that case a reviewer reviewing things with longer cycles isn't blocking
> anything.
>

There is a "Development Blocker" tag for such situations. What do you think
about recommending a "poking period" of 3 working days for PRs tagged
"Development Blocker", and a week for other PRs?