Re: DISCUSS Pinot Graduation

2021-05-14 Thread Felix Cheung
Generally it looks good. I've checked the Clutch report, website checks, etc.,
but a few reminders and areas to pay attention to:

- dev@ traffic is very low, almost zero. I realize the community is active on
Slack and a summary is sent to dev@ every day, but some traffic there would be
good...

- ... because otherwise people will miss things. The podling report reminder was
sent there and Pinot just missed this month's podling report. Let's make sure
next month's report goes out promptly.

- please make sure the incubation status page is updated:
http://incubator.apache.org/projects/pinot - for instance, either the committer
list is not sorted or the last date is wrong (it should not be 2020). The rest
of the page could use updating too. Also, many sections there have placeholder
content; please fill them in.

- also, as suggested, please take the project maturity model (as a guide), fill
it in, and share with dev@ anything identified:
https://community.apache.org/apache-way/apache-project-maturity-model.html



On Fri, May 14, 2021 at 3:13 PM Mayank Shrivastava 
wrote:

> Mentors - Felix, Olivier, Jim,
> Wondering what your thoughts are on proposing Pinot's graduation. We have
> addressed all the issues that have been brought up in the past. If there
> are other steps to be taken, please let us know and we will take care of
> those as well. Looking forward to your suggestions and support.
>
> Regards,
> Mayank
>
> On Mon, May 10, 2021 at 12:54 PM Fu Xiang  wrote:
>
>> +1! Glad to see we've accomplished a lot and the community is pretty
>> strong and healthy!
>>
>> On Mon, May 10, 2021 at 11:23 AM Subbu Subramaniam 
>> wrote:
>>
>>> +1
>>>
>>> Let us know how we can help with the graduation, and if there are any
>>> pending items to be resolved.
>>>
>>> -Subbu
>>>
>>> On 2021/05/09 14:07:45, kishore g  wrote:
>>> > Hello,
>>> >
>>> >
>>> > I would like to start a conversation about the readiness of Apache
>>> > Pinot to graduate. We have come a long way since we incubated in
>>> > Apache, with:
>>> >
>>> >
>>> >- 7800+ contributions from 168 contributors
>>> >- 7 releases by various committers
>>> >- 6 new committers invited (all accepted)
>>> >- Apache website available at: https://pinot.apache.org
>>> >- Updated Apache Pinot (incubating) page
>>> >- Updated Roster Page
>>> >- Dev conversations at d...@pinot.incubator.org
>>> >- Diverse committers and PPMCs (from 7 companies / institutes)
>>> >- We have built a meritocratic and open collaborative process (the Apache way)
>>> >- A strong community of 1200+ members in Apache Pinot Slack. All conversations emailed to dev@ in the form of a digest
>>> >
>>> >
>>> > Please let us know if there are remaining steps involved in completing
>>> > the graduation process.
>>> >
>>> > Thanks,
>>> >
>>> > Kishore G
>>> >
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
>>> For additional commands, e-mail: dev-h...@pinot.apache.org
>>>
>>>
>>
>> --
>> Xiang Fu
>>
>


Apache Pinot Daily Email Digest (2021-05-14)

2021-05-14 Thread Pinot Slack Email Digest
#general

@mbracke: @mbracke has joined the channel
@brijdesai6: @brijdesai6 has joined the channel
@laurachen: @laurachen has joined the channel
@aaron: I got some data ingested and am using a star tree index and I'm running a query like `select foo, percentiletdigest(bar, 0.5) from mytable group by foo`. I've got `foo` in my `dimensionsSplitOrder` and I've got `PERCENTILE_TDIGEST__bar` as well as `AVG__bar` in my `functionColumnPairs`. My query takes about 700 ms but if I switch it to `avg(bar)` it takes 15 ms. Is it expected that the t-digest would be that much slower? Anything I can do to speed it up?
@fx19880617: @jackie.jxt does pinot support percentile tdigest in startree?
@fx19880617: in response stats, do you see the same number of docs scanned for both queries?
@jackie.jxt: Yes, startree supports TDigest. See  for more details
@jackie.jxt: Is the query consistently taking 700ms?
@aaron: For avg and percentiletdigest, numDocsScanned is 969792.
@aaron: Yeah, consistently in that range. It just took 1057 ms when I ran it
@mayanks: Yeah, tdigest aggregation over 1M docs might take that long
@aaron: What does `numDocsScanned` mean in the context of a star tree index?
@mayanks: Do you have query latency with just tdigest?
@aaron: What do you mean?
@mayanks: Query with percentile tdigest but without avg
@aaron: Oh sorry, that's what I meant
@mayanks: Oh ok
@mayanks: Docs scanned should mean the same
@aaron: `select foo, percentiletdigest(bar, 0.5) from mytable group by foo` is slow, `select foo, avg(bar) from mytable group by foo` is fast
@mayanks: Split order helps with filtering
@mayanks: @jackie.jxt does it help with group by or just filtering?
@aaron: If I have 969792 numDocsScanned and 8950109972 totalDocs, what does numDocsScanned mean? Is that the number of star tree nodes or something?
@jackie.jxt: @mayanks Most of the time just filtering
@jackie.jxt: @aaron Do you need 0.5 percentile or 50 percentile? The aggregation cost of `percentiletdigest` is expected to be much higher than `avg`
@aaron: Eh, I don't actually care about which percentile just yet -- just the performance
@aaron: Is there anything I can do to speed it up? A lot of my users here prefer quantiles, I think performance there will really matter
@aaron: The avg performance is... awesome
@mayanks: Your query does not have filters
@mayanks: Will that always be the case?
@aaron: Could be
@aaron: Right now I only have a small subset of the data, but yeah, people might be filtering by date at the very least
@aaron: Do you expect filters to help a lot?
@mayanks: It will cut down numDocsScanned, right
@aaron: Right
@aaron: I'd expect people to be scanning a similar number of documents if not an order of magnitude more
@mayanks: @jackie.jxt Any ideas on using pre-aggregates within star tree here?
@mayanks: Also, @aaron in production will you have the same cluster size as right now? Because if you have more servers, you'll get better perf
@jackie.jxt: If `foo` is the first dimension in the split order, then it will always use the pre-aggregated doc
@jackie.jxt: @aaron What's the cardinality of `foo`? How many segments do you have right now?
@aaron: Foo's cardinality is about 6
@aaron: 462 segments
@aaron: 5 servers
@aaron: Foo is third in dimensionsSplitOrder, there are 7 fields total in there
@jackie.jxt: In that case, in order to further optimize the performance, you may reduce the `maxLeafRecords` threshold, though this will increase the size of the star-tree
@mayanks: Just to call out, a lot of the latency inherently comes from the TDigest library.
@mayanks: It is pretty good at providing accuracy in limited storage, but there's a latency cost.
@aaron: Is q-digest any better? My understanding was that t-digest is faster and more accurate
@aaron: Do you have any approximate guidelines around how much faster performance will be and how much more space the star tree will take up as maxLeafRecords is decreased?
@mayanks: Yes, t-digest is definitely better than others. But it may not give you 10ms latency if you are aggregating 1M records.
@aaron: How can I get to, say, 200ms?
@mayanks: Tuning star tree (Jackie?), index size, server cores/jvm/params, etc.
@jackie.jxt: For star-tree, you can trade extra space for better performance by reducing the `maxLeafRecords`
@jackie.jxt: Reducing that to 1 will give you fully pre-cubed data
@benjamin.walker: @benjamin.walker has joined the channel
@aritra55: @aritra55 has joined the channel
@oneandwholly: @oneandwholly has joined the channel

#random

@mbracke: @mbracke has joined the channel
@brijdesai6: @brijdesai6 has joined the channel
@laurachen: @laurachen has joined the channel
@benjamin.walker: @benjamin.walker has joined the channel
@aritra55: @aritra55 has joined the channel
@oneandwholly: @oneandwholly has joined the channel

#troubleshooting

@jmeyer: Hello ! :wave:
*I've got the following scenario :*
• Data is integrated in multiple batches per day (in an OFFLINE table)
◦ 
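
For readers following the star-tree discussion above, here is a minimal sketch of what the index configuration under discussion could look like inside the offline table config. The column names `foo` and `bar` and the function column pairs come from the thread; the table name, the other dimension names, and the `maxLeafRecords` value are placeholders, and the `//` comments are explanatory only (strict JSON does not allow them):

    {
      "tableName": "mytable_OFFLINE",
      "tableIndexConfig": {
        "starTreeIndexConfigs": [
          {
            // Dimensions in split order; the thread mentions 7 dimensions with "foo" third.
            "dimensionsSplitOrder": ["dim1", "dim2", "foo", "dim4", "dim5", "dim6", "dim7"],
            "skipStarNodeCreationForDimensions": [],
            // Pre-aggregated function/column pairs backing the avg and percentiletdigest queries.
            "functionColumnPairs": ["AVG__bar", "PERCENTILE_TDIGEST__bar"],
            // Lowering this trades extra star-tree size for fewer raw docs aggregated per query;
            // 10000 is the default, and 1 gives fully pre-cubed data as noted above.
            "maxLeafRecords": 10000
          }
        ]
      }
    }

Reducing `maxLeafRecords` (and filtering on leading split-order dimensions) is the tuning lever suggested in the thread; the exact value is a size/latency trade-off to measure per table.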
