#general


@mbracke: @mbracke has joined the channel
@brijdesai6: @brijdesai6 has joined the channel
@laurachen: @laurachen has joined the channel
@aaron: I got some data ingested and am using a star-tree index, and I'm running a query like `select foo, percentiletdigest(bar, 0.5) from mytable group by foo`. I've got `foo` in my `dimensionsSplitOrder`, and I've got `PERCENTILE_TDIGEST__bar` as well as `AVG__bar` in my `functionColumnPairs`. My query takes about 700 ms, but if I switch it to `avg(bar)` it takes 15 ms. Is it expected that the t-digest would be that much slower? Anything I can do to speed it up?
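For reference, a minimal sketch of the star-tree setup being described, as it would sit in the table config. The column names come from the thread; only `foo` is shown in the split order (aaron mentions later that there are 7 dimensions total), and `maxLeafRecords` is shown at what I believe is Pinot's default:

```json
{
  "tableIndexConfig": {
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": ["foo"],
        "skipStarNodeCreationForDimensions": [],
        "functionColumnPairs": ["AVG__bar", "PERCENTILE_TDIGEST__bar"],
        "maxLeafRecords": 10000
      }
    ]
  }
}
```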
  @fx19880617: @jackie.jxt does pinot support percentile tdigest in startree?
  @fx19880617: in response stats, do you see same number of docs scanned for both queries?
  @jackie.jxt: Yes, star-tree supports TDigest. See the docs for more details
  @jackie.jxt: Is the query constantly taking 700ms?
  @aaron: For avg and percentiletdigest, numDocsScanned is 969792.
  @aaron: Yeah, consistently in that range. It just took 1057 ms when I ran it
  @mayanks: Yeah tdigest aggregation over 1M docs might take that long
  @aaron: What does `numDocsScanned` mean in the context of a star tree index?
  @mayanks: Do you have query latency with just tdigest?
  @aaron: What do you mean?
  @mayanks: Query with percentile tdigest but without avg
  @aaron: Oh sorry, that's what I meant
  @mayanks: Oh ok
  @mayanks: Docs scanned should mean the same
  @aaron: `select foo, percentiletdigest(bar, 0.5) from mytable group by foo` is slow, `select foo, avg(bar) from mytable group by foo` is fast
  @mayanks: Split order helps with filtering
  @mayanks: @jackie.jxt does it help with group by or just filtering?
  @aaron: If I have 969792 numDocsScanned and 8950109972 totalDocs, what does numDocsScanned mean? Is that the number of star tree nodes or something?
  @jackie.jxt: @mayanks Most time just filtering
  @jackie.jxt: @aaron Do you need 0.5 percentile or 50 percentile? The aggregation cost of `percentiletdigest` is expected to be much higher than `avg`
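For context on Jackie's question: Pinot's percentile functions take the percentile on a 0-100 scale, so the query in the thread asks for the 0.5th percentile. The median would be:

```sql
-- 50 = the median; percentiletdigest(bar, 0.5) is the 0.5th percentile
select foo, percentiletdigest(bar, 50) from mytable group by foo
```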
  @aaron: Eh I don't actually care about which percentile just yet -- just the performance
  @aaron: Is there anything I can do to speed it up? A lot of my users here prefer quantiles, I think performance there will really matter
  @aaron: The avg performance is... awesome
  @mayanks: Your query does not have filters
  @mayanks: Will it be the case always?
  @aaron: Could be
  @aaron: Right now I only have a small subset of the data, but yeah people might be filtering by date at the very least
  @aaron: Do you expect filters to help a lot?
  @mayanks: It will cut down numDocsScanned right
  @aaron: Right
  @aaron: I'd expect people to be scanning a similar number of documents if not an order of magnitude more
  @mayanks: @jackie.jxt Any ideas on using pre-aggregates within star tree here?
  @mayanks: Also, @aaron, will you have the same cluster size in production as right now? Because if you'll have more servers, you'll get better perf
  @jackie.jxt: If `foo` is the first dimension in the split order, then it will always use the pre-aggregate doc
  @jackie.jxt: @aaron What's the cardinality of `foo`? How many segments do you have right now?
  @aaron: Foo's cardinality is about 6
  @aaron: 462 segments
  @aaron: 5 servers
  @aaron: Foo is third in dimensionsSplitOrder, there are 7 fields total in there
  @jackie.jxt: In that case, in order to further optimize the performance, you may reduce the `maxLeafRecords` threshold, though this will increase the size of the star-tree
  @mayanks: Just to call out, a lot of the latency inherently comes from the TDigest library.
  @mayanks: It is pretty good in providing accuracy in limited storage, but there's a latency cost.
  @aaron: Is q-digest any better? My understanding was that t-digest is faster and more accurate
  @aaron: Do you have any approximate guidelines around how much faster performance will be and how much more space the star tree will take up as maxLeafRecords is decreased?
  @mayanks: Yes, t-digest is definitely better than others. But it may not give you 10ms latency if you are aggregating 1M records.
  @aaron: How can I get to, say, 200ms?
  @mayanks: Tuning star tree (Jackie?), index size, server cores/jvm/params, etc
  @jackie.jxt: For star-tree, you can trade extra space for performance by reducing the `maxLeafRecords`
  @jackie.jxt: Reducing that to 1 will give you fully pre-cubed data
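A sketch of the trade-off Jackie describes, as the relevant fragment of the star-tree config shown earlier; setting `maxLeafRecords` to 1 fully pre-cubes the data at the cost of a much larger tree:

```json
{
  "dimensionsSplitOrder": ["foo"],
  "functionColumnPairs": ["AVG__bar", "PERCENTILE_TDIGEST__bar"],
  "maxLeafRecords": 1
}
```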
@benjamin.walker: @benjamin.walker has joined the channel
@aritra55: @aritra55 has joined the channel
@oneandwholly: @oneandwholly has joined the channel

#random


@mbracke: @mbracke has joined the channel
@brijdesai6: @brijdesai6 has joined the channel
@laurachen: @laurachen has joined the channel
@benjamin.walker: @benjamin.walker has joined the channel
@aritra55: @aritra55 has joined the channel
@oneandwholly: @oneandwholly has joined the channel

#troubleshooting


@jmeyer: Hello ! :wave: *I've got the following scenario :*
• Data is integrated in multiple batches per day (in an OFFLINE table)
◦ *Batch 1:* _01/01/2021 (data date) - DATA 1, DATA 3, DATA 6 -> `Segment_1(date=01/01/2021, data=[DATA 1, DATA 3, DATA 6])`_
◦ *Batch 2:* _01/01/2021 (data date) - DATA 2, DATA 4, DATA 5 -> `Segment_2(date=01/01/2021, data=[DATA 2, DATA 4, DATA 5])`_
• Data must be available asap, so 2 separate segments are generated & ingested into Pinot
• Some data needs to be corrected after the initial ingestion, say DATA 1 & DATA 2
I know it is possible to replace segments, but how can we handle replacing data across multiple segments ? Can we generate a new segment with only the modified data and ignore the old data in the previous segments (`Segment_1` & `Segment_2`) ? -> `Segment_3(date=01/01/2021, data=[DATA 1, DATA 2])` Or do we have to regenerate the 2 segments entirely ? (if so, we need to identify what they contain) - Possibly after merging them ?
  @mayanks: You can regenerate the two segments (using the same names as the existing segments) and push them to Pinot. Currently this is not an atomic transaction, so there may be a small time period when one segment is old and the other is new. A fix for this is being worked on. @snlee
  @jmeyer: Thanks @mayanks. So it is necessary to know the contents of the 2 segments and regenerate them with the same data as before (+ updates) ? Sounds like this could be non-trivial in some cases
  @jmeyer: What is Pinot's behavior when duplicated data exists ? Say a 3rd segment with some data already present in the first 2. The notion of "duplicated" implies we have a primary key, which is not the case on OFFLINE tables iirc, so I guess we would simply have "duplicated" lines
  @mayanks: Pinot won’t know that it is duplicate data, and will be included in query processing
  @mayanks: If you are generating daily segments then replacing one day's segments should be straightforward
  @jmeyer: > If you are generating daily segments then replacing one day's segments should be straightforward
  The difficulty is that not all of a day's data may / will arrive at the same time; it comes in over multiple ingestions. Hence:
  • *Batch 1:* _01/01/2021 (data date) - DATA 1, DATA 3, DATA 6 -> `Segment_1(date=01/01/2021, data=[DATA 1, DATA 3, DATA 6])`_
  • *Batch 2:* _01/01/2021 (data date) - DATA 2, DATA 4, DATA 5 -> `Segment_2(date=01/01/2021, data=[DATA 2, DATA 4, DATA 5])`_
  In the end, I feel like my question is "how can we update part of a segment ?", and I feel like that's not possible. It looks like there are only 2 ways to reach my goal then:
  1. Only keep a single segment per day at a time, so:
  a. Drop the day's segment
  b. Regenerate the segment with the updated data [99% of the data may not have changed, so pretty inefficient]
  2. Identify the impacted segments & regenerate only those (in their entirety)
  What do you think ? :slightly_smiling_face:
  @mayanks: Is your offline pipeline not generating daily partitions? Typically offline pipelines create time-partitioned folders, and generating a segment from one folder guarantees it won't overlap with other days
  @jmeyer: It is, but we have 3 additional constraints:
  • Data for a given day can arrive in multiple parts (for the same day) [imagine the case with N timezones]
  • Partial data needs to be available asap (we can't wait for the other parts)
  • We need to be able to update some data later on (doesn't need to be perfectly efficient, as it's clearly not an ideal case for OLAP)
  @mayanks: Do you not have a realtime component? If you do, then you can serve data from realtime while your offline settles?
  @jmeyer: I feel like this would help, but no, data comes in batches from external sources. I'll keep that in mind still
  @mayanks: What is the max delay for data to arrive? Does one day's worth of data settle in a day or so? Or it can take several days / weeks?
  @mayanks: Also, even if your incoming data is not partitioned, you can always generate segments that are guaranteed to contain only one day's data (e.g. pick several folders to scan and select only a single day's data as the input for the Pinot segment)
  @jmeyer: > What is the max delay for data to arrive? Does one day's worth of data settle in a day or so? Or it can take several days / weeks? Typically much less than a month but it is technically unbounded - customer data could theoretically be corrected months after first ingestion
  @mayanks: When it arrives after a month, which folder does it land in? Is it in the correct date folder? Also, how do you know which older folders got changed?
  @mayanks: Throwing out an idea: if you can find the delta between what was pushed to Pinot as part of the daily pushes and the corrections that have arrived since, across all days, you can have one set of segments for the daily data, and another set (perhaps very small, say 1 or 2 segments) that represents the delta across all days, and keep refreshing those delta segments
  @mayanks: It works if your delta is tiny, but may not scale if delta is huge
  @jmeyer: > Also, even if your incoming data is not partitioned, you can always generate segments that are guaranteed to contain only one day's data (e.g. pick several folders to scan and select only a single day's data as the input for the Pinot segment)
  If I understand correctly, you're saying that after every batch, we regenerate the whole Pinot segment ? For example, we've got a single file per batch, and after every batch we could regenerate a single Pinot segment from all of these files. Meaning we always keep a single Pinot segment (per day) at a time, and replacing it is straightforward
  @mayanks: Discussed offline, @jmeyer to summarize.
  @jmeyer: Yes :slightly_smiling_face:
  @jmeyer: *Summary :*
  _*Context :*_
  • Data comes in batches every day (for the same day)
  • Each batch generates a new file
  • Data must be available asap (i.e. we can't wait to have all the data before generating a segment)
  • Data corrections can come in later (weeks)
  *Solution discussed :*
  • While every batch generates a new separate file, the goal is to keep having a single Pinot segment per day at a time
  • To do so, after every batch:
  ◦ Merge every file for the day before calling CreateSegment, to generate a new segment containing all (existing) data for the day (a sketch of this loop follows the thread)
  ▪︎ Later, a new feature will allow generating a single Pinot segment out of multiple input files, dropping the need for file concatenation
  ◦ This new segment will replace the existing one (for the day)
  @jmeyer: This solution means that we only need to regenerate a single segment per day impacted by data correction. However, if data correction happens along a dimension other than time, say we have (date, entity, value) and we correct all values for a given entity, that will result in the regeneration of *all* segments
  @jmeyer: @mayanks Summary sounds ok ?
  @mayanks: Yes, thanks
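A minimal sketch of the per-batch refresh loop from the summary above. All paths, file names, and the job spec are hypothetical, and it assumes headerless CSV parts so plain concatenation is safe:

```sh
# After each new batch for the day, concatenate all of that day's files.
# (This merge step goes away once multi-file segment generation lands.)
cat /staging/2021-01-01/batch_*.csv > /input/2021-01-01/day.csv

# Rebuild the single daily segment and push it; reusing the same segment
# name makes the push replace the existing segment instead of adding one.
bin/pinot-admin.sh LaunchDataIngestionJob \
  -jobSpecFile /specs/daily-refresh-jobspec.yaml
```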
@mbracke: @mbracke has joined the channel
@brijdesai6: @brijdesai6 has joined the channel
@laurachen: @laurachen has joined the channel
@mayanks: @jlli Do we have a doc describing the partition/sort preprocessing before ingestion? If so, could you share it? If not, could we add one? cc: @syedakram93
  @jlli: Hey @syedakram93, yes, we do have a design doc on the preprocessing job, though it's still in a LinkedIn-internal dir. Let me put it on the wiki page. In the meantime, you can refer to this file to see how it's used:
  @mayanks: @jlli If we can add it to , that would be great
  @jlli: Yeah that’s where I’m going to add to
  @mayanks: thanks
@benjamin.walker: @benjamin.walker has joined the channel
@aritra55: @aritra55 has joined the channel
@oneandwholly: @oneandwholly has joined the channel
@ken: I’ve been fooling around with how Pinot handles the “URI push” of segments. It seems like if I’m not using HDFS for deep storage, then the controller will download the segments before pushing to the server, which seems like it’s not a win. Is that correct? And (so far) I haven’t been able to configure the controller to successfully handle an HDFS URI push request, at least when I’m not using HDFS for deep storage - I see the msg when the controller starts up that the “hdfs file system” was initialized, but when it gets the URI push request, it fails with an error about the hdfs file system not being initialized. Any ideas?
  @mayanks: URI push should work for all deep storages that provide URI-based access (HDFS/ADLS/GCP/S3); the only exception would be NFS, I'd think
  @mayanks: Unsure about why you are seeing that behavior, would need more debugging.
  @ken: I was trying to figure out if you could do an HDFS URI push without deep storage enabled for the same. So instead of pushing actual segments through the controller to be stored locally by server processes, you'd push the URI and the server process would download it locally. Sounds like that's not supported.
  @g.kishore: you need to use URI with metadata push
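A hedged sketch of the job-spec shape being suggested: with metadata push the controller reads only the segment metadata and records the deep-store URI, instead of having segment bytes funneled through it. The URIs are hypothetical:

```yaml
executionFrameworkSpec:
  name: 'standalone'
# jobType options include SegmentCreationAndUriPush and
# SegmentCreationAndMetadataPush; metadata push avoids downloading
# full segments on the controller.
jobType: SegmentCreationAndMetadataPush
inputDirURI: 'hdfs://namenode/path/to/input/'
outputDirURI: 'hdfs://namenode/path/to/segments/'
pinotFSSpecs:
  - scheme: hdfs
    className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
```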

#pinot-dev


@mayanks: @snlee, did we get any consensus on the 0.8.0 release timeline?
@snlee: Here is the changelog since the last release, from the master branch. I'm trying to come up with the list of new major features. If anyone has a feature to highlight, please add it to this thread.