Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-07 Thread Benedict Jin
Hi Julian Jaffe, Thank you very much. I haven't tried it yet. Can you provide a more specific example? In theory, adding indexes will slow down inserts and updates. In your scenario, how large is this performance loss, as a percentage? Yes, for the bottleneck of

Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-07 Thread Julian Jaffe
Hey Benedict, Have you tried creating indices on your segments table? I’ve managed Druid clusters with orders of magnitude more segments without this issue by indexing key filter columns. (The coordinator is still a painful bottleneck, just not because of query times against the metadata server.)
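Julian doesn't say which columns he indexed. Here is a minimal Python/SQLite sketch, assuming a hypothetical simplified `segments` table (not Druid's actual metadata schema) and a poll query that filters on `used` and `datasource`. It compares the query plan before and after adding a composite index on those filter columns:

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; not Druid's real metadata schema.
conn.execute(
    "CREATE TABLE segments ("
    " id TEXT PRIMARY KEY, datasource TEXT NOT NULL,"
    " used INTEGER NOT NULL, payload BLOB)"
)

# The kind of filter a metadata poll might apply (assumed for illustration).
poll_sql = "SELECT id, payload FROM segments WHERE used = ? AND datasource = ?"
params = (1, "wikipedia")

plan_before = query_plan(conn, poll_sql, params)  # no usable index: full scan

# Composite index on the columns the poll query filters on.
conn.execute("CREATE INDEX idx_segments_used_ds ON segments (used, datasource)")
plan_after = query_plan(conn, poll_sql, params)   # search through the new index

print(plan_before)
print(plan_after)
```

The trade-off Benedict raises is real. Every extra index has to be maintained on each INSERT and UPDATE, so the write overhead is best measured against the cluster's actual segment-publishing rate rather than assumed.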

Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Benedict Jin
Hi Jihoon Son, Yes, it does bring some compatibility issues. I just checked the latest metadata. At present, the metadata table holds five million records in total, of which nearly half are marked as used, and the physical resources of the machine where the

Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Jihoon Son
For this sort of issue, we should first consider whether there is another way to address the same problem without modifying the metadata table schema, because modifying the schema introduces compatibility issues, such as the upgrade path for existing users. Benedict, as Samarth and Lucas

Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Benedict Jin
Hi Ben Krug, +1 for adding the is_deleted column; we can then create a timed trigger to clear these old records. Regards, Benedict Jin On 2021/04/06 18:28:45, Ben Krug wrote: > Oh, that's easier than tombstones. flag is_deleted and update timestamp > (so it gets pulled again). > > On
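Here is a minimal sketch of the periodic cleanup Benedict describes. It assumes hypothetical `is_deleted` and `last_updated` (epoch seconds) columns and an arbitrary seven-day retention window. The purge function could be called from any scheduler. In MySQL the equivalent DELETE could run as a scheduled event, and in PostgreSQL from an external scheduler or an extension; that wiring is not shown.

```python
import sqlite3

# Hypothetical retention window. It must be longer than the longest gap between
# two polls, or a poller could miss a deletion that was purged before it looked.
RETENTION_SECONDS = 7 * 24 * 3600


def purge_soft_deleted(conn, now, retention=RETENTION_SECONDS):
    """Physically remove rows flagged is_deleted longer ago than `retention`."""
    with conn:  # run the DELETE in a transaction that commits on success
        cur = conn.execute(
            "DELETE FROM segments WHERE is_deleted = 1 AND last_updated < ?",
            (now - retention,),
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
# Hypothetical columns: an is_deleted flag plus a last_updated epoch timestamp.
conn.execute(
    "CREATE TABLE segments (id TEXT PRIMARY KEY,"
    " is_deleted INTEGER NOT NULL DEFAULT 0, last_updated REAL NOT NULL)"
)
now = 1_700_000_000.0  # fixed "current time" so the example is deterministic
month = 30 * 24 * 3600
with conn:
    conn.executemany(
        "INSERT INTO segments VALUES (?, ?, ?)",
        [
            ("live_old", 0, now - month),     # old but live: kept
            ("deleted_old", 1, now - month),  # deleted a month ago: purged
            ("deleted_recent", 1, now - 60),  # deleted a minute ago: kept
        ],
    )

removed = purge_soft_deleted(conn, now)
remaining = sorted(row[0] for row in conn.execute("SELECT id FROM segments"))
print(removed, remaining)  # 1 ['deleted_recent', 'live_old']
```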

Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Benedict Jin
Hi Samarth Jain, Thanks. The main reason is the huge amount of metadata, which makes scanning the full metadata table and deserializing the metadata very slow. Yes, I have tried to clean up the metadata. Regards, Benedict Jin On 2021/04/06 17:20:26, Samarth Jain wrote:

Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Benedict Jin
Hi Ben Krug, Thank you very much for your ideas, but I also feel that introducing Cassandra would be too heavy. The tombstones feature in Cassandra you mentioned can actually be emulated with scheduled tasks in MySQL or PostgreSQL. Regards, Benedict Jin On 2021/04/06 15:08:03, Ben Krug wrote:

Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Benedict Jin
Hi Abhishek Agarwal, You made a very important point, thank you very much. Regards, Benedict Jin On 2021/04/06 11:02:34, Abhishek Agarwal wrote: > If an entry is deleted from the metadata, how is the coordinator going to > update its own state? > > On Tue, Apr 6, 2021 at 3:38 PM Itai Yaffe

Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Benedict Jin
Hi Itai Yaffe, Thank you very much for your support. Regards, Benedict Jin On 2021/04/06 10:06:45, Itai Yaffe wrote: > Hey, > I'm not a Druid developer, so it's quite possible I'm missing many > considerations here, but from a first glance, I like your offer, as it > resembles the

Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Lucas Capistrant
Hey Benedict, Adding on to what Samarth says in their reply, could you provide some more context on this one to help the group understand more about your issue: - Is this the area of the code that you are saying is non-performant? Link

Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Ben Krug
Oh, that's easier than tombstones: flag is_deleted and update the timestamp (so it gets pulled again). On Tue, Apr 6, 2021 at 10:48 AM Tijo Thomas wrote: > Abhishek, > Good point. Do we need one more col for storing if it's deleted or not? > > On Tue, Apr 6, 2021 at 4:32 PM Abhishek Agarwal > >
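Ben's suggestion can be sketched as follows. This is a minimal Python/SQLite version that assumes hypothetical `is_deleted` and integer `last_updated` columns. A delete becomes an UPDATE that sets the flag and bumps the timestamp. An incremental poll therefore sees the row again and can drop it from its in-memory state, which answers Abhishek's question without a separate change feed.

```python
import sqlite3


def soft_delete(conn, segment_id, now):
    """Flag the row instead of deleting it, and bump last_updated so pollers refetch it."""
    with conn:
        conn.execute(
            "UPDATE segments SET is_deleted = 1, last_updated = ? WHERE id = ?",
            (now, segment_id),
        )


def apply_changes(conn, cache, watermark):
    """Fetch rows changed after `watermark`, apply them to `cache`, return the new watermark."""
    rows = conn.execute(
        "SELECT id, payload, is_deleted, last_updated FROM segments"
        " WHERE last_updated > ? ORDER BY last_updated",
        (watermark,),
    ).fetchall()
    for seg_id, payload, is_deleted, ts in rows:
        if is_deleted:
            cache.pop(seg_id, None)  # the deletion arrives as an ordinary update
        else:
            cache[seg_id] = payload
        watermark = max(watermark, ts)
    return watermark


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE segments (id TEXT PRIMARY KEY, payload TEXT,"
    " is_deleted INTEGER NOT NULL DEFAULT 0, last_updated INTEGER NOT NULL)"
)
with conn:
    conn.executemany(
        "INSERT INTO segments (id, payload, last_updated) VALUES (?, ?, ?)",
        [("seg_a", "{}", 1), ("seg_b", "{}", 2)],
    )

cache = {}
watermark = apply_changes(conn, cache, 0)  # initial load: seg_a and seg_b
soft_delete(conn, "seg_a", now=3)
watermark = apply_changes(conn, cache, watermark)
print(sorted(cache), watermark)  # ['seg_b'] 3
```

The flagged rows still occupy the table, so some later cleanup of old soft-deleted rows is still needed.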

Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Tijo Thomas
Abhishek, Good point. Do we need one more col for storing if it's deleted or not? On Tue, Apr 6, 2021 at 4:32 PM Abhishek Agarwal wrote: > If an entry is deleted from the metadata, how is the coordinator going to > update its own state? > > On Tue, Apr 6, 2021 at 3:38 PM Itai Yaffe wrote: > >

Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Samarth Jain
Hi Benedict, I am curious to understand which Druid functionality you are seeing the slowness in. Is it the coordinator work of assigning segments to historicals that is slower, or is it the querying of segment information? Have you looked into CPU/network metrics for your

Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Ben Krug
I suppose, if we were going down this path, something like tombstones in Cassandra could be used. But it would increase the complexity significantly. I.e., a new row is inserted with a deletion marker and a timestamp, indicating that the corresponding row is deleted. Now, when anyone does scan
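Ben's description is cut off, but the basic tombstone idea can still be sketched. The following is a minimal Python/SQLite version. It assumes a hypothetical separate `tombstones` table and an integer `last_updated` column. Cassandra keeps tombstones in the same table; a side table is just one relational way to emulate that. A delete removes the row and writes a marker in the same transaction, and the poller merges both streams in timestamp order.

```python
import sqlite3


def delete_with_tombstone(conn, segment_id, now):
    """Hard-delete the row and record a tombstone for it in one transaction."""
    with conn:  # both statements commit together or roll back together
        conn.execute("DELETE FROM segments WHERE id = ?", (segment_id,))
        conn.execute(
            "INSERT INTO tombstones (segment_id, deleted_at) VALUES (?, ?)",
            (segment_id, now),
        )


def poll(conn, cache, watermark):
    """Apply upserts and tombstones newer than `watermark` in timestamp order."""
    events = [
        (ts, "upsert", seg_id, payload)
        for seg_id, payload, ts in conn.execute(
            "SELECT id, payload, last_updated FROM segments WHERE last_updated > ?",
            (watermark,),
        )
    ]
    events += [
        (ts, "delete", seg_id, None)
        for seg_id, ts in conn.execute(
            "SELECT segment_id, deleted_at FROM tombstones WHERE deleted_at > ?",
            (watermark,),
        )
    ]
    for ts, kind, seg_id, payload in sorted(events, key=lambda e: e[0]):
        if kind == "delete":
            cache.pop(seg_id, None)
        else:
            cache[seg_id] = payload
        watermark = max(watermark, ts)
    return watermark


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE segments (id TEXT PRIMARY KEY, payload TEXT NOT NULL,"
    " last_updated INTEGER NOT NULL)"
)
conn.execute(
    "CREATE TABLE tombstones (segment_id TEXT NOT NULL, deleted_at INTEGER NOT NULL)"
)
with conn:
    conn.executemany(
        "INSERT INTO segments VALUES (?, ?, ?)",
        [("seg_a", "{}", 1), ("seg_b", "{}", 2)],
    )

cache = {}
watermark = poll(conn, cache, 0)          # initial load: seg_a and seg_b
delete_with_tombstone(conn, "seg_a", now=3)
watermark = poll(conn, cache, watermark)  # the tombstone removes seg_a
print(sorted(cache), watermark)  # ['seg_b'] 3
```

The second query and the tombstone table, which must itself be pruned eventually, are the kind of extra complexity Ben mentions.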

Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Abhishek Agarwal
If an entry is deleted from the metadata, how is the coordinator going to update its own state? On Tue, Apr 6, 2021 at 3:38 PM Itai Yaffe wrote: > Hey, > I'm not a Druid developer, so it's quite possible I'm missing many > considerations here, but from a first glance, I like your offer, as it >

Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Itai Yaffe
Hey, I'm not a Druid developer, so it's quite possible I'm missing many considerations here, but at first glance, I like your proposal, as it resembles the *tsColumn* in JDBC lookups (https://druid.apache.org/docs/latest/development/extensions-core/lookups-cached-global.html#jdbc-lookup).

Propose a scheme for Coordinator to pull metadata incrementally

2021-04-05 Thread Benedict Jin
Hi all, Recently, the Coordinator in our company's Druid cluster has hit a performance bottleneck when pulling metadata. The main reason is the huge amount of metadata, which makes scanning the full metadata table and deserializing the metadata very slow. The size of the
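The original message is truncated before the scheme itself is described, so the following is only a guess at the general idea and not the proposed implementation. It is a minimal Python/SQLite sketch that assumes a hypothetical `last_updated` column, which every insert or update would have to bump. The poller remembers the largest timestamp it has seen and, on each cycle, reads only the rows changed after it instead of rescanning the whole table.

```python
import sqlite3


class IncrementalSegmentPoller:
    """Keeps an in-memory view of used segments, reading only rows changed since the last poll."""

    def __init__(self, conn):
        self.conn = conn
        self.used_segments = {}  # segment id -> payload
        self.watermark = 0       # largest last_updated value seen so far

    def poll(self):
        """Fetch rows whose last_updated is past the watermark; return how many were read."""
        rows = self.conn.execute(
            "SELECT id, payload, used, last_updated FROM segments"
            " WHERE last_updated > ? ORDER BY last_updated",
            (self.watermark,),
        ).fetchall()
        for seg_id, payload, used, ts in rows:
            if used:
                self.used_segments[seg_id] = payload
            else:
                self.used_segments.pop(seg_id, None)  # segment was marked unused
            self.watermark = max(self.watermark, ts)
        return len(rows)


conn = sqlite3.connect(":memory:")
# Hypothetical schema: `used` mirrors the flag mentioned in the thread, and
# `last_updated` is the assumed extra column that every write must bump.
conn.execute(
    "CREATE TABLE segments (id TEXT PRIMARY KEY, payload TEXT NOT NULL,"
    " used INTEGER NOT NULL, last_updated INTEGER NOT NULL)"
)
with conn:
    conn.executemany(
        "INSERT INTO segments VALUES (?, ?, ?, ?)",
        [("seg_1", "{}", 1, 1), ("seg_2", "{}", 1, 2), ("seg_3", "{}", 1, 3)],
    )

poller = IncrementalSegmentPoller(conn)
first = poller.poll()   # initial load: 3 rows
second = poller.poll()  # nothing changed: 0 rows
with conn:
    conn.execute("UPDATE segments SET used = 0, last_updated = 4 WHERE id = 'seg_3'")
third = poller.poll()   # only the changed row: 1 row
print(first, second, third, sorted(poller.used_segments))  # 3 0 1 ['seg_1', 'seg_2']
```

This sketch has two known gaps. Rows that are physically deleted never match the query, so the poller never learns about them. A row committed late with a timestamp at or below the watermark is also skipped. Re-reading a small overlap window, or using a database-assigned sequence instead of wall-clock time, is one common mitigation for the second gap.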