Re: How to read a savepoint fast without exploding the memory

2025-02-17 Thread Jean-Marc Paulin
Hi Gabor, Thanks for the speedy turnaround. I will rebase our Flink to capture both Flink 1.20.1 and your PR, which I think is the only one that didn't make the cut. As for the migration to 2.x, this is not something I have given much thought to yet. We also have windows and TTL in the state. I did

Re: How to read a savepoint fast without exploding the memory

2025-02-14 Thread Gabor Somogyi
Hi Jean-Marc, FYI the changes are merged into master and release-1.20, so feel free to use them. State migration from 1.x to 2.x can be challenging for the whole community, so if you have some info please share. G On Wed, Feb 12, 2025 at 2:40 PM Jean-Marc Paulin wrote: > Hi Gabo

Re: How to read a savepoint fast without exploding the memory

2025-02-13 Thread Francis Anyuru
I am curious about what kind of disk you are using @Gabor. And is the environment you run in your production one? On Wed, 12 Feb 2025 at 16:41, Jean-Marc Paulin wrote: > Hi Gabor > > Glad we helped, but thanks to you for working on this. The performance > improvement you made is fantastic. > > I th

Re: How to read a savepoint fast without exploding the memory

2025-02-12 Thread Jean-Marc Paulin
Hi Gabor Glad we helped, but thanks to you for working on this. The performance improvement you made is fantastic. I think on our side we will wait for the final official patch, and then decide if we move up to 1.20.1 (or whatever includes it) or just apply the patch on what we have. Our Flink is

Re: How to read a savepoint fast without exploding the memory

2025-02-12 Thread Gabor Somogyi
Hi Jean-Marc, Thanks for your efforts! We've done quite extensive tests inside and they are all passing. Good to see that the number of keys matches on your side too. Regarding key ordering, the state processor API is not giving any guarantees in terms of ordering. All in all I'm confident with t

Re: How to read a savepoint fast without exploding the memory

2025-02-12 Thread Jean-Marc Paulin
Hi Gabor, I see all my 1,331,301 keys with your patch, which is exactly what I am expecting. If you have more specific concerns I suppose I can instrument my code further. I have no expectation regarding the key ordering. JM On Wed, Feb 12, 2025 at 11:29 AM Gabor Somogyi wrote: > I think the h

Re: How to read a savepoint fast without exploding the memory

2025-02-12 Thread Gabor Somogyi
I think the HashMap vs patched RocksDB result makes sense, at least I'm measuring similar numbers with relatively small states. RocksDB with the remove() commented out is surprisingly high, but since it's causing correctness issues under some circumstances I would abandon that. > I probably need more time

Re: How to read a savepoint fast without exploding the memory

2025-02-12 Thread Jean-Marc Paulin
Hi Gabor, I applied your 1.20 patch, and I got some very good numbers from it... so for my 5GB savepoint, I made sure I skipped all my code overhead to get the raw number, and I can read it in - HashMap: 4 minutes - RocksDB with your patch: ~19 minutes - RocksDB commenting out the remove(): 49 minutes

Re: How to read a savepoint fast without exploding the memory

2025-02-11 Thread Jean-Marc Paulin
Hi Gabor, So, a bit of progress: I managed to compile our stuff against your 2.1-SNAPSHOT (with a bit of chopping around deprecated, changed and removed APIs, which wasn't too bad), but that failed to read the state I was using before (which was generated with Flink 1.20). This is the stack trace

Re: How to read a savepoint fast without exploding the memory

2025-02-11 Thread Gabor Somogyi
Yeah, I think this is more like the 1.x and 2.x incompatibility. I've just opened the PR against 1.20 which you can cherry-pick here [1]. [1] https://github.com/apache/flink/pull/26145 BR, G On Tue, Feb 11, 2025 at 4:19 PM Jean-Marc Paulin wrote: > Hi Gabor, > > So, a bit of progress, > > I ma

Re: How to read a savepoint fast without exploding the memory

2025-02-11 Thread Jean-Marc Paulin
Hi Gabor, Trying to, but I am struggling to compile my stuff against your Flink build... I tried to apply your PR as a patch on my modified 1.20 fork and that didn't go well either. It will take time to untangle. Will keep you updated if I make progress. JM On Mon, Feb 10, 2025 at 8:22 PM Gabor Somogy

Re: How to read a savepoint fast without exploding the memory

2025-02-10 Thread Gabor Somogyi
Hi Jean-Marc, FYI, I've just opened this [1] PR to address the issue in a clean way. May I ask you to test it on your side? [1] https://github.com/apache/flink/pull/26134 BR, G On Fri, Feb 7, 2025 at 6:14 PM Gabor Somogyi wrote: > Just a little update on this. We've made our first POC with t

Re: How to read a savepoint fast without exploding the memory

2025-02-07 Thread Gabor Somogyi
Just a little update on this. We've made our first POC with the redesigned approach and the numbers are promising :) It still requires huge effort from a development/correctness/performance perspective, but it seems like we have something in the pocket. Test data: 256 MB state file with a single operator

Re: How to read a savepoint fast without exploding the memory

2025-02-06 Thread Gabor Somogyi
In short, when you don't care about multiple KeyedStateReaderFunction.readKey calls then you're on the safe side. G On Wed, Feb 5, 2025 at 6:27 PM Jean-Marc Paulin wrote: > I am still hoping that I am still good. I just read the savepoint to > extract information (parallelism 1, and only 1 task

Re: How to read a savepoint fast without exploding the memory

2025-02-05 Thread Jean-Marc Paulin
I am still hoping that I am still good. I just read the savepoint to extract information (parallelism 1, and only 1 task manager). I also know it has been created by a job using a HashMap backend. And I do not care about duplicates. I should still be good, right? From what I saw I never read any

Re: How to read a savepoint fast without exploding the memory

2025-02-05 Thread Gabor Somogyi
Hi Guys, We've just had an in-depth analysis and we think that removing that particular line causes correctness issues under some circumstances. Namely, key duplicates can happen when multiple column families are processed at the same time. Needless to say, it would cause multiple `readKey
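A reader that cannot tolerate the duplicate keys described above could filter them downstream by keeping only the first occurrence of each key. The following is a minimal sketch in plain Python, not Flink code: the function name `first_occurrence_per_key` and the `(key, value)` tuple shape are hypothetical and chosen only for illustration.

```python
from typing import Hashable, Iterable, Iterator, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def first_occurrence_per_key(entries: Iterable[Tuple[K, V]]) -> Iterator[Tuple[K, V]]:
    """Yield each (key, value) pair only the first time its key is seen.

    Hypothetical helper: it stands in for whatever consumes the output of
    repeated readKey calls; it is not part of the State Processor API.
    """
    seen = set()  # grows with the number of distinct keys
    for key, value in entries:
        if key in seen:
            continue  # duplicate delivery of an already-processed key
        seen.add(key)
        yield key, value


# Example: key "a" is delivered twice, only its first value is kept.
deduplicated = list(first_occurrence_per_key([("a", 1), ("b", 2), ("a", 3)]))
print(deduplicated)
```

The trade-off is that the set of seen keys lives in memory, so the cost grows with the number of distinct keys rather than with the size of the state values.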

Re: How to read a savepoint fast without exploding the memory

2025-02-05 Thread Salva Alcántara
Thanks both for your work on this! On a related note, since Queryable State (QS) is going away soon, streamlining the State Processor API as much as possible makes a lot of sense. Are there any plans on a migration guide or something for users to adapt their QS observers (beyond the current docs)

Re: How to read a savepoint fast without exploding the memory

2025-02-05 Thread Gabor Somogyi
Hi Jean-Marc, Thanks for your time investment and for sharing the numbers, it's super helpful. Ping me any time when you have further info to share. About the numbers: 48 minutes for 6 GB is not good but not terrible. I've seen petabyte scale states so I'm pretty sure we need to go beyond... Since w

Re: How to read a savepoint fast without exploding the memory

2025-02-05 Thread Jean-Marc Paulin
Hi Gabor, I finally got to run that change through. I have a 6 GB savepoint I read and parse for reference. - HashMap reads it in 14 minutes (but requires 10 GB of RAM) - RocksDB with the patch reads it in 48 minutes (and requires less than 2 GB) - RocksDB without the patch wasn't even halfway through
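To put the two completed runs above side by side, the figures can be turned into read throughput and a time/memory trade-off. This is only a rough sketch in Python: it assumes the savepoint size is 6 gigabytes with 1 GB = 1024 MB, and it treats "less than 2 GB" as an upper bound of 2 GB, so the memory ratio is a lower bound.

```python
MB_PER_GB = 1024  # assumption: binary gigabytes


def throughput_mb_per_min(size_gb: float, minutes: float) -> float:
    """Average read throughput in MB per minute."""
    return size_gb * MB_PER_GB / minutes


SAVEPOINT_GB = 6  # size reported in the message

hashmap = throughput_mb_per_min(SAVEPOINT_GB, 14)          # HashMap: 14 minutes
rocksdb_patched = throughput_mb_per_min(SAVEPOINT_GB, 48)  # patched RocksDB: 48 minutes

slowdown = 48 / 14          # how much longer patched RocksDB takes
min_memory_saving = 10 / 2  # 10 GB vs "less than 2 GB", so at least this factor

print(f"HashMap:         {hashmap:.0f} MB/min")
print(f"Patched RocksDB: {rocksdb_patched:.0f} MB/min")
print(f"Patched RocksDB is {slowdown:.2f}x slower, using at least {min_memory_saving:.0f}x less RAM")
```

Under these assumptions the patched RocksDB run trades roughly 3.4 times the runtime for at least a fivefold reduction in memory.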

Re: How to read a savepoint fast without exploding the memory

2025-02-04 Thread Gabor Somogyi
Just to give an update. I've applied the mentioned patch and the execution time drastically decreased (the gain is 98.9%): 2025-02-04 16:52:54,448 INFO o.a.f.e.s.r.FlinkTestStateReader [] - Execution time: PT14.690426S I need to double check what that would mean to correctness and all
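The logged duration uses the ISO-8601 form `PT14.690426S`, i.e. about 14.7 seconds. If the 98.9% gain is read as `1 - patched / baseline` (an assumption, the message does not define it), the unpatched run time can be estimated from it. A small Python sketch; the helper names are hypothetical, and because the percentage is rounded the result is only approximate.

```python
import re
from datetime import timedelta


def parse_seconds_duration(text: str) -> timedelta:
    """Parse a seconds-only ISO-8601 duration such as 'PT14.690426S'."""
    match = re.fullmatch(r"PT(\d+(?:\.\d+)?)S", text)
    if match is None:
        raise ValueError(f"unsupported duration format: {text!r}")
    return timedelta(seconds=float(match.group(1)))


def implied_baseline(patched: timedelta, gain: float) -> timedelta:
    """Invert gain = 1 - patched / baseline to estimate the baseline."""
    if not 0.0 <= gain < 1.0:
        raise ValueError("gain must be in [0, 1)")
    return timedelta(seconds=patched.total_seconds() / (1.0 - gain))


patched = parse_seconds_duration("PT14.690426S")
baseline = implied_baseline(patched, 0.989)

print(f"patched run:      {patched.total_seconds():.1f} s")
print(f"implied baseline: {baseline.total_seconds() / 60:.0f} min (approximate)")
```

With this reading of the gain, the unpatched read of the same state would have taken on the order of 22 minutes.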

Re: How to read a savepoint fast without exploding the memory

2025-02-04 Thread Gabor Somogyi
Please report back on how the patch behaves, including any side effects. I'm now testing the state reading with the processor API vs the mentioned job where we control the keys. The difference is extreme, especially because the numbers are coming from reading a ~40 MB state file 😅 2025-02-04 13:21:53,5

Re: How to read a savepoint fast without exploding the memory

2025-02-04 Thread Jean-Marc Paulin
That's a good idea. Sadly, I have no control over the keys. I was going to patch Flink with the suggestion in FLINK-37109 first to see how that goes. If that brings RocksDB performance into an acceptable range for us we might go that way. I really

Re: How to read a savepoint fast without exploding the memory

2025-02-04 Thread Gabor Somogyi
What I could imagine is to create a normal Flink job, use execution.state-recovery.path=/path/to/savepoint, and set the operator UID on a custom-written operator, which opens the state info for you. The only drawback is that you must know the keyBy range... this can be problematic but if you can do it

Re: How to read a savepoint fast without exploding the memory

2025-02-04 Thread Jean-Marc Paulin
Hi Gabor, I thought so. I was hoping for a way to read the savepoint in pages, instead of as a single blob up front, which I think is what the HashMap backend does... we just want to be called for each entry and extract the bit we want in that scenario. Never mind. Thank you for the insight. Saves me a lo

Re: How to read a savepoint fast without exploding the memory

2025-02-04 Thread Gabor Somogyi
Hi Jean-Marc, We've already realized that the RocksDB approach is not meeting the performance criteria it should. There is an open issue for it [1]. The HashMap-based approach has always required more memory. So if the memory footprint is a hard requirement then RocksDB is the on

How to read a savepoint fast without exploding the memory

2025-02-04 Thread Jean-Marc Paulin
What would be the best approach to read a savepoint while minimising memory consumption? We just need to transform it into something else for investigation. Our Flink 1.20 streaming job is using the HashMap backend, and is spread over 6 task slots in 6 pods (under k8s). Savepoints are saved on S3. A s