Hello,

I have a follow-up question to this: since Flink doesn't support state expiration at the moment (e.g. expiring state which hasn't been updated for a certain amount of time), would it be possible to clean up old UDF state by:

- storing a "last_updated" timestamp in the state value
- periodically (e.g. monthly) going through all the state values in RocksDB, deserializing them using TypeSerializer and reading the "last_updated" property
- deleting the key from RocksDB if the state's "last_updated" timestamp is more than a month old
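
To make the idea concrete, here is a minimal sketch of the scan-and-delete step in Java. It is not Flink or RocksDB code: a TreeMap stands in for the RocksDB instance, a hand-written ByteBuffer encoding stands in for the TypeSerializer, and the names StateExpirySketch, TimestampedValue and expire are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only: the Map stands in for a RocksDB instance and the
// ByteBuffer encoding stands in for Flink's TypeSerializer.
class StateExpirySketch {

    /** Hypothetical state value that carries its own last-updated timestamp. */
    static final class TimestampedValue {
        final long lastUpdatedMillis;
        final String payload;

        TimestampedValue(long lastUpdatedMillis, String payload) {
            this.lastUpdatedMillis = lastUpdatedMillis;
            this.payload = payload;
        }
    }

    /** Encodes the value as an 8-byte timestamp followed by the UTF-8 payload. */
    static byte[] serialize(TimestampedValue value) {
        byte[] payload = value.payload.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + payload.length)
                .putLong(value.lastUpdatedMillis)
                .put(payload)
                .array();
    }

    /** Reverses serialize(): reads the timestamp, then the remaining bytes. */
    static TimestampedValue deserialize(byte[] raw) {
        ByteBuffer buffer = ByteBuffer.wrap(raw);
        long lastUpdated = buffer.getLong();
        byte[] payload = new byte[buffer.remaining()];
        buffer.get(payload);
        return new TimestampedValue(lastUpdated, new String(payload, StandardCharsets.UTF_8));
    }

    /** Deletes every entry not updated within maxAgeMillis; returns the number removed. */
    static int expire(Map<String, byte[]> store, long nowMillis, long maxAgeMillis) {
        int removed = 0;
        Iterator<Map.Entry<String, byte[]>> entries = store.entrySet().iterator();
        while (entries.hasNext()) {
            TimestampedValue value = deserialize(entries.next().getValue());
            if (nowMillis - value.lastUpdatedMillis > maxAgeMillis) {
                entries.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        long now = System.currentTimeMillis();
        Map<String, byte[]> store = new TreeMap<>();
        store.put("fresh", serialize(new TimestampedValue(now - day, "a")));
        store.put("stale", serialize(new TimestampedValue(now - 40 * day, "b")));
        System.out.println("removed: " + expire(store, now, 30 * day));
        System.out.println("remaining keys: " + store.keySet());
    }
}
```

The sketch uses String keys for simplicity, whereas RocksDB keys are raw bytes. It also says nothing about whether it is safe to delete entries underneath a running job or inside a snapshot, which is the part I am unsure about.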

Is there any reason this approach wouldn't work, or anything to be careful of?

Thanks,
Josh

On Mon, Apr 18, 2016 at 8:23 AM, Aljoscha Krettek <aljos...@apache.org> wrote:

> Hi,
> key refers to the key extracted by your KeySelector. Right now, for every
> named state (i.e. the name in the StateDescriptor) there is an isolated
> RocksDB instance.
>
> Cheers,
> Aljoscha
>
> On Sat, 16 Apr 2016 at 15:43 Igor Berman <igor.ber...@gmail.com> wrote:
>
>> thanks a lot for the info, seems not too complex
>> I'll try to write a simple tool to read this state.
>>
>> Aljoscha, does the key reflect the unique id of the operator in some way?
>> Or is the key just a "name" that is passed to ValueStateDescriptor?
>>
>> thanks in advance
>>
>>
>> On 15 April 2016 at 15:10, Stephan Ewen <se...@apache.org> wrote:
>>
>>> One thing to add is that you can always trigger a persistent checkpoint
>>> via the "savepoints" feature:
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/savepoints.html
>>>
>>>
>>> On Fri, Apr 15, 2016 at 10:24 AM, Aljoscha Krettek <aljos...@apache.org>
>>> wrote:
>>>
>>>> Hi,
>>>> for RocksDB we simply use a TypeSerializer to serialize the key and
>>>> value to a byte[] array and store that in RocksDB. For a ListState, we
>>>> serialize the individual elements using a TypeSerializer and store them in
>>>> a comma-separated list in RocksDB. The snapshots of RocksDB that we write
>>>> to HDFS are regular backups of a RocksDB database, as described here:
>>>> https://github.com/facebook/rocksdb/wiki/How-to-backup-RocksDB%3F. It
>>>> should be possible to read them from HDFS and restore them to a RocksDB
>>>> database as described in the linked documentation.
>>>>
>>>> tl;dr As long as you know the type of values stored in the state you
>>>> should be able to read them from RocksDB and deserialize the values using
>>>> TypeSerializer.
>>>>
>>>> One more bit of information: Internally the state is keyed by (key,
>>>> namespace) -> value where namespace can be an arbitrary type that has a
>>>> TypeSerializer. We use this to store window state that is both local to key
>>>> and the current window. For state that you store in a user-defined function
>>>> the namespace will always be null and that will be serialized by a
>>>> VoidSerializer that simply always writes a "0" byte.
>>>>
>>>> Cheers,
>>>> Aljoscha
>>>>
>>>> On Fri, 15 Apr 2016 at 00:18 igor.berman <igor.ber...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>> we are evaluating Flink for a new solution and several people raised
>>>>> a concern about coupling too much to Flink:
>>>>> 1. we understand that if we want to get full fault tolerance and best
>>>>> performance we'll need to use Flink managed state (probably the RocksDB
>>>>> backend due to the volume of state)
>>>>> 2. but then if we later find that Flink doesn't answer our needs (for
>>>>> any reason), we'll need to extract this state in some way (since it's
>>>>> the only source of consistent state)
>>>>> In general I'd like to be able to take a snapshot of the backend and try
>>>>> to read it... do you think it will be a trivial task?
>>>>> Say I'm holding list state per partitioned key, would it be easy to
>>>>> take the RocksDB file and open it?
>>>>>
>>>>> any thoughts regarding how I can convince people in our team?
>>>>>
>>>>> thanks in advance!
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Accessing-StateBackend-snapshots-outside-of-Flink-tp6116.html
>>>>> Sent from the Apache Flink User Mailing List archive at Nabble.com.