Re: Very large _metadata file

2020-03-21 Thread Jacob Sevart
https://github.com/apache/flink/pull/11475

Re: Very large _metadata file

2020-03-21 Thread Jacob Sevart
Thanks, will do. I only want the timestamp to reset when the job comes up with no state. Checkpoint recoveries should keep the same value. Jacob

Re: Very large _metadata file

2020-03-21 Thread Till Rohrmann
Hi Jacob, if you could create a patch for updating the union state metadata documentation, that would be great. I can help with reviewing and merging the patch. If the value stays fixed over the lifetime of the job and you know it before starting the job, then you could use the config mechanism.
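For illustration, a minimal sketch of the "config mechanism" referred to here, assuming Flink's global job parameters are used to ship a value chosen at submission time to every task. The key name "startup-ts" and the job structure are made up for the example; note this alone does not distinguish a fresh start from a checkpoint recovery, since the value is simply whatever the submitting process passes in.

```java
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class StartupTimestampJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Decide the value once, at submission time, and ship it as a global job parameter.
        // "startup-ts" is a hypothetical key used only for this sketch.
        ParameterTool params = ParameterTool.fromMap(
                java.util.Collections.singletonMap("startup-ts",
                        String.valueOf(System.currentTimeMillis())));
        env.getConfig().setGlobalJobParameters(params);

        env.fromElements("a", "b", "c")
           .process(new StampedFunction())
           .print();

        env.execute("startup-timestamp-example");
    }

    /** Reads the job-wide timestamp back out of the execution config on each task. */
    public static class StampedFunction extends ProcessFunction<String, String> {
        private transient long startupTs;

        @Override
        public void open(Configuration parameters) {
            ParameterTool params = (ParameterTool)
                    getRuntimeContext().getExecutionConfig().getGlobalJobParameters();
            startupTs = Long.parseLong(params.get("startup-ts"));
        }

        @Override
        public void processElement(String value, Context ctx, Collector<String> out) {
            out.collect(value + " @ job started " + startupTs);
        }
    }
}
```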

Re: Very large _metadata file

2020-03-20 Thread Jacob Sevart
Thanks, makes sense. What about using the config mechanism? We're collecting and distributing some environment variables at startup; would it also work to include a timestamp with that? Also, would you be interested in a patch to note the caveat about union state metadata in the documentation?

Re: Very large _metadata file

2020-03-17 Thread Till Rohrmann
Did I understand you correctly that you use the union state to synchronize the per-partition state across all operators in order to obtain a global overview? If this is the case, then this will only work in case of a failover. Only then are all operators restarted with the union of all

Re: Very large _metadata file

2020-03-16 Thread Jacob Sevart
Thanks! That would do it. I've disabled the operator for now. The purpose was to know the age of the job's state, so that we could consider its output in terms of how much context it knows. Regular state seemed insufficient because partitions might see their first traffic at different times. How

Re: Very large _metadata file

2020-03-16 Thread Till Rohrmann
Hi Jacob, I think you are running into some deficiencies of Flink's union state here. The problem is that for every entry in your list state, Flink stores a separate offset (a long value). The reason for this behaviour is that we use the same state implementation for the union state as well as
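For context, a minimal sketch of a union-list-state operator of the kind being discussed, assuming the CheckpointedFunction interface; the class name and logic are illustrative, not Jacob's actual operator. The comment in snapshotState marks the detail Till describes: every element in the list gets its own offset (a long) in the checkpoint metadata, so a list that grows on every snapshot grows the _metadata file with it.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

/**
 * Illustration only: tracks a job-wide "startup time" in union list state.
 * Each subtask snapshots exactly ONE element, so the checkpoint metadata
 * carries one offset per subtask rather than one per accumulated entry.
 */
public class StartupTimeTracker extends RichMapFunction<String, String>
        implements CheckpointedFunction {

    private transient ListState<Long> startupTimesState; // union list state
    private transient long startupTs;

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        ListStateDescriptor<Long> descriptor =
                new ListStateDescriptor<>("startup-times", Types.LONG);
        // Union list state: on restore, every subtask sees the union of all entries.
        startupTimesState = context.getOperatorStateStore().getUnionListState(descriptor);

        if (context.isRestored()) {
            long min = Long.MAX_VALUE;
            for (Long ts : startupTimesState.get()) {
                min = Math.min(min, ts);
            }
            startupTs = min;
        } else {
            startupTs = System.currentTimeMillis();
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // Overwrite with a single element instead of add()-ing on every snapshot;
        // an ever-growing list means one metadata offset (a long) per element.
        startupTimesState.clear();
        startupTimesState.add(startupTs);
    }

    @Override
    public String map(String value) {
        return value + " (state age: " + (System.currentTimeMillis() - startupTs) + " ms)";
    }
}
```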

Re: Very large _metadata file

2020-03-13 Thread Jacob Sevart
Oh, I should clarify that's 43MB per partition, so with 48 partitions it explains my 2GB.

Re: Very large _metadata file

2020-03-13 Thread Jacob Sevart
Running *Checkpoints.loadCheckpointMetadata* under a debugger, I found something: *subtaskState.managedOperatorState[0].stateNameToPartitionOffsets("startup-times").offsets.value* weighs 43MB (5.3 million longs). "startup-times" is an operator state of mine (union list of java.time.Instant). I
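For anyone wanting to reproduce this inspection outside a debugger, a rough sketch, assuming the Flink 1.10-era internal API where Checkpoints.loadCheckpointMetadata takes a DataInputStream and a ClassLoader and returns a Savepoint; these are internal classes and the exact signatures and return types have changed between releases, so treat this as an approximation.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;

import org.apache.flink.runtime.checkpoint.Checkpoints;
import org.apache.flink.runtime.checkpoint.OperatorState;
import org.apache.flink.runtime.checkpoint.OperatorSubtaskState;
import org.apache.flink.runtime.checkpoint.savepoint.Savepoint;
import org.apache.flink.runtime.state.OperatorStateHandle;

public class MetadataInspector {

    public static void main(String[] args) throws Exception {
        String path = args[0]; // path to the _metadata file

        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            // Internal API; signature assumed from the Flink 1.10 era.
            Savepoint savepoint =
                    Checkpoints.loadCheckpointMetadata(in, MetadataInspector.class.getClassLoader());

            // Print how many partition offsets each named operator state carries.
            for (OperatorState operatorState : savepoint.getOperatorStates()) {
                for (OperatorSubtaskState subtaskState : operatorState.getStates()) {
                    for (OperatorStateHandle handle : subtaskState.getManagedOperatorState()) {
                        handle.getStateNameToPartitionOffsets().forEach((name, metaInfo) ->
                                System.out.printf("operator=%s state=%s offsets=%d%n",
                                        operatorState.getOperatorID(), name,
                                        metaInfo.getOffsets().length));
                    }
                }
            }
        }
    }
}
```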

Re: Very large _metadata file

2020-03-09 Thread Congxian Qiu
Hi, as Gordon said, the metadata will contain the ByteStreamStateHandle; when writing out the ByteStreamStateHandle, Flink will write out the handle name -- which is a path (as you saw). The ByteStreamStateHandle will be created when the state size is smaller than `state.backend.fs.memory-threshold` (default is

Re: Very large _metadata file

2020-03-05 Thread Jacob Sevart
Thanks, I will monitor that thread. I'm having a hard time following the serialization code, but if you know anything about the layout, tell me if this makes sense. What I see in the hex editor is, first, many HDFS paths. Then gigabytes of unreadable data. Then finally another HDFS path at the

Re: Very large _metadata file

2020-03-05 Thread Kostas Kloudas
Hi Jacob, As I said previously, I am not 100% sure what could be causing this behavior, but there is a related thread here: https://lists.apache.org/thread.html/r3bfa2a3368a9c7850cba778e4decfe4f6dba9607f32addb69814f43d%40%3Cuser.flink.apache.org%3E to which you can re-post your problem and monitor for

Re: Very large _metadata file

2020-03-04 Thread Jacob Sevart
Kostas and Gordon, thanks for the suggestions! I'm on RocksDB. We don't have that setting configured, so it should be at the default 1024b. This is the full "state.*" section shown in the JobManager UI. [image: Screen Shot 2020-03-04 at 9.56.20 AM.png] Jacob

Re: Very large _metadata file

2020-03-04 Thread Tzu-Li (Gordon) Tai
Hi Jacob, Apart from what Klou already mentioned, one slightly possible reason: If you are using the FsStateBackend, it is also possible that your state is small enough to be considered to be stored inline within the metadata file. That is governed by the "state.backend.fs.memory-threshold"
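A minimal sketch of the programmatic counterpart, assuming the Flink 1.10-era FsStateBackend constructor that takes a file-state-size threshold in bytes (the same knob as `state.backend.fs.memory-threshold`); the checkpoint URI is a placeholder. State smaller than the threshold is stored inline in the metadata file as a ByteStreamStateHandle instead of in a separate file.

```java
import java.net.URI;

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointBackendSetup {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // State smaller than this threshold is written inline into the _metadata file
        // (as a ByteStreamStateHandle) rather than as a separate file on stable storage.
        int fileStateSizeThreshold = 1024; // the pre-1.11 default of 1 KB

        // The checkpoint URI below is a placeholder for this sketch.
        env.setStateBackend(
                new FsStateBackend(new URI("hdfs:///flink/checkpoints"), fileStateSizeThreshold));

        env.fromElements(1, 2, 3).print();
        env.execute("fs-state-backend-threshold-example");
    }
}
```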

Re: Very large _metadata file

2020-03-04 Thread Kostas Kloudas
Hi Jacob, Could you specify which StateBackend you are using? The reason I am asking is that, from the documentation in [1]: "Note that if you use the MemoryStateBackend, metadata and savepoint state will be stored in the _metadata file. Since it is self-contained, you may move the file and

Very large _metadata file

2020-03-03 Thread Jacob Sevart
Per the documentation: "The meta data file of a Savepoint contains (primarily) pointers to all files on stable storage that are part of the Savepoint, in form of absolute paths." I somehow have a _metadata file that's 1.9GB. Running *strings* on it I find 962 strings, most of which look like