Hi

I think there is no benchmark about this at present

Best,
Shammon

On Sun, Feb 26, 2023 at 5:49 PM Gen Luo <luogen...@gmail.com> wrote:

> Hi,
>
> I suppose there are two things to clarify:
> 1. the function object will only be serialized once when deploying and
> deserialized once per task while initializing.
> 2. the ValueState itself is only a handle. It is set/setup with the key of
> each incoming record. When serializing, it is only a null field.
>
> In conclusion, the cost of serialization should be constant,
> relatively ignorable, and there should be no impact on the performance no
> matter whether it is a transient.
> By the way, preparing, accessing and updating the value state can indeed
> be costly if the number of partitions is really large, but it has nothing
> to do with the `transient` qualifier.
>
> On Fri, Feb 24, 2023 at 7:49 PM simple <1028108...@qq.com> wrote:
>
>> 稍等,我看一下您的反馈
>>
>> ------------------------------
>> 发自我的iPhone
>>
>>
>> ------------------ Original ------------------
>> *From:* Zhongpu Chen <chenlov...@gmail.com>
>> *Date:* Fri,Feb 24,2023 7:47 PM
>> *To:* user <user@flink.apache.org>
>> *Subject:* Re: RE: Re: Re: Should we always mark ValueState as
>> "transient" forRichFunctions
>>
>> Hi Shammon,
>>
>> Sorry for the inaccurate description of my last reply. Let me restate my
>> question again:
>>
>> Fact 1: we know that ValueState here should not
>> serialized/de-serialized, so it is a good practice to mark it with
>> "transient".
>>
>> Fact 2: on the other hand, if we don't mark it with "transient", it will
>> be initialized to null, and this null value will be
>> serialized/de-serialized. I think it will occur some overhead if the
>> number of partitions is very large.
>>
>> Given the two facts above, the program works well in both cases in terms
>> of accuracy. And my question is: is there any performance benchmark in
>> real (large) applications to compare two cases?
>>
>> Feel free to point out if I've misunderstood.
>>
>> On 2023/02/24 11:01:51 Shammon FY wrote:
>> > Hi
>> >
>> > Sorry that I don't quite understand your question. I think the above
>> > functions will only be deserialized when the job is submitted, do you
>> want
>> > to test the impact of this on submission throughput?
>> >
>> > Best,
>> > Shammon
>> >
>> >
>> > On Fri, Feb 24, 2023 at 3:04 PM Zhongpu Chen wrote:
>> >
>> > > Hi Gen,
>> > >
>> > > Thanks for your explanation.
>> > >
>> > > Back to this code snippet, since they are not marked with "transient"
>> > > now, I suppose Flink will use avro to serialize them (null values). Is
>> > > there any benchmark to show the performance test between null values
>> > > serialization and "transient"? I mean, it is indeed not good to write
>> > > them with "transient", but it works. So is there any performance
>> lose here?
>> > >
>> > >
>> > > On 2023/02/24 06:47:21 Gen Luo wrote:
>> > > > Hi,
>> > > >
>> > > > ValueState is a handle rather than an actual value. So it should
>> never
>> > > be
>> > > > serialized. In fact, ValueState itself is not a Serializable. It
>> > > should be
>> > > > ok to always mark it as transient.
>> > > >
>> > > > In this case, I suppose it works because the ValueState is not set
>> > > (which
>> > > > happens during the runtime) when the function is serialized (while
>> > > > deploying). But it's not good.
>> > > >
>> > > > On Fri, Feb 24, 2023 at 10:29 AM Zhongpu Chen
>> wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > When I am reading the code from flink-training-repo [1], I
>> noticed the
>> > > > > following code:
>> > > > >
>> > > > > ```java
>> > > > >
>> > > > > public static class EnrichmentFunction
>> > > > > extends RichCoFlatMapFunction {
>> > > > >
>> > > > > private ValueState rideState; private
>> > > ValueState fareState;
>> > > > > ...
>> > > > > }
>> > > > >
>> > > > > ```
>> > > > >
>> > > > > From my understanding, since ValueState variables here are scoped
>> > > to each
>> > > > > instance, they should not be serialized for the performance sake.
>> > > Thus, we
>> > > > > should always mark them with "transient". Similar discussion can
>> be
>> > > found
>> > > > > here [2].
>> > > > >
>> > > > > Should we always mark ValueState as "transient", and why? Please
>> > > help me
>> > > > > to figure it out.
>> > > > >
>> > > > > [1]
>> > > > >
>> > >
>> > >
>>
>> https://github.com/apache/flink-training/blob/master/rides-and-fares/src/solution/java/org/apache/flink/training/solutions/ridesandfares/RidesAndFaresSolution.java
>> > > > >
>> > > > > [2]
>> > > > >
>> > >
>> > >
>>
>> https://stackoverflow.com/questions/72556202/flink-managed-state-as-transient
>> > > > >
>> > > >
>> > >
>> >
>>
>

Reply via email to