Hi Lei,

Have you tried making the key smaller and storing the list of keys seen so
far as the value?

Make the operator key a hash of your original key, and keep the list of
full keys that map to that hash in the state. You can tune the hash length
(and so the number of distinct hash values) to find the optimal number of
operator keys.
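
A rough sketch of what I mean, in plain Java rather than real Flink code so
that it runs on its own. The class name BucketedDedup, the numBuckets
parameter and the in-memory HashMap are all made up for illustration; in
the job, the map would presumably be keyed state (e.g. a MapState or
ListState per hashed key), with keyBy on the bucket id instead of on the
original key. The 24-hour TTL is left out here and would still have to be
applied to the stored keys.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: dedup with a coarse hashed key.
final class BucketedDedup {

    // Number of distinct operator keys (hash buckets); an assumed tuning knob.
    private final int numBuckets;

    // Stand-in for keyed state: bucket id -> full keys already seen in that bucket.
    private final Map<Integer, Set<Long>> state = new HashMap<>();

    BucketedDedup(int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        this.numBuckets = numBuckets;
    }

    // The smaller operator key: a hash of the original key, reduced to a bucket id.
    int bucketOf(long key) {
        return Math.floorMod(Long.hashCode(key), numBuckets);
    }

    // Returns true the first time a full key is seen, false for duplicates.
    boolean isFirstSeen(long key) {
        Set<Long> seen = state.computeIfAbsent(bucketOf(key), b -> new HashSet<>());
        return seen.add(key);
    }

    // Number of buckets that currently hold at least one key.
    int bucketCount() {
        return state.size();
    }

    public static void main(String[] args) {
        BucketedDedup dedup = new BucketedDedup(4);
        long[] input = {1L, 2L, 1L, 6L, 2L, 10L};
        for (long key : input) {
            if (dedup.isFirstSeen(key)) {
                System.out.println(key); // only first occurrences are emitted
            }
        }
    }
}
```

The trade-off is fewer but larger state entries: each lookup touches one
bucket, and the bucket grows with the number of full keys that hash to it.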

I hope this helps,
Peter

On Fri, Mar 29, 2024, 09:08 Lei Wang <leiwang...@gmail.com> wrote:

>
> I use RocksDBBackend to store whether an element has appeared within the
> last day. Here is the code:
>
> public class DedupFunction extends KeyedProcessFunction<Long, IN, OUT> {
>
>     private ValueState<Boolean> isExist;
>
>     public void open(Configuration parameters) throws Exception {
>         ValueStateDescriptor<Boolean> desc = new ........
>         StateTtlConfig ttlConfig =
>             StateTtlConfig.newBuilder(Time.hours(24)).setUpdateType......
>         desc.enableTimeToLive(ttlConfig);
>         isExist = getRuntimeContext().getState(desc);
>     }
>
>     public void processElement(IN in, .... ) {
>         // Emit only the first occurrence of each key within the TTL window.
>         if (null == isExist.value()) {
>             out.collect(in);
>             isExist.update(true);
>         }
>     }
> }
>
> Because the number of distinct keys is very large (about 10 billion per
> day), this operator has become a performance bottleneck.
> How can I optimize the performance?
>
> Thanks,
> Lei
>
>
