The query which I'm testing now (trying to avoid the deduplication query
because of tombstones) is *almost* correct, but there are two questions
I can't find an answer to:
1. Some of the *id*s just stop being produced.
2. Does the TUMBLE window select only the records whose upd_ts is new
>
> Are you sure that the null records are not actually tombstone records? If
> you use upsert tables you usually want to have them + compaction. Or how
> else will you deal with deletions?
Yes, they are tombstone records, but I cannot avoid them because the
deduplication query can't produce an append-only stream.
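For context, the deduplication being discussed follows the usual Flink SQL
ROW_NUMBER() pattern. A minimal sketch with the thread's column names
(assumed flattened from the `data` row): ordering DESC keeps the last row
per id, and each newer record retracts the previous winner, so the result
is an updating stream rather than an append-only one:

SELECT id, account, upd_ts
FROM (
  SELECT *,
         -- DESC = "last row" semantics; the winner per id can change,
         -- which is what produces retractions/tombstones downstream.
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY row_ts DESC) AS rn
  FROM stats_topic
)
WHERE rn = 1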
Hi,
Are you sure that the null records are not actually tombstone records? If
you use upsert tables you usually want to have them + compaction. Or how
else will you deal with deletions?
> Is there anyone who is successfully deduplicating CDC records into either
> a kafka topic or S3 files (CSV/parquet)?
Unfortunately, using row_ts doesn't help. Setting the kafka
topic cleanup.policy to compact is not a very good idea, as it increases
CPU and memory usage and might lead to other problems.
So for now I'll just ignore the null records. Is there anyone who is
successfully deduplicating CDC records into either a kafka topic or S3
files (CSV/parquet)?
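On the compaction remark above: for anyone who does want to try that route,
compaction is a per-topic setting. A sketch using the stock Kafka CLI on a
recent Kafka version, with the broker address and topic name as placeholders:

# Enable log compaction on an existing topic (placeholder names).
kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name stats_output \
  --add-config cleanup.policy=compact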
Thanks for the quick reply, Timo. I'll test with the row_ts and compaction
mode suggestions. However, I've read somewhere in the archives that an
append-only stream is only possible if I extract "the first" record from
the ranking, which in my case is the oldest record.
Regards
On Mon, Feb 8,
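A minimal sketch of that first-row extraction: it differs from the last-row
query above only in the ordering direction of the ranking, names as before:

ROW_NUMBER() OVER (PARTITION BY id ORDER BY row_ts ASC) AS rn
-- ASC keeps the oldest record per id; with the outer filter rn = 1
-- this variant can be emitted as an append-only stream.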
Hi,
could the problem be that you are mixing OVER and TUMBLE windows with
each other? The TUMBLE is correctly defined over the time attribute `row_ts`,
but the OVER window is defined using a regular column `upd_ts`. This
might be why the query is not append-only but updating.
Maybe you ca
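Illustrating Timo's point with a minimal sketch (window size assumed): both
the group window and any OVER window must reference the time attribute
row_ts, not the plain BIGINT column upd_ts, for the planner to keep the
result append-only:

SELECT TUMBLE_START(row_ts, INTERVAL '1' MINUTE) AS wstart,
       id,
       MAX(upd_ts) AS last_upd_ts
FROM stats_topic
-- TUMBLE over the time attribute row_ts; grouping a window defined on
-- a regular column would force an updating result instead.
GROUP BY TUMBLE(row_ts, INTERVAL '1' MINUTE), id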
Hi,
AFAIK this should be supported in 1.12 via FLINK-19568 [1]
I'm pulling in Timo and Jark who might know better.
[1] https://issues.apache.org/jira/browse/FLINK-19857
Regards,
Roman
On Mon, Feb 8, 2021 at 9:14 AM meneldor wrote:
> Any help please? Is there a way to use the "Last row" from a deduplication
> in an append-only stream or tell upsert-kafka to not produce *null* records
> in the sink?
Any help please? Is there a way to use the "Last row" from a deduplication
in an append-only stream or tell upsert-kafka to not produce *null* records
in the sink?
Thank you
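For reference: the upsert-kafka connector encodes DELETE rows in its input
changelog as Kafka tombstones (records with a null value), and as of 1.12 it
does not expose an option to suppress them. A minimal sketch of such a sink,
with topic and server names assumed:

CREATE TABLE stats_sink (
  `id` BIGINT,
  `account` INT,
  `upd_ts` BIGINT,
  PRIMARY KEY (`id`) NOT ENFORCED  -- required by upsert-kafka
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'stats_deduped',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json'
)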
On Thu, Feb 4, 2021 at 1:22 PM meneldor wrote:
> Hello,
> Flink 1.12.1 (pyflink)
> I am deduplicating CDC records coming from Maxwell in a kafka topic.
Hello,
Flink 1.12.1 (pyflink)
I am deduplicating CDC records coming from Maxwell in a kafka topic. Here
is the SQL:
CREATE TABLE stats_topic(
  `data` ROW<`id` BIGINT, `account` INT, `upd_ts` BIGINT>,
  `ts` BIGINT,
  `xid` BIGINT,
  row_ts AS TO_TIMESTAMP(
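The message is truncated here in the archive. A sketch of how such a DDL
typically continues; the computed time attribute, watermark, and connector
options below are assumed for illustration, not taken from the original:

CREATE TABLE stats_topic (
  `data` ROW<`id` BIGINT, `account` INT, `upd_ts` BIGINT>,
  `ts` BIGINT,
  `xid` BIGINT,
  -- Assumed completion from here on: derive an event-time attribute
  -- from the epoch-seconds column and declare a watermark on it.
  row_ts AS TO_TIMESTAMP(FROM_UNIXTIME(`ts`)),
  WATERMARK FOR row_ts AS row_ts - INTERVAL '15' SECOND
) WITH (
  'connector' = 'kafka',              -- connector options assumed
  'topic' = 'maxwell.stats',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
)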