Hi,

How many partitions does your Kafka topic have?

One possibility is that the Kafka topic has only one partition.
With a source parallelism of 2, one of the source sub-tasks then has
no partition to read, so it never consumes data and never emits a
watermark. The downstream operator takes the minimum watermark across
its inputs, so its watermark never advances and it cannot emit any
results. [1]
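
To illustrate the alignment rule, here is a minimal plain-Java sketch (no Flink dependency). It is not Flink's actual implementation; the class name, method name and the NO_WATERMARK sentinel are made up for this example. It only models "the combined watermark is the minimum over the non-idle inputs".

```java
public class WatermarkAlignmentSketch {
    // Hypothetical sentinel meaning "this input has not emitted a watermark yet".
    static final long NO_WATERMARK = Long.MIN_VALUE;

    // Combined watermark = minimum over all inputs that are not marked idle.
    // Simplification: if every input is idle, report NO_WATERMARK.
    static long combinedWatermark(long[] watermarks, boolean[] idle) {
        long min = Long.MAX_VALUE;
        boolean anyActive = false;
        for (int i = 0; i < watermarks.length; i++) {
            if (idle[i]) {
                continue; // idle inputs are ignored during alignment
            }
            anyActive = true;
            min = Math.min(min, watermarks[i]);
        }
        return anyActive ? min : NO_WATERMARK;
    }

    public static void main(String[] args) {
        // Sub-task 0 reads the only partition; sub-task 1 has nothing to read.
        long[] watermarks = {1_000L, NO_WATERMARK};

        // Neither input is marked idle: the minimum stays at "no watermark".
        long stalled = combinedWatermark(watermarks, new boolean[] {false, false});
        System.out.println("stalled: " + (stalled == NO_WATERMARK)); // stalled: true

        // Sub-task 1 is marked idle: only sub-task 0 counts.
        long advanced = combinedWatermark(watermarks, new boolean[] {false, true});
        System.out.println("combined: " + advanced); // combined: 1000
    }
}
```

With parallelism 1 there is only one input, so the minimum is simply that input's watermark, which would match what you observed.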

You can check the "Records Sent" count of each source sub-task.
If only one sub-task is emitting records, you can set a source idle
timeout [2] so that the silent sub-task is marked idle and the
downstream operator stops waiting for its watermark.
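
As a rough sketch, the option could be set from the Table API program like this. It assumes Flink 1.15 or later (for TableConfig#set with string key and value) and the Flink Table API on the classpath; the class name is made up and the "5 s" value is only an example, so pick a timeout that suits your data rate.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class IdleTimeoutSketch {
    public static void main(String[] args) {
        EnvironmentSettings settings =
                EnvironmentSettings.newInstance().inStreamingMode().build();
        TableEnvironment tEnv = TableEnvironment.create(settings);

        // Mark a source sub-task as idle after 5 seconds without records
        // (example value, an assumption), so it no longer holds back the
        // downstream watermark.
        tEnv.getConfig().set("table.exec.source.idle-timeout", "5 s");

        // ... register the source/sink tables and submit the INSERT as before ...
    }
}
```

Alternatively, giving the topic at least as many partitions as the source parallelism means every source sub-task has data to read.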

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/#dealing-with-idle-sources
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-exec-source-idle-timeout

Best,
Weihua


On Wed, May 10, 2023 at 8:20 PM Sharil Shafie <sisha...@gmail.com> wrote:

> Hi,
>
> I am running the Real Time Reporting with the Table API
> <https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/try-flink/table_api/>
> example on Kubernetes using Flink Kubernetes Operator 1.4.
>
> When I set the job parallelism to 2, nothing is inserted into the
> spend_report table and it stays empty. However, when I set parallelism
> to 1, rows are inserted.
>
> The problem is that there is no exception (that I can find) that indicates
> why nothing is inserted. The job appears to be running fine, and bytes
> and records are being received (refer to attached file). There is one
> difference, however: a 'Low Watermark' value is shown when parallelism
> is 1, and none is shown when parallelism is 2.
>
> I have done this for both mysql and postgres and got the same outcome.
>
> I am fairly new to Kubernetes and Flink. What am I missing?
>
> The relevant files are below:
>
>    - SpendReport.java <https://pastebin.com/HUMhWqUM> (taken from flink
>    playgrounds with modification)
>    - deployment_with_job.yaml <https://pastebin.com/8BTyega7> (taken from
>    example in Kubernetes Operator repo with modification).
>    - Log when Parallelism = 2 <https://pastebin.com/U04QZg75>
>    - Log when Parallelism = 1 <https://pastebin.com/L0CC24Ke>
>    - Printscreen of task output for both settings (as attached).
>
>
> Regards.
>
>
