Hi, I'm the maintainer of flink-connector-pulsar. I would like to
start a survey on a function change proposal in
flink-connector-pulsar.

I have created a ticket
<https://issues.apache.org/jira/browse/FLINK-30413> on JIRA and paste
its description here:

A lot of Pulsar connector test unstable issues are related to Shared
and Key_Shared subscription. Because this two subscription is designed
to consume the records in an unordered way. And we can support
multiple consumers in same topic partition. But this feature lead to
some drawbacks in connector.

1. Performance

Flink is a true stream processor with high correctness support. But
support multiple consumer will require higher correctness which
depends on Pulsar transaction. But the internal implementation of
Pulsar transaction on source is record the message one by one and
stores all the pending ack status in client side. Which is slow and
memory inefficient.

This means that we can only use Shared and Key_Shared on Flink with
low throughput. This against our intention to support these two
subscription. Because adding multiple consumer to same partition can
increase the consuming speed.

2. Unstable

Pulsar transaction acknowledge the messages one by one in an internal
Pulsar's topic. But it's not stable enough to get it works. A lot of
pending issues in Flink JIRA are related to Pulsar transaction and we
don't have any workaround.

3. Complex

Support Shared and Key_Shared subscription make the connector's code
more complex than we expect. We have to make every part of code into
ordered and unordered way. Which is hard to understand for the
maintainer.

4. Necessary

The current implementation on Shared and Key_Shared is completely
unusable to use in Production environment. For the user, this function
is not necessary. Because there is no bottleneck in consuming data
from Pulsar, the bottleneck is in processing the data, which we can
achieve by increasing the parallelism of the processing operator.

Reply via email to