Re: RMQSource non-parallel, seems inconsistent with documentation

Austin Cawley-Edwards Fri, 18 Feb 2022 16:57:22 -0800

Hey Daniel,

I think you're right that the docs are misleading in this case – anything
that extends SourceFunction will always execute at parallelism 1, set
parallelism is ignored. Explicitly setting parallelism in the example in
the docs is unnecessary and confusing. I personally have only used this
with exactly-once semantics (thus always configured parallelism 1), so not
sure what performance limitations there may be with high volume streams. It
might be worth a try to build your own on RichParallelSourceFunction if
you're up for it – I'm sure other people would be interested in the
response.


Best,
Austin

On Fri, Feb 18, 2022 at 5:09 PM Daniel Hristov <dhris...@dragos.com> wrote:

> Hello,
>
>
>
> I’ve noticed that
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/rabbitmq/
> suggests that the RabbitMQ Source can be used with parallelization bigger
> than 1, but one can’t get exactly-once delivery guarantee there. However,
> the inheritance chain seems to show RMQSource ->
> MultipleIdsMessageAcknowledgingSourceBase -> MessageAcknowledgingSourceBase
> -> RichSourceFunction (rather than RichParallelSourceFunction)
>
> I imagine either the documentation or the implementation has to be
> corrected here. Any thoughts on what imposes that parallelism on 1 here?
> The only place I found it to be checked up the hierarchy seems to be
> MessageAcknowledgingSourceBase::initializeState
>
>
>
> Have you found this to impose a limitation on the performance of pulling
> messages from Rabbit (assuming that heavier enrichments down the chain are
> properly parallelized)?
>
>
>
> Best, Daniel
>

Re: RMQSource non-parallel, seems inconsistent with documentation

Reply via email to