merlimat commented on issue #2400: PIP-22: Dead Letter Topic
URL: https://github.com/apache/incubator-pulsar/pull/2400#issuecomment-415917621
@codelipenghui @sijie
I'm still convinced it's much easier to approach this problem in the
client side
Other than the scalability concerns and the complexity in changing the
dispatcher code, I think we should also consider:
* If the broker is re-publishing, we would need to have some complex
way to do authorization. Consumer has its own credentials, but
broker will have to create a producer using the broker credentials,
after having verified that consumer also has permission to publish
on the DLQ. Worse, if we revoke the permission on the DLQ topic, we
would also have to figure that out in broker. Moving the publishing
to consumer, will make that problem go away: consumer will use the
same credentials when re-publishing and it will work same as a
regular producer.
* If the consumer has a schema, it might be desirable to keep the
schema when republishing. Right now, the producer app needs to have
the Java pojo to create a schema enabled producer, but broker won't
have the Pojo definition.
* In the current implementation, it's not handling the fact that
different consumers on the same subscriptions could have different
DLQ settings. Eg: `consumer-a`: 10 retries and publish on `dlq-a`
and `consumer-a` 5 retries with republish on `dlq-b`. If done on
broker side, broker would have to somehow deal with these
inconsistencies. If the implementation is done on client library,
these inconsistencies will be fine: the broker just needs to keep
(and communicate) the current count of re-deliveries. Each consumer
will then take action based on its local configuration.
> a) doing DLQ in client actually make things a bit more
> complicated. because the redelivery count is tracked at
> broker, but the publishing happens at the client. Things like
> configurations/policies are separated in different places,
> which can result in inconsistent behaviors, which make
> debugging much harder when issue occurs.
The current PR is anyway configuring the DLQ in client API. If we
were to configure the DLQ as a server side policy we would probably
need to have fine-grained policies per-topic (while currently these
are only at the namespace level, except for authorization which is
both at namespace and topic level). For DLQ it would even be at
the subscription level.
> b) doing DLQ in client means the same logic has to be implemented in
> different language client. The implementation can quickly become
> inconsistent between different language client.
This I agree. It would require to do implementation in both Java and C++
client (plus the wrappers). My hope is that the client implementation
would be not be very complicated.
> c) regarding the resource concerns, I don't think the resource usage
> is different if it is still broker tracking the redelivery
> count. unless there is something I am missing.
Well, the difference here would be that broker just need to keep only
a counter, inside a map there already exist.
If the broker is doing the publishes, then it needs to create and cache
producers for all the DLQ topics. If each consumer has its own producers,
this could greatly increase the memory consumption in broker and we would
have to carefully tune the producer queues to make sure we don't get out
of direct memory in the worst case scenarios.
> d) doing DLQ in broker allows us doing optimization in future, like
> persisting delivery count to cursors or a retry queue, without
> changing client behaviors. especially if we want to support
> consistent behavior across different language clients.
Optmizations like storing the counter would possible in either case.
> "negative acks" is a good thing to have. However I don't think DLQ
> has to depend on "negative acks". because the current "negative
> acks" pulsar has is ack timeout. for DLQ, it doesn't really matter
> it is a ack timeout or explicit ack timeout. We don't really need to
> tie things together. If DLQ goes first, we still can introduce
> "negative acks" later, and it will not affect DLQ logic.
Sure, we can defer "negative acks" changes and just focus on current
re-delivery functionality.
> I think the simplest thing is just let user configure message
> retention in DLQ topic. That is what typically cloud providers
> offer. Not sure we need to get into the business of managing
> subscriptions for DLQ.
Again, the problem lies with policies which are currently namespace
wide.
I believe the best compromise solution we can adopt initially is to
leave the