On 2022/01/17 18:23:56 ch...@cmartinit.co.uk wrote:
> We’re evaluating Pulsar for something of an unusual use case in that we
> want to create a number of topics with a very large number of partitions
> (tens, or ideally even hundreds, of thousands). The reason here is that we
> want consumers to be able to seek efficiently to a given message key. By
> hashing a given key to a given topic partition we can let consumers
> subscribe only to that partition and thus ignore the vast majority of other
> messages.

Interesting challenge.

> Does anyone have any hints as to how I can achieve what I want here[1]
> or, alternatively, confirm that Pulsar is the wrong tool for the job? I do
> realise that I could remodel the situation as having 50k topics each with a
> single partition, but I’m assuming that as far as Pulsar is concerned these
> two situations are largely equivalent as an n-partition topic is modelled
> as n individual topics under the hood.

Exactly, the situation should be remodeled with individual topics. In
Pulsar, at a low level, a partitioned topic is a group of ordinary
topics. Based on the information you provided, I'd say that the problem you
are describing could and should be solved without partitioned topics. The
way to solve this would be to keep some custom metadata for making the
"routing decision", which you are referring to as hashing. Pulsar
partitioned topics are not designed for thousands or hundreds of thousands
of partitions. When your design is decoupled from partitioned topics, there
will be a lot more flexibility in evolving the design and the solution.
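
As a rough illustration of such a routing decision, here is a minimal
sketch that maps a message key to one of a fixed set of ordinary
(non-partitioned) topics with a stable hash. The topic prefix, the bucket
count and the function name are assumptions made up for this example, not
anything Pulsar prescribes, and a hash is only one possible choice; a lookup
table kept as custom metadata would work just as well.

```python
import hashlib

# Hypothetical values for illustration only.
NUM_TOPICS = 50_000
TOPIC_PREFIX = "persistent://public/default/events-"


def topic_for_key(key: str, num_topics: int = NUM_TOPICS) -> str:
    """Return the name of the individual topic that a message key maps to.

    A cryptographic digest is used instead of Python's built-in hash()
    because hash() is salted per process, and producers and consumers
    must agree on the mapping across processes and restarts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and reduce it
    # to a bucket index in the range [0, num_topics).
    bucket = int.from_bytes(digest[:8], "big") % num_topics
    return f"{TOPIC_PREFIX}{bucket}"
```

Producers and consumers would both call the same function and then publish
to, or subscribe to, only the returned topic name. Changing the bucket count
later remaps keys, so a real design would need a plan for that.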

> [1] where “doing what I want” could either be setting up Pulsar to have
> topics with a large number of partitions or, more generally, some pattern
> that would allow consumers to be able to efficiently consume a given
> message key when the number of message keys is measured in the hundreds of
> thousands or even millions.
>

When you say "seek efficiently to a given message key" or "efficiently
consume a given message key", what do you mean? Is there a single message
for each key, or is it about consuming all messages with a given message key?

BR,

Lari
