On 2022/01/17 18:23:56 ch...@cmartinit.co.uk wrote:
> We’re evaluating Pulsar for something of an unusual use case in that we want to create a number of topics with a very large number of partitions (tens, or ideally even hundreds of thousands). The reason here is that we want consumers to be able to seek efficiently to a given message key. By hashing a given key to a given topic partition we can let consumers subscribe only to that partition and thus ignore the vast majority of other messages.

Interesting challenge.

> Does anyone have any hints as to how I can achieve what I want here[1] or, alternatively, confirm that Pulsar is the wrong tool for the job? I do realise that I could remodel the situation as having 50k topics each with a single partition, but I’m assuming that as far as Pulsar is concerned these two situations are largely equivalent, as an n-partition topic is modelled as n individual topics under the hood.

Exactly. The situation should be remodeled with individual topics. In Pulsar, at the low level, a partitioned topic is a group of ordinary topics.

Based on the information you provided, I'd say the problem you are describing could and should be solved without partitioned topics. The way to solve this would be to keep some custom metadata for making the "routing decision" that you are referring to as hashing. Pulsar partitioned topics are not designed for thousands or hundreds of thousands of partitions. Once your design is decoupled from partitioned topics, you will have a lot more flexibility in evolving the design and the solution.

> [1] where “doing what I want” could either be setting up Pulsar to have topics with a large number of partitions or, more generally, some pattern that would allow consumers to efficiently consume a given message key when the number of message keys is measured in the hundreds of thousands or even millions.

When you say "seek efficiently to a given message key" or "efficiently consume a given message key", what do you mean? Is there a single message for each key, or is it about consuming all messages with a given message key?

BR,
Lari
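
The reply doesn't spell out what the custom routing metadata would look like. The following is a minimal Python sketch under stated assumptions: every key is routed to an ordinary, non-partitioned Pulsar topic, an optional explicit key-to-topic table (the "custom metadata") takes precedence, and a stable hash over a fixed number of buckets is the fallback. The class `KeyRouter`, the `keys-<n>` topic naming and the bucket count are hypothetical illustrations, not part of Pulsar's API.

```python
import hashlib
from typing import Dict, Optional


class KeyRouter:
    """Hypothetical router: maps a message key to an ordinary (non-partitioned) topic name."""

    def __init__(self, namespace: str, num_buckets: int,
                 overrides: Optional[Dict[str, str]] = None) -> None:
        if num_buckets <= 0:
            raise ValueError("num_buckets must be positive")
        # namespace is "tenant/namespace", e.g. "public/default"
        self.namespace = namespace
        self.num_buckets = num_buckets
        # Explicit routing metadata (key -> full topic name); entries here win over hashing,
        # so individual keys can be moved to dedicated topics without rehashing everything.
        self.overrides = dict(overrides or {})

    def bucket(self, key: str) -> int:
        # Use SHA-256 rather than Python's built-in hash(), which is salted per process
        # and would give producers and consumers different answers.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_buckets

    def topic_for(self, key: str) -> str:
        if key in self.overrides:
            return self.overrides[key]
        # Hypothetical naming scheme: one plain topic per hash bucket.
        return f"persistent://{self.namespace}/keys-{self.bucket(key)}"
```

Producers and consumers would both call `topic_for(key)`, and a consumer would subscribe only to the returned topic (with the Python client, something like `client.subscribe(topic, "my-subscription")`). When keys outnumber buckets, several keys share a topic and consumers still need to filter by key, which is why the question of whether there is one message per key matters for the design.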