The thought experiment I did ended up having a set of front-end servers
corresponding to a given chunk of the user-id space, each of which was a
separate subscriber to the same set of partitions. Then you have one or
more partitions corresponding to that same chunk of users. You want the
chunk/set of partitions to be sized so that each of those front-end
servers can process all the messages in it and send out the chats,
notifications (status-change notifications, perhaps), and read receipts
to those users who happen to be connected to that particular front-end node.
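
As a rough sketch of the chunking idea (all the numbers and the hash choice here are my own assumptions, not anything Kafka prescribes): a user id hashes to a chunk of the id space, and each chunk maps to a fixed set of partitions that every FE server in that chunk subscribes to.

```python
# Hypothetical sketch: map a user id to a chunk of the id space, and from
# there to the Kafka partitions serving that chunk. NUM_CHUNKS,
# PARTITIONS_PER_CHUNK, and the use of md5 are illustrative assumptions.
import hashlib

NUM_CHUNKS = 64            # chunks of the user-id space
PARTITIONS_PER_CHUNK = 4   # partitions backing each chunk

def chunk_for_user(user_id: str) -> int:
    """Stable hash of the user id into a chunk of the id space."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_CHUNKS

def partitions_for_chunk(chunk: int) -> list:
    """The contiguous set of partitions every FE server in this chunk consumes."""
    start = chunk * PARTITIONS_PER_CHUNK
    return list(range(start, start + PARTITIONS_PER_CHUNK))

# Every front-end server assigned to a chunk consumes *all* of that chunk's
# partitions, then delivers only to its locally connected users.
```

The point of the contiguous mapping is just that a server knows exactly which partitions to subscribe to from its chunk assignment alone.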

You would need to handle some deduplication on the consumers/FE servers,
and you would need to decide where to produce. Producing from every
front-end server to potentially every broker could be expensive in terms
of connections, so you might want to first relay the messages to the
corresponding front-end cluster; but since we don't use large numbers of
producers it's hard for me to say.
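
Since every FE server in a chunk consumes the same partitions, a reconnecting user or a redelivered message can show up twice; one plausible dedup approach (my sketch, not a prescribed pattern, and the window size is an assumption) is a bounded window of recently seen message ids:

```python
# Hypothetical sketch of per-server deduplication on a front-end node.
# A bounded, insertion-ordered set of recently seen message ids filters
# repeats; ids age out once the window is full.
from collections import OrderedDict

class DedupWindow:
    def __init__(self, max_size: int = 10000):
        self._seen = OrderedDict()   # message id -> None, insertion-ordered
        self._max_size = max_size

    def first_sighting(self, message_id: str) -> bool:
        """True iff this id has not been seen within the current window."""
        if message_id in self._seen:
            return False
        self._seen[message_id] = None
        if len(self._seen) > self._max_size:
            self._seen.popitem(last=False)   # evict the oldest id
        return True
```

A bounded window trades perfect dedup for bounded memory, which is usually the right trade when duplicates cluster close together in time.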

For persistence and offline delivery you can probably accept a delay in
user receipt, so you can use another set of consumers that persist the
messages to a higher-latency datastore on the backend, and then fetch
the last 50 or so messages, with a bit of lag, when the user first looks
at history (see the similar lag in HipChat and Hangouts).
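
The persistence consumer group could look something like the following sketch; the in-memory store here is a stand-in for the real backend datastore, and the names, record shape, and N=50 window are all my assumptions:

```python
# Hypothetical sketch of the second consumer group: drain chat messages into
# a backend store, here modeled as an in-memory map of bounded per-conversation
# histories. In practice this would write to Cassandra/HBase/etc.
from collections import defaultdict, deque

HISTORY_LIMIT = 50   # "last 50 or so messages"

class HistoryStore:
    def __init__(self):
        # conversation id -> bounded deque of the most recent messages
        self._by_conversation = defaultdict(lambda: deque(maxlen=HISTORY_LIMIT))

    def persist(self, conversation_id: str, message: dict) -> None:
        self._by_conversation[conversation_id].append(message)

    def recent(self, conversation_id: str) -> list:
        """What the user sees (with a bit of lag) on first opening history."""
        return list(self._by_conversation[conversation_id])

def drain(consumer_records, store: HistoryStore) -> None:
    """Process one batch of records pulled by the persistence consumers."""
    for record in consumer_records:
        store.persist(record["conversation_id"], record)
```

Because this path is decoupled from live delivery, its consumers can batch writes and lag behind without affecting real-time chat.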

This gives you a smaller number of partitions and avoids the issue of
having to keep too much history on the Kafka brokers. There are
obviously a significant number of complexities to deal with. For
example, if you are using default consumer code that commits offsets
into ZooKeeper, that may be inadvisable at very large scales, though you
probably don't need to worry about reaching them. And remember, I did
this only as a thought experiment, not a proper technical evaluation. I
expect Kafka, used correctly, can make aspects of building such a chat
system much, much easier (you can avoid writing your own message
replication system), but it is definitely not plug-and-play using topics
for users.
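
On the offset-commit point: the usual mitigation is to commit in batches on an interval rather than per message. A minimal sketch of that pattern, with the commit backend (ZooKeeper, Kafka, or your own store) abstracted as a callable and all names/intervals assumed:

```python
# Hypothetical sketch of batched offset tracking, to avoid a commit to the
# offset store on every message. Offsets are coalesced per (topic, partition)
# and flushed once per interval.
import time

class BatchedOffsetCommitter:
    def __init__(self, commit_fn, interval_s: float = 5.0):
        self._commit_fn = commit_fn           # e.g. writes to ZK or a db
        self._interval_s = interval_s
        self._pending = {}                    # (topic, partition) -> offset
        self._last_flush = time.monotonic()

    def record(self, topic: str, partition: int, offset: int) -> None:
        """Remember the latest processed offset; flush if the interval passed."""
        self._pending[(topic, partition)] = offset
        if time.monotonic() - self._last_flush >= self._interval_s:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._commit_fn(dict(self._pending))
            self._pending.clear()
        self._last_flush = time.monotonic()
```

Coalescing means the commit rate scales with partitions and the flush interval, not with message volume.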

Christian


On 09/05/2014 09:46 AM, Jonathan Weeks wrote:
> +1
> 
> Topic deletion with 0.8.1.1 is extremely problematic. Coupled with the 
> fact that rebalance/broker membership changes pay a cost per partition today, 
> whereby excessive partitions extend downtime in the case of a failure, this 
> means fewer topics (e.g. hundreds or thousands) is a best practice in the 
> published version of Kafka. 
> 
> There are also secondary impacts on topic count — e.g. useful operational 
> tools such as: http://quantifind.com/KafkaOffsetMonitor/ start to become 
> problematic in terms of UX with a massive number of topics.
> 
> Once topic deletion is a supported feature, the use-case outlined might be 
> more tenable.
> 
> Best Regards,
> 
> -Jonathan
> 
> On Sep 5, 2014, at 4:20 AM, Sharninder <sharnin...@gmail.com> wrote:
> 
>> I'm not really sure about your exact use-case but I don't think having a
>> topic per user is very efficient. Deleting topics in kafka, at the moment,
>> isn't really straightforward. You should rethink your data pipeline a bit.
>>
>> Also, just because kafka has the ability to store messages for a certain
>> time, don't think of it as a data store. Kafka is a streaming system, think
>> of it as a fast queue that gives you the ability to move your pointer back.
>>
>> --
>> Sharninder
>>
>>
>>
>> On Fri, Sep 5, 2014 at 4:27 PM, Aris Alexis <aris.alexis....@gmail.com>
>> wrote:
>>
>>> Thanks for the reply. If I use it only for activity streams like twitter:
>>>
>>> I would want a topic for each #tag, a topic for each user, and maybe one
>>> for each city. Would that be too many topics, or does it not matter since
>>> most of them will be deleted at a specified interval?
>>>
>>>
>>>
>>> Best Regards,
>>> Aris Giachnis
>>>
>>>
>>> On Fri, Sep 5, 2014 at 6:57 AM, Sharninder <sharnin...@gmail.com> wrote:
>>>
>>>> Since you want all chats and mail history persisted all the time, I
>>>> personally wouldn't recommend kafka for your requirement. Kafka is more
>>>> suitable as a streaming system where events expire after a certain time.
>>>> Look at something more general purpose like hbase for persisting data
>>>> indefinitely.
>>>>
>>>> So, for example all activity streams can go into kafka from where
>>> consumers
>>>> will pick up messages to parse and put them to hbase or other clients.
>>>>
>>>> --
>>>> Sharninder
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Sep 5, 2014 at 12:05 AM, Aris Alexis <snowboard...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am building a big web application that I want to be massively
>>> scalable
>>>> (I
>>>>> am using cassandra and titan as a general db).
>>>>>
>>>>> I want to implement the following:
>>>>>
>>>>> real time web chat that is persisted so that user a in the future can
>>>>> recall his chat with user b,c,d much like facebook.
>>>>> mail like messages in the web application (not sure about this as it is
>>>>> somewhat covered by the first one)
>>>>> user activity streams
>>>>> users subscribing to topics for example florida/musicevents
>>>>>
>>>>> Could i use kafka for this? can you recommend another technology maybe?
>>>>>
>>>>
>>>
> 
> 

