On Mon, Jan 9, 2012 at 10:29 PM, Chuck Remes <[email protected]> wrote:
> > * Hash maps are less gluttonous for memory than tries, and should be as
> > fast or faster.
>
> Interesting assertion on performance. I googled around for this and it
> looked like the answer was "it depends." So, tries are better sometimes
> than hashes.
>
> For memory footprint, your statement is probably correct.

Sorry, it wasn't meant as an assertion, as the "should" implies, but merely
an educated guess. I'd also guess that it depends on the number of
subscriptions; a hash map should result in fewer cache misses for big
subscription sets. Which is faster would also depend on other things, like
the ratio between matching and non-matching incoming messages.

> > The cost of mixed prefix/exact matching is two lookups instead of one
> > (one for exact matching, one for prefix matching), but you can have fast
> > paths for when only one kind of matching is used, making the added cost
> > negligible unless you use a mix of prefix and exact matching.
>
> I don't understand why this would be true. All topic byte comparisons
> should be "fail fast" so as soon as a byte mismatches the code can move on
> to the next filter. For an *exact* match, if the comparison hasn't failed
> and there are no more topic bytes to examine, the match is exact. If a
> prefix match gets to the same point, it can skip the test to see if there
> are more topic bytes to compare.
>
> So to me it looks like the algo for topic matching is nearly identical,
> except the exact-match algo has a flag that says "if we get to the end
> without a comparison failure, make one final check that there are no more
> topic bytes; if none, exact match, else fail." The regular check wouldn't
> have that flag.

Depends on how you implement it. If implemented using a trie, for example,
then what you say is true, if I understand correctly. If you use a hash map
for implementing exact matching, then you'll have to do two lookups, since
prefix subscriptions can't live in the same hash table (again, you could
have fast paths to mitigate this if only prefix or only exact matching is
used).
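To make the "two lookups, with fast paths" idea concrete, here is a rough sketch (my own, not anything from the libzmq code): exact subscriptions go in a hash set, prefix subscriptions in a byte-wise trie. A topic that has to be checked against both kinds costs one hash lookup plus one trie walk, but if only one kind of subscription exists, the other structure is skipped entirely.

```python
class Matcher:
    """Toy mixed exact/prefix subscription matcher (illustrative only)."""

    def __init__(self):
        self.exact = set()    # hash-based exact subscriptions
        self.trie = {}        # byte-wise trie of prefix subscriptions
        self.prefix_count = 0

    def subscribe_exact(self, topic):
        self.exact.add(topic)

    def subscribe_prefix(self, prefix):
        node = self.trie
        for ch in prefix:
            node = node.setdefault(ch, {})
        node[None] = True     # marks the end of a subscribed prefix
        self.prefix_count += 1

    def match(self, topic):
        # Fast path: no prefix subscriptions, a single hash lookup suffices.
        if self.prefix_count == 0:
            return topic in self.exact
        # Lookup 1: exact match via the hash set (skipped if empty).
        if self.exact and topic in self.exact:
            return True
        # Lookup 2: walk the trie; succeed as soon as a subscribed
        # prefix is exhausted, fail as soon as a byte mismatches.
        node = self.trie
        for ch in topic:
            if None in node:
                return True
            node = node.get(ch)
            if node is None:
                return False
        return None in node
```

Usage: with `subscribe_exact("foo")` and `subscribe_prefix("bar")`, the topic "foobar" is rejected (the exact filter doesn't prefix-match) while "barbaz" is accepted.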
> I don't understand this last statement. Why would all N machines have to
> process all messages? Wouldn't they be processing 1/N messages? And, they
> would also only have 1/N of the subscription filters taking up memory.
>
> And with 3.1's improved publisher-side filtering, I don't see why any
> machine would ever get all of the messages.

Assume there are two intermediary pubsub machines A and B, a publisher C
and a subscriber D. D subscribes to "foo", which is the responsibility of
machine A, and thus sends the subscription only to A. C then wants to
publish a message with topic "foobar", but it can't know which machine to
send the message to (correct answer: A), since "foo" and "foobar" don't
necessarily hash to the same machine. Or at least I'm not aware of any hash
function with that property :) I hope that made sense.

/S
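A tiny illustration of the routing problem above (hypothetical; the hash function is just a stand-in for any key-based partitioning scheme). The subscription "foo" and the topic "foobar" are different keys, so nothing guarantees they land on the same machine, and publisher C cannot derive A's identity from "foobar" alone:

```python
def owner(key, n_machines=2):
    # Stand-in for any hash-based partitioning of keys across machines.
    return sum(key.encode()) % n_machines

sub_owner = owner("foo")      # D's subscription lives here (machine A)
pub_owner = owner("foobar")   # C would route the message here

print(sub_owner == pub_owner)
```

With this particular hash the two owners differ, so the message would go to the wrong machine; the general point is that ordinary hash functions don't map a key and all of its prefixes to the same partition.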
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
