Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/2991#issuecomment-62347765
Hi @tdas, thanks a lot for your comments. I've addressed all of the comments
you made earlier. Would you mind taking a look at the updated version?
Thanks a lot.
Besides that, there's one concern I'd like to raise: the overhead of
committing offsets to ZooKeeper. Since we now update the offsets in ZK after
pushing data into the WAL and BM (BlockManager), a commit request is issued
every 200 ms under normal conditions. I'm not sure whether this frequency will
put noticeable load on ZK, but compared to Kafka's default commit interval
(1 minute), it is far more frequent. In my local tests it works fine, but my
cluster is small; I'm not sure how it will behave once the cluster grows to
hundreds of nodes.
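A rough back-of-the-envelope estimate of the difference. This sketch assumes one commit per 200 ms block and one ZK write per partition per commit; the receiver and partition counts in the usage lines are hypothetical, not measured.

```python
BLOCK_INTERVAL_MS = 200        # commit period described above (one commit per stored block)
KAFKA_AUTO_COMMIT_MS = 60_000  # Kafka's default commit interval (1 minute)

def zk_writes_per_minute(commit_interval_ms, receivers, partitions_per_receiver):
    """Estimate ZK writes per minute, ASSUMING one write per partition per commit."""
    commits_per_minute = 60_000 / commit_interval_ms
    return commits_per_minute * receivers * partitions_per_receiver

def overhead_ratio():
    """How many times more often we commit than Kafka's default."""
    return KAFKA_AUTO_COMMIT_MS / BLOCK_INTERVAL_MS

# Hypothetical deployment: 100 receivers, 4 partitions each.
print(overhead_ratio())                           # 300.0
print(zk_writes_per_minute(200, 100, 4))          # 120000.0
print(zk_writes_per_minute(60_000, 100, 4))       # 400.0
```

Under these assumptions the write rate grows linearly with the number of receivers and partitions, which is why a small test cluster may not reveal the problem.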
If we need this synchronous offset-commit mechanism, this problem cannot be
easily solved, even with the low-level API. I think it may be addressed by
Kafka 0.9, which will manage offsets itself instead of relying on ZK, so the
ZK overhead would be alleviated.
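To illustrate why the commit rate is tied to the block rate in a synchronous scheme, here is a minimal sketch of the store-then-commit ordering. `Block`, `ReliableStore`, `OffsetStore`, `on_block_ready` and the topic name `events` are hypothetical stand-ins for the WAL/BM write path and ZK, not Spark or Kafka APIs.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    records: list
    offsets: dict  # (topic, partition) -> next offset to read

@dataclass
class ReliableStore:
    """Hypothetical stand-in for the WAL + BlockManager write path."""
    blocks: list = field(default_factory=list)

    def put(self, block):
        self.blocks.append(block)

@dataclass
class OffsetStore:
    """Hypothetical stand-in for ZooKeeper; counts every offset write."""
    offsets: dict = field(default_factory=dict)
    writes: int = 0

    def commit(self, offsets):
        for tp, off in offsets.items():
            self.offsets[tp] = off
            self.writes += 1  # one write per partition per commit (assumption)

def on_block_ready(block, store, zk):
    store.put(block)          # 1. persist the data first
    zk.commit(block.offsets)  # 2. only then advance the offsets

store, zk = ReliableStore(), OffsetStore()
for i in range(5):  # five blocks, roughly one second at a 200 ms block interval
    offs = {("events", 0): (i + 1) * 10, ("events", 1): (i + 1) * 10}
    on_block_ready(Block(records=[f"r{i}"], offsets=offs), store, zk)
print(zk.writes)  # 10
```

Committing only after the block is persisted means a crash between the two steps replays data rather than losing it, but it also means every stored block costs one offset write per partition, so commits cannot simply be deferred to a 1-minute interval without weakening that guarantee.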
So what is your opinion?