Strong +1 to having an initial benchmark as a baseline before any optimizations are implemented.
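
As a possible starting point for such a baseline, here is a minimal sketch of what a copy-versus-move micro-benchmark could look like. It is not libprocess code: `Message`, `encodeCopy`, `encodeMove` and `timeNs` are hypothetical names used only for illustration, and the real send path has more stages than this. It only shows the difference between copying a serialized payload and handing its buffer over with a move.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for an outgoing message; not a libprocess type.
struct Message {
  std::string name;
  std::string body;
};

// Copy path: the payload is taken by const reference, so building the
// Message duplicates the whole buffer.
Message encodeCopy(const std::string& name, const std::string& body) {
  return Message{name, body};
}

// Move path: the caller gives up the payload, so its heap buffer can be
// handed over to the Message instead of being duplicated.
Message encodeMove(const std::string& name, std::string&& body) {
  return Message{name, std::move(body)};
}

// Runs f() `iterations` times and returns the elapsed time in nanoseconds.
template <typename F>
long long timeNs(std::size_t iterations, F f) {
  assert(iterations > 0);
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < iterations; ++i) {
    f();
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
      .count();
}
```

A benchmark could wrap each variant in `timeNs` with payload sizes that match the real workload. I have left out numbers on purpose, since they depend on the machine and the standard library. Note that a moved `std::string` reusing the source buffer is what common implementations do for large strings, not something the standard spells out.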

On Wed, Jan 4, 2017 at 5:26 PM, Benjamin Mahler <bmah...@apache.org> wrote:

> Which areas does the performance not meet your needs? There are a lot of
> aspects to libprocess that can be optimized, so it would be good to focus
> on each of your particular use cases via benchmarks; this gives us a
> shared way to profile and measure improvements.
>
> Copy elimination is one area where a lot of improvement can be made across
> libprocess; note that libprocess was implemented before we had C++11 move
> support available. We've recently made some improvements to update the HTTP
> serving path towards zero-copies but it's not completely done. Can you
> submit patches for the ProcessBase::send() path copy elimination? We can
> have a move overload for ProcessBase::send and have ProtobufProcess::send()
> and encode() perform moves instead of a copy.
>
> With respect to the MessageEncoder, since it's less trivial, you can submit
> a benchmark that captures the use case you care about and we can drive
> improvements using it. I have some suggestions here as well but we can
> discuss once we have the benchmarks committed.
>
> How does that sound to start?
>
> On Tue, Jan 3, 2017 at 7:31 PM, pangbingqiang <pangbingqi...@huawei.com>
> wrote:
>
> > Hi All:
> >
> > We use libprocess as our underlying communication library, but we find
> > that its performance doesn't meet our needs, and we want to optimize it.
> > For example:
> >
> > In the 'send' implementation, a single message is copied in memory
> > four times:
> >
> > 1. The protobuf message is serialized with SerializeToString, then
> >    ProcessBase 'encode' constructs a string once;
> > 2. In the 'encode' function, the message body is copied again;
> > 3. In MessageEncoder, it is copied again to construct the HTTP request;
> > 4. The MessageEncoder return copies it again.
> >
> > Suggestions on how to optimize this path would be useful.
> >
> > Also, libprocess has many locks:
> >
> > 1. SocketManager: std::recursive_mutex mutex;
> > 2. ProcessManager: std::recursive_mutex processes_mutex;
> >    std::recursive_mutex runq_mutex; std::recursive_mutex firewall_mutex;
> >
> > In particular, every event enqueue and dequeue needs to acquire a lock;
> > a lock-free structure might be better.
> >
> > If you have any optimization suggestions or want to discuss, please let
> > me know. Thanks.
> >
> > Bingqiang Pang(庞兵强)
> >
> > Distributed and Parallel Software Lab
> > Huawei Technologies Co., Ltd.
> > Email:pangbingqi...@huawei.com <sut...@huawei.com>

--
Cheers,

Zhitao Li