Re: Flight / gRPC scalability issue

2019-02-27 Thread Wes McKinney
It seems like this discussion would be relevant to the gRPC community. There are probably other issues at play, like ensuring that multiple streams through the same port do not block each other too much. If one stream has messages of smaller size and another larger size, then the byte slices sent ar

Re: Flight / gRPC scalability issue

2019-02-24 Thread Antoine Pitrou
On 24/02/2019 at 19:46, Wes McKinney wrote: > OK, I don't know enough about sockets or networking to know what > hypothetical performance is possible with 16 concurrent packet streams > going through a single port (was the 5GB/s based on a single-threaded > or multithreaded benchmark? i.e. did

Re: Flight / gRPC scalability issue

2019-02-24 Thread Wes McKinney
OK, I don't know enough about sockets or networking to know what hypothetical performance is possible with 16 concurrent packet streams going through a single port (was the 5GB/s based on a single-threaded or multithreaded benchmark? i.e. did it simulate the equivalent number / size / concurren

Re: Flight / gRPC scalability issue

2019-02-24 Thread Antoine Pitrou
On 24/02/2019 at 18:35, Wes McKinney wrote: > hi Antoine, > > All of the Flight traffic is going through a hard-coded single port > > https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/flight-benchmark.cc#L185 > > What happens if you spin up a different server (and use a differ

Re: Flight / gRPC scalability issue

2019-02-24 Thread Wes McKinney
hi Antoine, All of the Flight traffic is going through a hard-coded single port https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/flight-benchmark.cc#L185 What happens if you spin up a different server (and use a different port) for each thread? I'm surprised no one else has menti

Re: Flight / gRPC scalability issue

2019-02-24 Thread Antoine Pitrou
If that were the case, then we would see 100% CPU usage on all CPU cores, right? Here my question is why only 2.5 cores are saturated while I'm pinning the benchmark to 4 physical cores. Regards Antoine. On 24/02/2019 at 14:29, Francois Saint-Jacques wrote: > A quick glance suggests you're
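A figure such as "2.5 cores saturated" can be derived from the user, sys and real times that `time` prints. The following is a minimal sketch of that arithmetic, not the method used in the thread; the function names and the timings in the example are hypothetical and chosen only for illustration.

```python
import os


def cores_busy(user_s, sys_s, real_s):
    """Average number of cores kept busy: CPU seconds per wall-clock second."""
    if real_s <= 0:
        raise ValueError("real_s must be positive")
    return (user_s + sys_s) / real_s


def allowed_cores():
    """Number of cores this process may run on (reflects taskset on Linux)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback for platforms without CPU affinity support.
    return os.cpu_count() or 1


if __name__ == "__main__":
    # Hypothetical timings, not measurements from the Flight benchmark.
    busy = cores_busy(user_s=6.0, sys_s=4.0, real_s=4.0)
    print(f"{busy:.1f} of {allowed_cores()} allowed cores busy on average")
```

If the result stays well below the number of allowed cores, the remaining time is spent off-CPU (blocked or idle), which is the gap the question above is about.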

Re: Flight / gRPC scalability issue

2019-02-24 Thread Francois Saint-Jacques
A quick glance suggests you're limited by the kernel copying memory around (https://gist.github.com/fsaintjacques/1fa00c8e50a09325960d8dc7463c497e). I think the next step is to use different physical hosts for client and server. This way you'll free resources for the server. François On Thu, Feb

Re: Flight / gRPC scalability issue

2019-02-21 Thread Antoine Pitrou
We're talking about the BCC tools, which are not based on perf: https://github.com/iovisor/bcc/ Apparently, using Linux perf for the same purpose is some kind of hassle (you need to write perf scripts?). Regards Antoine. On 21/02/2019 at 18:40, Francois Saint-Jacques wrote: > You can compi

Re: Flight / gRPC scalability issue

2019-02-21 Thread Francois Saint-Jacques
You can compile with DWARF debug info (-g/-ggdb) and pass `--call-graph=dwarf` to perf; it'll help the unwinding. Sometimes it's better than the frame pointer method since it keeps track of inlined functions. On Thu, Feb 21, 2019 at 12:39 PM Antoine Pitrou wrote: > > Ah, thanks. I'm trying it now. The prob

Re: Flight / gRPC scalability issue

2019-02-21 Thread Antoine Pitrou
Ah, thanks. I'm trying it now. The problem is that it doesn't record userspace stack traces properly (it probably needs all dependencies to be recompiled with -fno-omit-frame-pointer :-/). So while I know that a lot of time is spent waiting for futexes, I don't know if that is for a legitimat

Re: Flight / gRPC scalability issue

2019-02-21 Thread Hatem Helal
I was thinking of this variant: http://www.brendangregg.com/FlameGraphs/offcpuflamegraphs.html but I must admit that I haven't tried that technique myself. On 2/21/19, 4:41 PM, "Antoine Pitrou" wrote: I don't think that's the answer here. The question is not how to /visualize/

Re: Flight / gRPC scalability issue

2019-02-21 Thread Antoine Pitrou
I don't think that's the answer here. The question is not how to /visualize/ where time is spent waiting, but how to /measure/ it. Normal profiling only tells you where CPU time is spent, not what the process is idly waiting for. Regards Antoine. On Thu, 21 Feb 2019 16:29:15 + Hatem Hela
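The distinction between CPU time and waiting time can be illustrated by comparing wall-clock time with CPU time for the same piece of work. This is only a rough sketch in Python, not the BCC-based approach discussed elsewhere in the thread: `measure_wait` and the two workloads are hypothetical names, and the subtraction is meaningful only for single-threaded code because `time.process_time()` sums CPU time over all threads.

```python
import time


def measure_wait(fn, *args, **kwargs):
    """Run fn once and return (wall_seconds, cpu_seconds, wait_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    # Time not accounted for by CPU usage was spent blocked or descheduled.
    return wall, cpu, max(wall - cpu, 0.0)


def mostly_waiting():
    time.sleep(0.2)  # stands in for a blocked futex wait or socket read


def mostly_computing():
    sum(i * i for i in range(200_000))


if __name__ == "__main__":
    for fn in (mostly_waiting, mostly_computing):
        wall, cpu, wait = measure_wait(fn)
        print(f"{fn.__name__}: wall={wall:.3f}s cpu={cpu:.3f}s wait~{wait:.3f}s")
```

A CPU profiler would attribute almost nothing to `mostly_waiting`, even though it dominates the wall-clock time; that gap is what off-CPU analysis tries to explain.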

Re: Flight / gRPC scalability issue

2019-02-21 Thread Hatem Helal
I like flamegraphs for investigating this sort of problem: https://github.com/brendangregg/FlameGraph There are likely many other techniques for inspecting where time is being spent but that can at least help narrow down the search space. On 2/21/19, 4:03 PM, "Francois Saint-Jacques" wrote:

Re: Flight / gRPC scalability issue

2019-02-21 Thread Hatem Helal
I like flamegraphs for investigating this sort of problem: https://github.com/brendangregg/FlameGraph There are likely many other techniques for inspecting where time is being spent but that can at least help narrow down the search space. On 2/21/19, 4:29 PM, "Wes McKinney" wrote: Hi Fr

Re: Flight / gRPC scalability issue

2019-02-21 Thread Wes McKinney
Hi Francois, It *should* work out of the box. I spent some time to make sure it does. Can you open a JIRA? I recommend using the grpc-cpp conda-forge package. Wes On Thu, Feb 21, 2019, 11:03 AM Francois Saint-Jacques < fsaintjacq...@gmail.com> wrote: > Can you remind us what's the easiest way

Re: Flight / gRPC scalability issue

2019-02-21 Thread Antoine Pitrou
On Thu, 21 Feb 2019 11:02:58 -0500 Francois Saint-Jacques wrote: > Can you remind us what's the easiest way to get flight working with grpc? > clone + make install doesn't really work out of the box. You can install the "grpc-cpp" package from conda-forge. Our CMake configuration should pick it

Re: Flight / gRPC scalability issue

2019-02-21 Thread Francois Saint-Jacques
Can you remind us what's the easiest way to get flight working with grpc? clone + make install doesn't really work out of the box. François On Thu, Feb 21, 2019 at 10:41 AM Antoine Pitrou wrote: > > Hello, > > I've been trying to saturate several CPU cores using our Flight > benchmark (which sp

Flight / gRPC scalability issue

2019-02-21 Thread Antoine Pitrou
Hello, I've been trying to saturate several CPU cores using our Flight benchmark (which spawns a server process and attempts to communicate with it using multiple clients), but haven't managed to. The typical command-line I'm executing is the following: $ time taskset -c 1,3,5,7 ./build/relea