Re: Performance of BeamFnData between Python and Java

2018-11-09 Thread Xinyu Liu

Really appreciate the pointers! Looks like the next step is to try increasing our bundle size. We will do some experiments on our side and report back later. @Robert: thanks a lot for the details on protobuf. It was pretty surprising to us that decoding protobuf messages slows down the performance

Re: Performance of BeamFnData between Python and Java

2018-11-08 Thread Robert Bradshaw

I'd assume you're compiling the code with Cython as well? (If you're using the default containers, that should be fine.)
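One rough way to check whether a module is running as compiled (for example Cython-generated) code rather than pure Python is to see whether it was loaded from an extension-module file. The sketch below assumes that a Cythonized module is loaded with one of the platform's extension suffixes. Choosing `apache_beam.coders.coder_impl` as the module to check is an assumption for illustration, not something confirmed in this thread.

```python
import importlib
import importlib.machinery


def is_extension_module(module_name: str) -> bool:
    """Return True if `module_name` was loaded from a compiled extension file.

    Cython-compiled modules are loaded as C extensions, so their __file__
    ends with one of the platform's extension suffixes (e.g. '.so' or '.pyd').
    Built-in modules have no __file__ and are reported as False here.
    """
    module = importlib.import_module(module_name)
    path = getattr(module, "__file__", None) or ""
    return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


if __name__ == "__main__":
    # Hypothetical choice of modules: 'json' is pure Python in CPython;
    # the Beam module name is an assumption and may not be installed.
    for name in ("json", "apache_beam.coders.coder_impl"):
        try:
            print(name, is_extension_module(name))
        except ImportError:
            print(name, "not installed")
```

A pure-Python fallback would report False. For protobuf itself, the internal helper `google.protobuf.internal.api_implementation.Type()` reports which protobuf backend is active.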

Re: Performance of BeamFnData between Python and Java

2018-11-08 Thread Robert Bradshaw

Very cool to hear of this progress on Samza! Python protocol buffers are extraordinarily slow (lots of reflection, attribute lookups, and bit fiddling for serialization/deserialization that is certainly not Python's strong point). Each bundle processed involves multiple protos being constructed a
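To see why larger bundles help when each bundle pays for constructing and decoding several protos, here is a back-of-the-envelope model, assuming a fixed overhead per bundle plus a constant cost per element. Both cost values are hypothetical placeholders, not measurements from this thread.

```python
def elements_per_second(bundle_size: int,
                        fixed_cost_per_bundle_s: float,
                        cost_per_element_s: float) -> float:
    """Throughput of a single worker under a simple linear cost model.

    Processing one bundle takes fixed_cost_per_bundle_s (per-bundle proto
    construction and decoding) plus cost_per_element_s for every element.
    """
    if bundle_size <= 0:
        raise ValueError("bundle_size must be positive")
    seconds_per_bundle = fixed_cost_per_bundle_s + bundle_size * cost_per_element_s
    return bundle_size / seconds_per_bundle


if __name__ == "__main__":
    # Hypothetical costs: 5 ms fixed overhead per bundle, 50 us per element.
    for size in (1, 10, 100, 1000):
        rate = elements_per_second(size, 0.005, 0.00005)
        print(f"bundle_size={size:>5}: {rate:,.0f} elements/s")
```

Under this model, throughput approaches 1 / cost_per_element_s (20,000 elements/s with these placeholder numbers) as bundles grow, so increasing the bundle size pays off only until the per-element cost dominates.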

Re: Performance of BeamFnData between Python and Java

2018-11-08 Thread Thomas Weise

We have been doing some end to end testing with Python and Flink (streaming). You could take a look at the following and possibly replicate it for your work: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/flink/flink_streaming_impulse.py We found that in order to get

Re: Performance of BeamFnData between Python and Java

2018-11-08 Thread Lukasz Cwik

This benchmark[1] shows that Python is getting about 19 MB/s. Yes, running more Python sdk_worker processes will improve performance since Python is limited to a single CPU core. [1] https://performance-dot-grpc-testing.appspot.com/explore?dashboard=5652536396611584&widget=490377658&container=1286
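Because each Python process is effectively limited to one core, a rough estimate of how many sdk_worker processes a target rate requires is a ceiling division. This sketch assumes perfectly linear scaling and uses a hypothetical 50 MB/s target; real scaling will be lower once the runner side or the network becomes the bottleneck.

```python
import math


def workers_needed(target_mb_per_s: float,
                   per_worker_mb_per_s: float,
                   available_cores: int) -> int:
    """Number of single-core SDK worker processes needed for a target rate.

    Assumes ideal linear scaling: each process is limited to one core by the
    GIL, and processes do not contend for anything else.
    """
    needed = math.ceil(target_mb_per_s / per_worker_mb_per_s)
    if needed > available_cores:
        raise ValueError(
            f"need {needed} workers but only {available_cores} cores available")
    return needed


if __name__ == "__main__":
    # Hypothetical: 50 MB/s target, ~19 MB/s per process, 8 cores.
    print(workers_needed(50, 19, 8))
```

With these inputs the estimate is 3 processes (50 / 19 is about 2.6, rounded up).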

Re: Performance of BeamFnData between Python and Java

2018-11-08 Thread Xinyu Liu

A throughput of 19 MB/s is enough for us. It seems the bottleneck is the rate of RPC calls. Our messages are usually 1 KB to 10 KB, so if we can reach 19 MB/s, we will be able to process ~4k qps, which meets our requirements. I guess increasing the size of the bundles will help. Do you guys have any results fro
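The ~4k qps figure follows from dividing throughput by message size. Assuming "19mb/s" means 19 megabytes per second in decimal units (an assumption), 1 KB messages give about 19,000 msg/s, 10 KB messages about 1,900 msg/s, and ~4k qps corresponds to an average message of roughly 5 KB:

```python
def messages_per_second(throughput_bytes_per_s: float, message_bytes: float) -> float:
    """Messages per second sustainable at a given byte throughput."""
    return throughput_bytes_per_s / message_bytes


if __name__ == "__main__":
    # Assumption: 19 MB/s with 1 MB = 10**6 bytes and 1 KB = 1000 bytes.
    throughput = 19 * 1000 * 1000
    for size_kb in (1, 5, 10):
        qps = messages_per_second(throughput, size_kb * 1000)
        print(f"{size_kb:>2} KB messages: {qps:,.0f} msg/s")
```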

Re: Performance of BeamFnData between Python and Java

2018-11-08 Thread Xinyu Liu

Looking at the gRPC dashboard published by the benchmark[1], it seems the streaming ping-pong rate for gRPC in Python is around 2k to 3k qps. This seems quite low compared to gRPC performance in other languages, e.g. 600k qps for Java and Go. Is it expected to run multiple sdk_wo
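If every element required its own ping-pong round trip, 2k to 3k round trips per second would fall short of a ~4k qps requirement. A hypothetical sketch of how many elements each streamed message would need to carry, assuming the round-trip rate stays fixed regardless of message size (which stops holding for large payloads):

```python
import math


def required_elements_per_rpc(target_elements_per_s: float,
                              rpc_round_trips_per_s: float) -> int:
    """Smallest whole number of elements each RPC must carry to hit a target.

    Assumes the RPC round-trip rate is the bottleneck and stays fixed
    regardless of how many elements are packed into each message.
    """
    return math.ceil(target_elements_per_s / rpc_round_trips_per_s)


if __name__ == "__main__":
    # Hypothetical target of 4,000 elements/s against the 2k-3k qps range.
    for rate in (2000, 2500, 3000):
        print(rate, required_elements_per_rpc(4000, rate))
```

Under this assumption, packing two elements into each message would be enough anywhere in the 2k to 3k range.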

Re: Performance of BeamFnData between Python and Java

2018-11-07 Thread Lukasz Cwik

gRPC folks provide a bunch of benchmarks for different scenarios: https://grpc.io/docs/guides/benchmarking.html You would be most interested in the streaming throughput benchmarks since the Data API is written on top of the gRPC streaming APIs. 200KB/s does seem pretty small. Have you captured any

Performance of BeamFnData between Python and Java

2018-11-07 Thread Hai Lu

Hi, This is Hai from LinkedIn. I'm currently working on Portable API for Samza Runner. I was able to make Python work with Samza container reading from Kafka. However, I'm seeing a severe performance issue with my setup, achieving only ~200KB/s throughput between the Samza runner in the Java side and
