Really appreciate the pointers! Looks like the next step is to try
increasing our bundle size. We will do some experiments on our side and
report back later.
@Robert: thanks a lot for the details on protobuf. It was pretty surprising
to us that decoding protobuf messages slows down the pipeline this much.
I'd assume you're compiling the code with Cython as well? (If you're
using the default containers, that should be fine.)
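In case it helps, here is a quick way to check which protobuf backend the Python process is actually using (the pure-Python backend is dramatically slower than the C++/upb one); it prints "unavailable" if protobuf isn't installed:

```python
# Check which protobuf backend is active. A pure-Python backend
# ("python") explains much of the decode overhead; "cpp" or "upb"
# means the fast native implementation is in use.
try:
    from google.protobuf.internal import api_implementation
    backend = api_implementation.Type()  # e.g. "cpp", "upb", or "python"
except ImportError:
    backend = "unavailable"  # protobuf not installed in this environment

print(backend)
```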
On Fri, Nov 9, 2018 at 12:09 AM Robert Bradshaw wrote:
Very cool to hear of this progress on Samza!
Python protocol buffers are extraordinarily slow (lots of reflection,
attributes lookups, and bit fiddling for serialization/deserialization
that is certainly not Python's strong point). Each bundle processed
involves multiple protos being constructed and parsed, so that overhead
adds up quickly.
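To make the per-message cost concrete, here's a rough stdlib-only micro-benchmark (this is not protobuf itself, just a stand-in that mimics field-by-field serialization with attribute lookups and byte packing):

```python
import struct
import timeit

# Illustrative only: even a trivial "serialize one message" path in pure
# Python costs on the order of a microsecond per call, and a bundle
# touches many messages, so this adds up fast.

class Element:
    def __init__(self, key, value):
        self.key = key
        self.value = value

def encode(elem):
    # Mimics field-by-field serialization: attribute lookups plus packing.
    key_bytes = elem.key.encode("utf-8")
    return struct.pack(">I", len(key_bytes)) + key_bytes + struct.pack(">q", elem.value)

elem = Element("user-123", 42)
per_call_s = timeit.timeit(lambda: encode(elem), number=100_000) / 100_000
print(f"~{per_call_s * 1e6:.2f} us per encode")
```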
We have been doing some end to end testing with Python and Flink
(streaming). You could take a look at the following and possibly replicate
it for your work:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/flink/flink_streaming_impulse.py
This benchmark[1] shows that Python is getting about 19 MB/s.
Yes, running more Python sdk_worker processes will improve performance,
since each Python process is limited to a single CPU core.
[1]
https://performance-dot-grpc-testing.appspot.com/explore?dashboard=5652536396611584=490377658=1286539696
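The single-core limit is the GIL: CPU-bound work in extra threads within one Python process doesn't help, which is why the suggestion is more worker *processes*. A small stdlib demonstration:

```python
import threading
import time

# CPU-bound work in two threads takes roughly as long as running it
# serially, because the GIL serializes Python bytecode execution.
# (On experimental free-threaded builds this may differ.)

def burn(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 1_000_000

start = time.perf_counter()
burn(N); burn(N)
serial = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=burn, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"serial: {serial:.3f}s  two threads: {threaded:.3f}s")
```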
On
By looking at the gRPC dashboard published by the benchmark[1], it seems
the streaming ping-pong rate for gRPC in Python is around
2k ~ 3k qps. This seems quite low compared to gRPC performance in other
languages, e.g. 600k qps for Java and Go. Is it expected that we run multiple
sdk_worker processes to get around this limit?
The gRPC folks provide a bunch of benchmarks for different scenarios:
https://grpc.io/docs/guides/benchmarking.html
You would be most interested in the streaming throughput benchmarks since
the Data API is written on top of the gRPC streaming APIs.
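Note that if each streaming message carries a fixed per-call cost (the 2k ~ 3k qps above), throughput comes from packing many elements into each message rather than sending one element per call. A toy sketch of that batching idea (the names here are illustrative, not Beam's actual data-plane API):

```python
from typing import Iterable, Iterator, List

# Hypothetical batching helper: group encoded elements into chunks up to
# max_batch_bytes, so each streaming send amortizes its per-call overhead
# over many elements.

def batch(elements: Iterable[bytes], max_batch_bytes: int = 64 * 1024) -> Iterator[List[bytes]]:
    current: List[bytes] = []
    size = 0
    for elem in elements:
        if current and size + len(elem) > max_batch_bytes:
            yield current
            current, size = [], 0
        current.append(elem)
        size += len(elem)
    if current:
        yield current

# Five 30 KB elements under a 64 KB cap pack into batches of 2, 2, 1.
batches = list(batch([b"x" * 30_000] * 5, max_batch_bytes=64 * 1024))
print([len(b) for b in batches])  # → [2, 2, 1]
```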
200KB/s does seem pretty small. Have you captured a profile of the Python
worker to see where the time is going?
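One stdlib way to capture such a profile with cProfile; process_bundle here is just a hypothetical stand-in for whatever hot path you want to measure:

```python
import cProfile
import io
import pstats

# Stand-in for the code under test; replace with the real hot path.
def process_bundle():
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
process_bundle()
profiler.disable()

# Print the top 5 functions by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
summary = next(line for line in out.getvalue().splitlines() if "function calls" in line)
print(summary.strip())
```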
Hi,
This is Hai from LinkedIn. I'm currently working on Portable API for Samza
Runner. I was able to make Python work with a Samza container reading from
Kafka. However, I'm seeing a severe performance issue with my setup,
achieving only ~200KB/s of throughput between the Samza runner on the Java side