Does Apache Beam for Python support server-based shuffle with the Dataflow runner yet?

2018-01-17 Thread Nima Mousavi
Hi,

In June 2017, Google introduced server-based shuffle for Dataflow
pipelines, which can result in up to a 5x performance improvement. However,
at the time of the announcement this feature was only available for the
Cloud Dataflow SDK for Java version 1. What is the status for the Dataflow
SDK for Python? Is it supported already? Is there any plan to add it soon?


https://cloud.google.com/blog/big-data/2017/06/introducing-cloud-dataflow-shuffle-for-up-to-5x-performance-improvement-in-data-analytic-pipelines

Thanks!


Re: Proposal: build Python wheel distributions for Apache Beam releases

2018-02-13 Thread Nima Mousavi
Related question:

How can we tell whether the Docker image of our binary contains the
Cython-optimized Beam or the slower pure-Python codepath?
The image was built on Google Cloud (using *gcloud container builds submit*).
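
One possible way to check, sketched here under assumptions rather than as a confirmed method: a Cython-built module is loaded from a compiled file (".so" on Linux/macOS, ".pyd" on Windows), while the fallback is loaded from a ".py" file or is not importable at all. The helper names below are hypothetical, and the module name apache_beam.coders.stream is an assumption about which module is Cython-built; substitute whichever module applies to the installed version.

```python
import importlib
import os

# Suffixes of compiled extension modules (Linux/macOS ".so", Windows ".pyd").
_COMPILED_SUFFIXES = (".so", ".pyd")


def is_compiled_path(path):
    """Return True if a module file path points at a compiled extension."""
    return os.path.splitext(path)[1].lower() in _COMPILED_SUFFIXES


def is_compiled_module(module_name):
    """Return True if module_name imports and is backed by a compiled file."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Not importable at all, so the compiled variant is not present.
        return False
    path = getattr(module, "__file__", None)
    return bool(path) and is_compiled_path(path)


if __name__ == "__main__":
    # Assumption: this module is only present as a compiled extension
    # when Beam was installed with Cython available.
    name = "apache_beam.coders.stream"
    print("%s compiled: %s" % (name, is_compiled_module(name)))
```

Running this with the Python interpreter inside the image reports whether the named module was loaded from a compiled file.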



On Mon, Feb 12, 2018 at 9:32 PM, Ahmet Altay  wrote:

> +1 to wheels. The main effort for this would be updating the release
> guide, and adding support for other platforms in Jenkins for building and
> testing wheels.  In light of this, maybe we can prioritize having test
> infrastructure for other platforms.
>
> On Mon, Feb 12, 2018 at 1:47 PM, Ismaël Mejía  wrote:
>
>> +1 for wheels, they are the standard binary distribution format so it
>> makes sense. Also, wheels support packaging Python 2 and 3 in universal
>> packages, so they are future-proof.
>>
>> On Mon, Feb 12, 2018 at 10:26 PM, Robert Bradshaw 
>> wrote:
>> > +1, is it too late to try to release these as part of the 2.3 release
>> > (to get familiar with the process, no code changes should be needed)?
>>
>
> It would be nice to have this for the current release. How can we build
> and test these binaries? I think it will be prudent to wait until we have
> infrastructure.
>
>
>> >
>> > The wheels are advantageous when running locally (e.g. during testing
>> > and development) where requiring containers will probably be overkill.
>> > This will become especially relevant with the switch to use the
>> > FnApiRunner.
>> >
>> > On Mon, Feb 12, 2018 at 1:22 PM, Lukasz Cwik  wrote:
>> >> If we want all our code related to pipeline execution to be in a
>> >> container, what value does building wheel distributions provide?
>> >>
>> >>
>> >> On Mon, Feb 12, 2018 at 1:18 PM, Kenneth Knowles wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> On Mon, Feb 12, 2018 at 1:04 PM, Charles Chen  wrote:
>> >>>>
>> >>>> Currently, Apache Beam distributes Python packages through pip and
>> >>>> PyPI. On PyPI, developers can release source tarballs and/or
>> >>>> precompiled "wheel" distributions for each platform, which would be
>> >>>> used if available for a particular platform. Currently, we only
>> >>>> distribute the source tarballs, so any user who installs Beam using
>> >>>> "pip install apache_beam" has to have a compiler and toolchain
>> >>>> installed to take advantage of Cython optimizations in Beam (which
>> >>>> require compiled C code). If such a compiler is not available, Beam
>> >>>> is currently configured to install anyway, but will use slower
>> >>>> Python codepaths instead of the more optimized ones (for example,
>> >>>> for Coder encoding / decoding).
>> >>>>
>> >>>> I would like to propose that we start distributing binary wheel
>> >>>> distributions for our releases, for common platforms like Windows /
>> >>>> Mac / Linux. We could potentially use a method similar to this one
>> >>>> (https://github.com/MacPython/cython-wheels) for building these
>> >>>> wheel distributions. Thoughts?
>> >>>>
>> >>>> Best,
>> >>>> Charles
>> >>>
>> >>>
>> >>
>>
>
>