Re: Multiple architectures support on Beam (ARM)

2021-06-10 Thread Robert Bradshaw
On Thu, Jun 10, 2021 at 3:00 AM Ismaël Mejía  wrote:
>
> As a follow-up on this: with the merge of
> https://github.com/apache/beam/pull/14832, Beam will be producing Python
> wheels for AARCH64 starting with Beam 2.32.0!

Nice.

> Also, due to the recent version updates (grpc, protobuf and arrow), we should
> be pretty close to fully supporting it without extra compilation.
> It seems the only missing piece is Cython:
> https://github.com/cython/cython/issues/3892

Cython already supports ARM. This is just about providing pre-built
wheels for installing Cython (which aren't necessarily needed).

> Now the next important step would be to make the docker images multi-arch. 
> That would be a great contribution if someone is motivated.

Re: Multiple architectures support on Beam (ARM)

2021-06-10 Thread Ismaël Mejía
As a follow-up on this: with the merge of
https://github.com/apache/beam/pull/14832, Beam will be producing Python
wheels for AARCH64 starting with Beam 2.32.0!
Also, due to the recent version updates (grpc, protobuf and arrow), we should
be pretty close to fully supporting it without extra compilation.
It seems the only missing piece is Cython:
https://github.com/cython/cython/issues/3892
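
A rough way to check whether a given wheel targets AARCH64 is to look at its platform tag, which is the last dash-separated field of a wheel file name. The sketch below is only an illustration: the helper names and the example file name are hypothetical and not taken from an actual Beam release.

```python
def wheel_platform_tag(filename: str) -> str:
    """Return the platform tag, i.e. the last dash-separated field of a wheel name."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    return filename[: -len(".whl")].split("-")[-1]


def is_aarch64_wheel(filename: str) -> bool:
    """True if any platform tag in the (possibly dot-joined) tag set ends in _aarch64."""
    tags = wheel_platform_tag(filename).split(".")
    return any(tag.endswith("_aarch64") for tag in tags)


# Hypothetical file name, used only to exercise the helpers.
example = "apache_beam-2.32.0-cp38-cp38-manylinux2014_aarch64.whl"
print(wheel_platform_tag(example), is_aarch64_wheel(example))
```

If pip finds no wheel whose tags match the running platform, it falls back to the source distribution, which is where the extra compilation comes from.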

Now the next important step would be to make the docker images multi-arch.
That would be a great contribution if someone is motivated.



Re: Multiple architectures support on Beam (ARM)

2021-01-27 Thread Robert Bradshaw
Cython supports ARM64. The issue here is that we don't have a C++ compiler
(it's looking for 'cc') available in the container (and grpc, and possibly
others, don't have wheel files for this platform). I wonder if apt-get
install build-essential would be sufficient.
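
A quick way to confirm this diagnosis before installing anything is to check whether a compiler driver is on the PATH. This is a minimal sketch, not part of Beam or grpcio-tools; the function name and the candidate list are assumptions made for illustration.

```python
import shutil


def find_c_compiler(candidates=("cc", "gcc", "clang")):
    """Return the path of the first compiler driver found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


compiler = find_c_compiler()
if compiler is None:
    print("No C/C++ compiler driver found; source builds such as grpcio-tools will fail.")
else:
    print(f"Found compiler driver at {compiler}")
```

The grpcio-tools setup script in the traceback calls 'cc' directly, so a missing 'cc' produces exactly the FileNotFoundError shown there.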


Re: Multiple architectures support on Beam (ARM)

2021-01-27 Thread Ismaël Mejía
Nice to see the interest. I also suppose that devs on Apple MacBooks with the
new M1 processor will soon request this feature.

I ran some pipelines on ARM64 on classic runners today with relative ease,
which was expected. We will have issues, however, with the Java 8 SDK harness
because the parent image openjdk:8 is not yet supported on ARM64.

I tried to set up a Python dev environment and found the first issue. It looks
like gRPC does not support arm64 yet [1][2], or am I misreading it?

$ pip install -r build-requirements.txt

Collecting grpcio-tools==1.30.0
  Downloading grpcio-tools-1.30.0.tar.gz (2.1 MB)
 || 2.1 MB 21.7 MB/s
ERROR: Command errored out with exit status 1:
 command: /home/ubuntu/.virtualenvs/beam-dev/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/setup.py'"'"'; __file__='"'"'/tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-km8agjf4
 cwd: /tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/
Complete output (11 lines):
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/setup.py", line 112, in <module>
    if check_linker_need_libatomic():
  File "/tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/setup.py", line 73, in check_linker_need_libatomic
    cc_test = subprocess.Popen(['cc', '-x', 'c++', '-std=c++11', '-'],
  File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.8/subprocess.py", line 1702, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'cc'

WARNING: Discarding https://files.pythonhosted.org/packages/da/3c/bed275484f6cc262b5de6ceaae36798c60d7904cdd05dc79cc830b880687/grpcio-tools-1.30.0.tar.gz#sha256=7878adb93b0c1941eb2e0bed60719f38cda2ae5568bc0bcaa701f457e719a329 (from https://pypi.org/simple/grpcio-tools/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement grpcio-tools==1.30.0
ERROR: No matching distribution found for grpcio-tools==1.30.0

[1] https://pypi.org/project/grpcio-tools/#files
[2] https://github.com/grpc/grpc/issues/21283

I can also imagine that we will have some struggles with the Python harness
and all of its dependencies. Does Cython already support ARM64?

I went and filed some JIRAs to keep track of this:

BEAM-11703 Support apache-beam python install on ARM64
BEAM-11704 Support Beam docker images on ARM64



Re: Multiple architectures support on Beam (ARM)

2021-01-26 Thread Robert Burke
I believe so.

The Go SDK in most instances requires users to register their DoFns at
package init time, linked to the type's/function's fully qualified path as
determined by Go, which is consistent across architectures, at least with
the standard toolchain.

Those strings are used to look things up on distributed workers, regardless
of the architecture.
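
The idea of looking functions up by a stable string key rather than by an address in the binary can be sketched as follows. This is a Python analogy only, not the Go SDK's actual API; the registry, the decorator and the example function are all hypothetical names.

```python
# Maps a fully qualified name to the registered callable.
_REGISTRY = {}


def qualified_name(fn):
    """Build a stable key from the module and qualified function name."""
    return f"{fn.__module__}.{fn.__qualname__}"


def register(fn):
    """Record fn under its qualified name at import (init) time."""
    _REGISTRY[qualified_name(fn)] = fn
    return fn


def lookup(key):
    """Resolve a key received from the launcher back to a local callable."""
    return _REGISTRY[key]


@register
def add_one(x):
    return x + 1


# The launcher would ship only the string; the worker resolves it locally.
key = qualified_name(add_one)
print(key, lookup(key)(41))
```

Because the key depends only on names and not on where the function lives in memory, a launcher and a worker built for different architectures can agree on it.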



On Tue, Jan 26, 2021, 11:33 AM Robert Bradshaw  wrote:

> Cool. Are DoFn (et al) references compatible across cross-compiled
> binaries?


Re: Multiple architectures support on Beam (ARM)

2021-01-26 Thread Robert Bradshaw
Cool. Are DoFn (et al) references compatible across cross-compiled
binaries?



Re: Multiple architectures support on Beam (ARM)

2021-01-26 Thread Robert Burke
Go cross-compilation is as simple as setting the right environment variables
(GOOS and GOARCH) [1], but can be as complicated as requiring a cross-compiling
GCC to be installed if CGO [2] is necessary. I think we're probably clear on
just needing the variables, though, for the various Boot executables.
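
As a sketch of what "setting the right variables" could look like from a build script, the helper below assembles an environment for a Go cross-compile. The function name is hypothetical, and disabling CGO is an assumption that only holds if the binary really needs no C code.

```python
import os


def go_cross_compile_env(goos="linux", goarch="arm64", base_env=None):
    """Return a copy of the environment with Go's cross-compilation variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["GOOS"] = goos          # target operating system
    env["GOARCH"] = goarch      # target CPU architecture
    env["CGO_ENABLED"] = "0"    # assumption: pure-Go build, so no cross C compiler needed
    return env


env = go_cross_compile_env()
print(env["GOOS"], env["GOARCH"], env["CGO_ENABLED"])
```

The returned mapping could then be passed as the env argument of subprocess.run when invoking go build.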

For Go pipelines we'd need to update the shared runner code to support
selecting the cross-compiled worker binary environment. I believe it's
hard-coded to linux/amd64 at present, but that's a separate issue.

[1] https://golangcookbook.com/chapters/running/cross-compiling/
[2] https://golang.org/cmd/cgo/



Re: Multiple architectures support on Beam (ARM)

2021-01-26 Thread Robert Bradshaw
+1

I don't think it would be that hard to build and release ARM-based Docker
images. (Perhaps just a matter of changing the Dockerfile to depend on a
different base, and doing some cross-compiling. That would suss out whether
we're inadvertently taking on any incompatible dependencies.)

Theoretically, if one does that and manually specifies the container, it
could just work for Python (assuming no wheel files are specified as manual
dependencies). For Java, if one builds/deploys an uberjar (on a different
architecture), there may be issues in any transitive dependency that has
JNI code (us or users). I'd imagine this issue is common to and being
explored by many of the other Java big data systems in use; it'd be
interesting to know what solutions are out there.

For Go, the executable is uploaded directly into the container. We'd
probably have to do something fancier like cross-compiling the executable
(and making sure the UserFn references, which I think are just pointers
into the binary, still work if the launcher is one architecture and the
workers another).
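
Deciding whether the launcher and the workers differ in architecture could start with something like the sketch below. It is an illustration under assumptions: the alias table covers only the two architectures discussed here, and the function names are hypothetical.

```python
import platform

# Common spellings of the two architectures discussed in this thread.
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def normalize_arch(machine=None):
    """Map a machine string (default: this host's) to amd64/arm64 where known."""
    machine = (machine or platform.machine()).lower()
    return _ARCH_ALIASES.get(machine, machine)


def needs_cross_compile(worker_arch, launcher_machine=None):
    """True if the worker architecture differs from the launcher's."""
    return normalize_arch(launcher_machine) != normalize_arch(worker_arch)


print(normalize_arch(), needs_cross_compile("arm64"))
```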

Definitely worth exploring.






Multiple architectures support on Beam (ARM)

2021-01-26 Thread Ismaël Mejía
I stumbled today on this user request:
BEAM-10982 Wheel support for linux aarch64

It made me wonder whether, with the advent of ARM64 processors not only on
the client but also on the server side (Graviton and others), it is worth
starting to think about supporting this architecture in the
Python installers and in the Docker images. It seems that the
latter should not be that difficult, given that our parent images
are already multi-arch.

Are there any known issues or binary/platform-specific
dependencies that would prevent us from doing this?