Re: contributor permission for Beam Jira tickets

2021-06-10 Thread Ismaël Mejía
Hello Pascal,

I added you as a contributor, so you can now self-assign issues if you
want. I assigned BEAM-12471 to you since I saw you opened a PR to fix it.

Best,
Ismaël


On Wed, Jun 9, 2021 at 11:05 PM Pascal Gillet 
wrote:

> Hi,
>
> Hi, this is Pascal. I identified some small but nonetheless annoying bugs in
> Beam. Can someone add me as a contributor for Beam's Jira issue
> tracker? I would like to assign tickets to myself.
>
> My JIRA login: pgillet
>
>
> Thanks,
> Pascal
>


Re: Multiple architectures support on Beam (ARM)

2021-06-10 Thread Ismaël Mejía
As a follow-up on this: with the merge of
https://github.com/apache/beam/pull/14832, Beam will be producing Python
wheels for AARCH64 starting with Beam 2.32.0!
Also, due to the recent version updates (grpc, protobuf and arrow), we should
be pretty close to fully supporting it without extra compilation.
It seems the only missing piece is Cython:
https://github.com/cython/cython/issues/3892

Now the next important step would be to make the Docker images multi-arch.
That would be a great contribution if someone is motivated.
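
For whoever picks this up: a minimal sketch of one way to do it with Docker's
buildx plugin. The image tag below is hypothetical, and this is not settled
project tooling, just an illustration.

$ docker buildx create --use
$ docker buildx build \
    --platform linux/amd64,linux/arm64 \
    -t example.org/beam_python3.8_sdk:2.32.0 \
    --push .

buildx builds one image per listed platform and pushes them under a single
multi-arch manifest, so the same tag works on x86 and ARM hosts.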


On Thu, Jan 28, 2021 at 1:47 AM Robert Bradshaw  wrote:

> Cython supports ARM64. The issue here is that we don't have a C++ compiler
> (it's looking for 'cc') available in the container (and grpc, and possibly
> others, don't have wheel files for this platform). I wonder if apt-get
> install build-essential would be sufficient.
>
> On Wed, Jan 27, 2021 at 2:22 PM Ismaël Mejía  wrote:
>
>> Nice to see the interest. I also suppose that devs on Apple MacBooks with
>> the new M1 processor will soon request this feature.
>>
>> I ran some pipelines on ARM64 on classic runners today relatively easily,
>> which was expected. We will have issues, however, for the Java 8 SDK
>> harness because the parent image openjdk:8 is not yet supported for ARM64.
>>
>> I tried to set up a Python dev environment and found the first issue. It
>> looks like gRPC does not support arm64 yet [1][2], or am I misreading it?
>>
>> $ pip install -r build-requirements.txt
>>
>> Collecting grpcio-tools==1.30.0
>>   Downloading grpcio-tools-1.30.0.tar.gz (2.1 MB)
>>  || 2.1 MB 21.7 MB/s
>> ERROR: Command errored out with exit status 1:
>>  command: /home/ubuntu/.virtualenvs/beam-dev/bin/python3 -c
>> 'import sys, setuptools, tokenize; sys.argv[0] =
>>
>> '"'"'/tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/setup.py'"'"';
>>
>> __file__='"'"'/tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/setup.py'"'"';f=getattr(tokenize,
>> '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"',
>> '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))'
>> egg_info --egg-base /tmp/pip-pip-egg-info-km8agjf4
>>  cwd:
>> /tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/
>> Complete output (11 lines):
>> Traceback (most recent call last):
>>   File "<string>", line 1, in <module>
>>   File
>> "/tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/setup.py",
>> line 112, in <module>
>> if check_linker_need_libatomic():
>>   File
>> "/tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/setup.py",
>> line 73, in check_linker_need_libatomic
>> cc_test = subprocess.Popen(['cc', '-x', 'c++', '-std=c++11', '-'],
>>   File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
>> self._execute_child(args, executable, preexec_fn, close_fds,
>>   File "/usr/lib/python3.8/subprocess.py", line 1702, in
>> _execute_child
>> raise child_exception_type(errno_num, err_msg, err_filename)
>> FileNotFoundError: [Errno 2] No such file or directory: 'cc'
>> ----------------------------------------
>> WARNING: Discarding
>>
>> https://files.pythonhosted.org/packages/da/3c/bed275484f6cc262b5de6ceaae36798c60d7904cdd05dc79cc830b880687/grpcio-tools-1.30.0.tar.gz#sha256=7878adb93b0c1941eb2e0bed60719f38cda2ae5568bc0bcaa701f457e719a329
>> (from https://pypi.org/simple/grpcio-tools/). Command errored out with
>> exit status 1: python setup.py egg_info Check the logs for full
>> command output.
>> ERROR: Could not find a version that satisfies the requirement
>> grpcio-tools==1.30.0
>> ERROR: No matching distribution found for grpcio-tools==1.30.0
>>
>> [1] https://pypi.org/project/grpcio-tools/#files
>> [2] https://github.com/grpc/grpc/issues/21283
>>
>> I can also imagine that we will have some struggles with the Python
>> harness and all of its dependencies. Does Cython already support ARM64?
>>
>> I went and filed some JIRAs to keep track of this:
>>
>> BEAM-11703 Support apache-beam python install on ARM64
>> BEAM-11704 Support Beam docker images on ARM64
>>
>>
>> On Tue, Jan 26, 2021 at 8:48 PM Robert Burke  wrote:
>> >
>> > I believe so.
>> >
>> > The Go SDK requires, in most instances, that a user register their
>> > DoFns at package init time, keyed to the type's/function's fully
>> > qualified path as determined by Go, which is consistent across
>> > architectures, at least with the standard toolchain.
>> >
>> > Those strings are used to look things up on distributed workers,
>> > regardless of the architecture.
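
Concretely, that registration looks something like this — a hedged sketch
against the 2021-era, pre-generics Go SDK, where extractFn and countFn are
hypothetical DoFns:

package main

import (
    "reflect"
    "strings"

    "github.com/apache/beam/sdks/go/pkg/beam"
)

// extractFn is a hypothetical functional DoFn emitting the words of a line.
func extractFn(line string, emit func(string)) {
    for _, word := range strings.Fields(line) {
        emit(word)
    }
}

// countFn is a hypothetical structural DoFn.
type countFn struct{}

func (f *countFn) ProcessElement(word string) (string, int) {
    return word, 1
}

func init() {
    // Registration keys each DoFn by its fully qualified symbol path
    // (e.g. "main.extractFn") as determined by Go, so workers on any
    // architecture resolve the same string back to the same code.
    beam.RegisterFunction(extractFn)
    beam.RegisterType(reflect.TypeOf((*countFn)(nil)).Elem())
}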
>> >
>> >
>> >
>> > On Tue, Jan 26, 2021, 11:33 AM Robert Bradshaw 
>> wrote:
>> >>
>> >> Cool. Are DoFn (et al) references compatible across cross-compiled
>> binaries?
>> >>
>> >> On Tue, Jan 26, 2021 at 11:23 AM Robert Burke 
>> wrote:
>> >>>
>> >>> Go cross-compilation is as simple as setting the right environment
>> >>> variables [1], 
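
For illustration, with the standard Go toolchain that amounts to setting
GOOS/GOARCH (the targets here are just examples):

$ GOOS=linux GOARCH=arm64 go build ./...
$ GOOS=linux GOARCH=amd64 go build ./...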

Re: [Proposal] Enable Branch Protection for `release-.*` branches

2021-06-10 Thread Robert Burke
Ok. I'll file a ticket with Infra tomorrow once the objection period has
passed. Thanks!

On Wed, Jun 9, 2021, 12:51 PM Kenneth Knowles  wrote:

> Great idea. I think only infra can do it.
>
> On Tue, Jun 8, 2021, 14:27 Robert Burke  wrote:
>
>> During the last branch cleanup, it appears I accidentally deleted the
>> release-2.26.0 branch.
>>
>> Lukasz Cwik pointed this out to me, and fortunately I was able to recover
>> and push it back to the repo. However, Brian Hulette then pointed out
>> GitHub lets us set up rules to avoid accidental deletion of branches that
>> match certain names.
>>
>>
>> https://docs.github.com/en/github/administering-a-repository/defining-the-mergeability-of-pull-requests/managing-a-branch-protection-rule
>>
>> So the proposal is that we guard our release branches from accidental
>> deletion using this mechanism. A repo admin (which I think is probably a
>> PMC member?) can do so by following the linked instructions.
>>
>> My only note is that creating such a rule enables the deletion protection
>> by default, so there shouldn't be any need to select other options (except
>> at the PMC's discretion).
>>
>> Unless there's an objection (in the traditional ~3 day period), could a
>> volunteer PMC member set up such a protection rule, and prevent my error
>> from recurring?
>>
>> Cheers
>> Robert Burke
>> Beam Go Busybody.
>>
>>
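
For reference, such a rule can also be created through GitHub's GraphQL API
rather than the UI. A hedged sketch — the mutation and field names are from
the public schema as best I recall, the repository node ID is a placeholder,
and newly created rules already deny branch deletion by default:

$ gh api graphql -f query='
  mutation {
    createBranchProtectionRule(input: {
      repositoryId: "PLACEHOLDER_REPO_NODE_ID"
      pattern: "release-*"
    }) {
      branchProtectionRule { pattern }
    }
  }'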


Re: [EXTERNAL]

2021-06-10 Thread Alexey Romanenko

> On 9 Jun 2021, at 21:34, Raphael Sanamyan  
> wrote:
> 
> In this case, we write data to one table and then to the other, but only 
> after the window of data has been fully written to the first table. It is not 
> possible to do this with the existing JdbcIO.Write functionality.

Well, it’s kind of possible, but in this case we need to set a statement. I 
guess it could be fixed by generating the statement automatically from the 
input schema.

> Another option for this specific case could be extending the existing class 
> instead of adding a Schema-API-specific class. We can add additional 
> conditions and move some functionality from Write to WriteVoid to infer the 
> Beam schema. What do you think about these options?

I'm not sure I got it. Could you elaborate a bit on this?

Is it somehow related to this work [1]? 

> Schema Providers are not very well documented in Beam and are a bit confusing 
> to us. We are using Beam Row as a common abstraction in Beam pipelines, which 
> really meets our requirements. Looking at the Beam docs/code, we saw 
> SchemaProviders for some IOs. Those providers seem like wrappers around IOs 
> that help work with schemas and convert data to Beam Rows. Could you please 
> clarify this a little? If we want to improve the Beam Schema API, what is the 
> architecturally right way to do that?

Well, it depends on what you want to improve - the Schema API in general or 
some specific IO schema-related things. We need to be careful with breaking 
changes. Anyway, it would be great to bring it to this mailing list as a 
design doc of some kind and discuss it with other people before starting an 
implementation.

—
Alexey


[1] https://github.com/apache/beam/pull/14856



> 
> Thank you,
> Raphael.
> From: Brian Hulette 
> Sent: June 9, 2021, 19:12:41
> To: dev
> Cc: Reuven Lax; pabl...@google.com; Ilya Kozyrev
> Subject: [EXTERNAL] Re:
>  
> > And also the ticket and "// TODO: BEAM-10396 use writeRows() when it's 
> > available" appeared later than this functionality was added to 
> > "JdbcIO.Write".
> 
> Note that this TODO has been moved around through a few refactors. It was 
> initially added last summer [1].
> You're right that JdbcIO.Write's statement generation functionality was added 
> about a year before that [2]. It's possible that the author of [1] didn't 
> realize [2] was done. Or maybe there's some reason why it doesn't work there?
> 
> +1 for Alexey's requests:
> - Identify cases where statement generation in JdbcIO.Write is insufficient, 
> if they exist (e.g. can we just use it where that TODO is [3]? If not, what 
> goes wrong?).
> - Update documentation to avoid this confusion in the future.
> 
> Brian
> 
> [1] https://github.com/apache/beam/pull/12145 
> 
> [2] https://github.com/apache/beam/pull/8962 
> 
> [3] https://github.com/apache/beam/pull/14954#discussion_r648456230 
> 
> On Wed, Jun 9, 2021 at 7:49 AM Alexey Romanenko  wrote:
> Hello Raphael,
> 
>> On 9 Jun 2021, at 09:31, Raphael Sanamyan  wrote:
>> 
>> The "JdbcIO.Write" allows you to write rows without a statement or statement 
>> preparer, but not all functionality works without them.
> 
> Could you show a use case when the current functionality is not enough? 
> 
> 
>> The method "WithResults" requires a statement and a statement preparer. 
>> Also, the ticket  and the "// TODO: BEAM-10396 use writeRows() when it's 
>> available" comment appeared later than this functionality was added to 
>> "JdbcIO.Write". And without reading the code, just the documentation, it's 
>> not clear that the schema is enough.
> 
Agreed, but the documentation can be updated. On the other hand, it would be 
great to have some examples that show the need for WriteRows.
> 
> Thanks,
> Alexey
> 
>> Thank you,
>> Raphael.
>> 
>> 
>> 
>> From: Pablo Estrada 
>> Sent: June 7, 2021, 22:43:24
>> To: dev; Reuven Lax
>> Cc: Ilya Kozyrev
>> Subject: Re:
>>  
>> +Reuven Lax  do you know if this is already 
>> supported or not?
>> I have been able to use `JdbcIO.write()` without specifying a statement or 
>> a statement preparer. Is that not what's necessary? I've done this with a 
>> named class with schemas (i.e. not Row) - is this perhaps the difference?
>> Best
>> -P.
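
For concreteness, a hedged sketch of the statement-less usage Pablo describes,
where the INSERT statement is inferred from the registered schema; the bean,
table name, and connection details here are hypothetical:

import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.schemas.JavaBeanSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;

@DefaultSchema(JavaBeanSchema.class)
public class User {
  private String name;
  private int age;
  public User() {}
  public String getName() { return name; }
  public void setName(String name) { this.name = name; }
  public int getAge() { return age; }
  public void setAge(int age) { this.age = age; }
}

// Later, given a PCollection<User> named users:
users.apply(
    JdbcIO.<User>write()
        .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
            "org.postgresql.Driver", "jdbc:postgresql://localhost/testdb"))
        // No withStatement()/withPreparedStatementSetter(): the statement
        // is generated from the schema plus the table name.
        .withTable("users"));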
>> 
>> On Fri, Jun 4, 2021 at 3:44 PM Robert Bradshaw  wrote:
>> That would be great! I don't know much about this particular issue,
>> but tips for getting started in general can be found at
>> https://beam.apache.org/contribute/ 

P0 (outage) report

2021-06-10 Thread Beam Jira Bot
This is your daily summary of Beam's current outages. See 
https://beam.apache.org/contribute/jira-priorities/#p0-outage for the meaning 
and expectations around P0 issues.

BEAM-12467: java.io.InvalidClassException With Flink Kafka 
(https://issues.apache.org/jira/browse/BEAM-12467)


P1 issues report (38)

2021-06-10 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake).

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

https://issues.apache.org/jira/browse/BEAM-12443: No Information of failed 
Query in JdbcIO (created 2021-06-02)
https://issues.apache.org/jira/browse/BEAM-12436: 
[beam_PostCommit_Go_VR_flink| beam_PostCommit_Go_VR_spark] 
[:sdks:go:test:flinkValidatesRunner] Failure summary (created 2021-06-01)
https://issues.apache.org/jira/browse/BEAM-12422: Vendored gRPC 1.36.0 is 
using a log4j version with security issues (created 2021-05-28)
https://issues.apache.org/jira/browse/BEAM-12396: 
beam_PostCommit_XVR_Direct failed (flaked?) (created 2021-05-24)
https://issues.apache.org/jira/browse/BEAM-12389: 
beam_PostCommit_XVR_Dataflow flaky: Expand method not found (created 2021-05-21)
https://issues.apache.org/jira/browse/BEAM-12387: beam_PostCommit_Python* 
timing out (created 2021-05-21)
https://issues.apache.org/jira/browse/BEAM-12386: 
beam_PostCommit_Py_VR_Dataflow(_V2) failing metrics tests (created 2021-05-21)
https://issues.apache.org/jira/browse/BEAM-12380: Go SDK Kafka IO Transform 
implemented via XLang (created 2021-05-21)
https://issues.apache.org/jira/browse/BEAM-12374: Spark postcommit failing 
ResumeFromCheckpointStreamingTest (created 2021-05-20)
https://issues.apache.org/jira/browse/BEAM-12320: 
PubsubTableProviderIT.testSQLSelectsArrayAttributes[0] failing in SQL 
PostCommit (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12310: 
beam_PostCommit_Java_DataflowV2 failing (created 2021-05-07)
https://issues.apache.org/jira/browse/BEAM-12279: Implement 
destination-dependent sharding in FileIO.writeDynamic (created 2021-05-04)
https://issues.apache.org/jira/browse/BEAM-12256: 
PubsubIO.readAvroGenericRecord creates SchemaCoder that fails to decode some 
Avro logical types (created 2021-04-29)
https://issues.apache.org/jira/browse/BEAM-11959: Python Beam SDK Harness 
hangs when installing pip packages (created 2021-03-11)
https://issues.apache.org/jira/browse/BEAM-11906: No trigger early 
repeatedly for session windows (created 2021-03-01)
https://issues.apache.org/jira/browse/BEAM-11875: XmlIO.Read does not 
handle XML encoding per spec (created 2021-02-26)
https://issues.apache.org/jira/browse/BEAM-11828: JmsIO is not 
acknowledging messages correctly (created 2021-02-17)
https://issues.apache.org/jira/browse/BEAM-11755: Cross-language 
consistency (RequiresStableInputs) is quietly broken (at least on portable 
flink runner) (created 2021-02-05)
https://issues.apache.org/jira/browse/BEAM-11578: `dataflow_metrics` 
(python) fails with TypeError (when int overflowing?) (created 2021-01-06)
https://issues.apache.org/jira/browse/BEAM-11434: Expose Spanner 
admin/batch clients in Spanner Accessor (created 2020-12-10)
https://issues.apache.org/jira/browse/BEAM-11148: Kafka 
commitOffsetsInFinalize OOM on Flink (created 2020-10-28)
https://issues.apache.org/jira/browse/BEAM-11017: Timer with dataflow 
runner can be set multiple times (dataflow runner) (created 2020-10-05)
https://issues.apache.org/jira/browse/BEAM-10670: Make non-portable 
Splittable DoFn the only option when executing Java "Read" transforms (created 
2020-08-10)
https://issues.apache.org/jira/browse/BEAM-10617: python 
CombineGlobally().with_fanout() cause duplicate combine results for sliding 
windows (created 2020-07-31)
https://issues.apache.org/jira/browse/BEAM-10569: SpannerIO tests don't 
actually assert anything. (created 2020-07-23)
https://issues.apache.org/jira/browse/BEAM-10288: Quickstart documents are 
out of date (created 2020-06-19)
https://issues.apache.org/jira/browse/BEAM-10244: Populate requirements 
cache fails on poetry-based packages (created 2020-06-11)
https://issues.apache.org/jira/browse/BEAM-10100: FileIO writeDynamic with 
AvroIO.sink not writing all data (created 2020-05-27)
https://issues.apache.org/jira/browse/BEAM-9564: Remove insecure ssl 
options from MongoDBIO (created 2020-03-20)
https://issues.apache.org/jira/browse/BEAM-9455: Environment-sensitive 
provisioning for Dataflow (created 2020-03-05)
https://issues.apache.org/jira/browse/BEAM-9293: Python direct runner 
doesn't emit empty pane when it should (created 2020-02-11)
https://issues.apache.org/jira/browse/BEAM-8986: SortValues may not work 
correct for numerical types (created 2019-12-17)
https://issues.apache.org/jira/browse/BEAM-8985: SortValues should fail if 
SecondaryKey coder is not deterministic (created 2019-12-17)
https://issues.apache.org/jira/browse/BEAM-8407: [SQL] Some Hive tests 
throw NullPoint

Flaky test issue report (37)

2021-06-10 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake)

These are P1 issues because they have a major negative impact on the community 
and make it hard to determine the quality of the software.

https://issues.apache.org/jira/browse/BEAM-12322: 
FnApiRunnerTestWithGrpcAndMultiWorkers flaky (py precommit) (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12309: 
PubSubIntegrationTest.test_streaming_data_only flake (created 2021-05-07)
https://issues.apache.org/jira/browse/BEAM-12307: 
PubSubBigQueryIT.test_file_loads flake (created 2021-05-07)
https://issues.apache.org/jira/browse/BEAM-12303: Flake in 
PubSubIntegrationTest.test_streaming_with_attributes (created 2021-05-06)
https://issues.apache.org/jira/browse/BEAM-12291: 
org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: 
false] is flaky (created 2021-05-05)
https://issues.apache.org/jira/browse/BEAM-12200: 
SamzaStoreStateInternalsTest is flaky (created 2021-04-20)
https://issues.apache.org/jira/browse/BEAM-12163: Python GHA PreCommits 
flake with grpc.FutureTimeoutError on SDK harness startup (created 2021-04-13)
https://issues.apache.org/jira/browse/BEAM-12061: beam_PostCommit_SQL 
failing on KafkaTableProviderIT.testFakeNested (created 2021-03-27)
https://issues.apache.org/jira/browse/BEAM-12019: 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky (created 2021-03-18)
https://issues.apache.org/jira/browse/BEAM-11792: Python precommit failed 
(flaked?) installing package  (created 2021-02-10)
https://issues.apache.org/jira/browse/BEAM-11666: 
apache_beam.runners.interactive.recording_manager_test.RecordingManagerTest.test_basic_execution
 is flaky (created 2021-01-20)
https://issues.apache.org/jira/browse/BEAM-11661: hdfsIntegrationTest 
flake: network not found (py38 postcommit) (created 2021-01-19)
https://issues.apache.org/jira/browse/BEAM-11645: beam_PostCommit_XVR_Flink 
failing (created 2021-01-15)
https://issues.apache.org/jira/browse/BEAM-11541: 
testTeardownCalledAfterExceptionInProcessElement flakes on direct runner. 
(created 2020-12-30)
https://issues.apache.org/jira/browse/BEAM-11540: Linter sometimes flakes 
on apache_beam.dataframe.frames_test (created 2020-12-30)
https://issues.apache.org/jira/browse/BEAM-10995: Java + Universal Local 
Runner: WindowingTest.testWindowPreservation fails (created 2020-09-30)
https://issues.apache.org/jira/browse/BEAM-10987: 
stager_test.py::StagerTest::test_with_main_session flaky on windows py3.6,3.7 
(created 2020-09-29)
https://issues.apache.org/jira/browse/BEAM-10968: flaky test: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedDistributionMetrics
 (created 2020-09-25)
https://issues.apache.org/jira/browse/BEAM-10955: Flink Java Runner test 
flake: Could not find Flink job  (created 2020-09-23)
https://issues.apache.org/jira/browse/BEAM-10866: 
PortableRunnerTestWithSubprocesses.test_register_finalizations flaky on macOS 
(created 2020-09-09)
https://issues.apache.org/jira/browse/BEAM-10501: 
CheckGrafanaStalenessAlerts and PingGrafanaHttpApi fail with Connection refused 
(created 2020-07-15)
https://issues.apache.org/jira/browse/BEAM-10485: Failure / flake: 
ElasticsearchIOTest > testWriteWithIndexFn (created 2020-07-14)
https://issues.apache.org/jira/browse/BEAM-9649: 
beam_python_mongoio_load_test started failing due to mismatched results 
(created 2020-03-31)
https://issues.apache.org/jira/browse/BEAM-9392: TestStream tests are all 
flaky (created 2020-02-27)
https://issues.apache.org/jira/browse/BEAM-9232: 
BigQueryWriteIntegrationTests is flaky coercing to Unicode (created 2020-01-31)
https://issues.apache.org/jira/browse/BEAM-9119: 
apache_beam.runners.portability.fn_api_runner_test.FnApiRunnerTest[...].test_large_elements
 is flaky (created 2020-01-14)
https://issues.apache.org/jira/browse/BEAM-8101: Flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful for 
Direct, Spark, Flink (created 2019-08-27)
https://issues.apache.org/jira/browse/BEAM-8035: 
[beam_PreCommit_Java_Phrase] [WatchTest.testMultiplePollsWithManyResults]  
Flake: Outputs must be in timestamp order (created 2019-08-22)
https://issues.apache.org/jira/browse/BEAM-7992: Unhandled type_constraint 
in 
apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
 (created 2019-08-16)
https://issues.apache.org/jira/browse/BEAM-7827: 
MetricsTest$AttemptedMetricTests.testAllAttemptedMetrics is flaky on 
DirectRunner (created 2019-07-26)
https://issues.apache.org/jira/browse/BEAM-7752: Java Validates 
DirectRunner: testTeardownCalledAfterExceptionInFinishBundleStateful flaky 
(created 2019-

Re: Multiple architectures support on Beam (ARM)

2021-06-10 Thread Robert Bradshaw
On Thu, Jun 10, 2021 at 3:00 AM Ismaël Mejía  wrote:
>
> As a follow-up on this: with the merge of 
> https://github.com/apache/beam/pull/14832, Beam will be producing Python 
> wheels for AARCH64 starting with Beam 2.32.0!

Nice.

> Also, due to the recent version updates (grpc, protobuf and arrow), we should 
> be pretty close to fully supporting it without extra compilation.
> It seems the only missing piece is Cython: 
> https://github.com/cython/cython/issues/3892

Cython already supports ARM. This is just about providing pre-built
wheels for installing Cython (which aren't necessarily needed).

> Now the next important step would be to make the Docker images multi-arch. 
> That would be a great contribution if someone is motivated.
>
>
> On Thu, Jan 28, 2021 at 1:47 AM Robert Bradshaw  wrote:
>>
>> Cython supports ARM64. The issue here is that we don't have a C++ compiler 
>> (it's looking for 'cc') available in the container (and grpc, and possibly 
>> others, don't have wheel files for this platform). I wonder if apt-get 
>> install build-essential would be sufficient.
>>
>> On Wed, Jan 27, 2021 at 2:22 PM Ismaël Mejía  wrote:
>>>
>>> Nice to see the interest. I also suppose that devs on Apple MacBooks with 
>>> the new M1 processor will soon request this feature.
>>>
>>> I ran some pipelines on ARM64 on classic runners today relatively easily,
>>> which was expected. We will have issues, however, for the Java 8 SDK harness
>>> because the parent image openjdk:8 is not yet supported for ARM64.
>>>
>>> I tried to set up a Python dev environment and found the first issue. It 
>>> looks like gRPC does not support arm64 yet [1][2], or am I misreading it?
>>>
>>> $ pip install -r build-requirements.txt
>>>
>>> Collecting grpcio-tools==1.30.0
>>>   Downloading grpcio-tools-1.30.0.tar.gz (2.1 MB)
>>>  || 2.1 MB 21.7 MB/s
>>> ERROR: Command errored out with exit status 1:
>>>  command: /home/ubuntu/.virtualenvs/beam-dev/bin/python3 -c
>>> 'import sys, setuptools, tokenize; sys.argv[0] =
>>> '"'"'/tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/setup.py'"'"';
>>> __file__='"'"'/tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/setup.py'"'"';f=getattr(tokenize,
>>> '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"',
>>> '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))'
>>> egg_info --egg-base /tmp/pip-pip-egg-info-km8agjf4
>>>  cwd: 
>>> /tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/
>>> Complete output (11 lines):
>>> Traceback (most recent call last):
>>>   File "<string>", line 1, in <module>
>>>   File 
>>> "/tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/setup.py",
>>> line 112, in <module>
>>> if check_linker_need_libatomic():
>>>   File 
>>> "/tmp/pip-install-3lhad2qc/grpcio-tools_d3562157df5c41db9110e4ccd165c87e/setup.py",
>>> line 73, in check_linker_need_libatomic
>>> cc_test = subprocess.Popen(['cc', '-x', 'c++', '-std=c++11', '-'],
>>>   File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
>>> self._execute_child(args, executable, preexec_fn, close_fds,
>>>   File "/usr/lib/python3.8/subprocess.py", line 1702, in _execute_child
>>> raise child_exception_type(errno_num, err_msg, err_filename)
>>> FileNotFoundError: [Errno 2] No such file or directory: 'cc'
>>> ----------------------------------------
>>> WARNING: Discarding
>>> https://files.pythonhosted.org/packages/da/3c/bed275484f6cc262b5de6ceaae36798c60d7904cdd05dc79cc830b880687/grpcio-tools-1.30.0.tar.gz#sha256=7878adb93b0c1941eb2e0bed60719f38cda2ae5568bc0bcaa701f457e719a329
>>> (from https://pypi.org/simple/grpcio-tools/). Command errored out with
>>> exit status 1: python setup.py egg_info Check the logs for full
>>> command output.
>>> ERROR: Could not find a version that satisfies the requirement
>>> grpcio-tools==1.30.0
>>> ERROR: No matching distribution found for grpcio-tools==1.30.0
>>>
>>> [1] https://pypi.org/project/grpcio-tools/#files
>>> [2] https://github.com/grpc/grpc/issues/21283
>>>
>>> I can also imagine that we will have some struggles with the Python harness
>>> and all of its dependencies. Does Cython already support ARM64?
>>>
>>> I went and filed some JIRAs to keep track of this:
>>>
>>> BEAM-11703 Support apache-beam python install on ARM64
>>> BEAM-11704 Support Beam docker images on ARM64
>>>
>>>
>>> On Tue, Jan 26, 2021 at 8:48 PM Robert Burke  wrote:
>>> >
>>> > I believe so.
>>> >
>>> > The Go SDK requires, in most instances, that a user register their DoFns 
>>> > at package init time, keyed to the type's/function's fully qualified path 
>>> > as determined by Go, which is consistent across architectures, at least 
>>> > with the standard toolchain.
>>> >
>>> > Those strings are used to look things up on distributed workers, 
>>> > regardless of the architecture.
>>> >
>>> >
>>> >
>>> > 

Re: Removing deprecated oauth2client dependency for Python SDK

2021-06-10 Thread Luke Cwik
I did something very similar during the Dataflow Java 1.x to Beam Java 2.x
migration. The work boiled down to:
* swapping to a different library to get the application default
credentials (including fixing upstream bugs at Google and improving some
documentation)
* swapping existing API calls to use the new credentials object (was easy
since there was a trivial wrapper object that allowed you to convert new
credentials object into the old type that some API client libraries only
supported)
* a bunch of documentation and trivial plumbing issues
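
For the Python SDK's side of this, the core credential swap is roughly the
following — a hedged sketch assuming google-auth, with an illustrative key-file
path and scope:

# Old, deprecated oauth2client API:
#   from oauth2client.client import GoogleCredentials
#   credentials = GoogleCredentials.get_application_default()

# New google-auth equivalents:
import google.auth
from google.oauth2 import service_account

# Application default credentials.
credentials, project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"])

# Service-account credentials; google-auth honors a custom "token_uri"
# in the key file, which oauth2client does not support.
sa_credentials = service_account.Credentials.from_service_account_file(
    "/path/to/key.json",
    scopes=["https://www.googleapis.com/auth/cloud-platform"])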

On Fri, May 14, 2021 at 5:33 PM Ahmet Altay  wrote:

> +Valentyn Tymofieiev  might have an idea.
>
> On Mon, May 3, 2021 at 4:12 PM Chuck Yang 
> wrote:
>
>> Hi Beam devs,
>>
>> I saw there has been some previous discussion [1][2] around removing
>> the deprecated oauth2client dependency and using the supported
>> google-auth dependency instead. A portion of this work seems to
>> involve migrating off of google-apitools since this call [3] is not
>> supported by credentials objects emitted by google-auth.
>>
>> Does anyone have any experience/insights on how much work migrating
>> off of oauth2client would involve? I might be able to help out but
>> wanted to see a) if anyone is already looking at this and b) if there
>> are any hidden obstacles beyond needing to move from google-apitools
>> to the google-cloud-* libraries. Any pointers are appreciated!
>>
>> We're interested in this migration because of the need to use custom
>> token URIs for issuing service account tokens--it's supported by
>> google-auth but not oauth2client.
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-7352
>> [2] https://github.com/google/apitools/issues/225#issuecomment-434884589
>> [3]
>> https://github.com/google/apitools/blob/v0.5.31/apitools/base/py/base_api.py#L266
>>
>> Thanks!
>> Chuck
>>
>>
>


[Proposal] Go SDK Exits Experimental

2021-06-10 Thread Robert Burke
Hello Beam Community!

I propose we stop calling the Apache Beam Go SDK experimental.

This thread is to discuss it as a community, along with any conditions that
remain that would prevent the exit.

*tl;dr:*
*Ask questions for answers and links! I have both.*
This entails including it officially in the release process, removing the
various "experimental" text throughout the repo, etc., and otherwise treating
it like Python and Java, plus some Go-specific tasks around dependency
versioning.

The Go SDK implements the Beam model efficiently for most batch tasks,
including basic windowing.
Apache Beam Go jobs can execute on, and are tested against, all portable
runners.
The core APIs are not going to change in incompatible ways going forward.
Scalable transforms can be written through SplittableDoFns or via
cross-language transforms.

The SDK isn't 100% feature complete, but keeping it experimental doesn't
help with that any further.
Communities grow through contributions and use, and experimental markers
dissuade users.
There's plenty to do in order to expand what can be done with the SDK.
(Contributions welcome)
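
To make "core APIs" concrete, a minimal pipeline against the 2021-era Go SDK
looks roughly like this — a hedged sketch, with package paths as they appeared
in the apache/beam repo at the time:

package main

import (
    "context"
    "strings"

    "github.com/apache/beam/sdks/go/pkg/beam"
    "github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
    "github.com/apache/beam/sdks/go/pkg/beam/x/debug"
)

// upperFn is a trivial DoFn, registered so distributed workers can find it.
func upperFn(s string) string { return strings.ToUpper(s) }

func init() { beam.RegisterFunction(upperFn) }

func main() {
    beam.Init()
    p := beam.NewPipeline()
    s := p.Root()

    words := beam.Create(s, "hello", "beam", "go")
    upper := beam.ParDo(s, upperFn, words)
    debug.Print(s, upper)

    // beamx.Run picks the runner from the --runner flag (direct by default).
    if err := beamx.Run(context.Background(), p); err != nil {
        panic(err)
    }
}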

*Why Exit Experimental now?*

Typically when we call an SDK or API experimental, it's because there's a
risk that the API or behaviors may change significantly.
This, in turn, leads to additional work for users of the SDK on every
release, which leads to sticking to older versions or forking
to preserve behavior. Version updates should be looked forward to, and
viewed as having little risk. Further, while there's been
previous discussion about what the "low bar" is for a new SDK, it hasn't
been consistently applied to the Go SDK. I feel this has
hurt development and contribution of new SDK languages (the inherent
difficulty of SDK development notwithstanding).

When the SDK was designed, it wasn't entirely clear what the Beam model
should look like in an opinionated language like Go.
The initial take (see https://s.apache.org/beam-go-sdk-design-rfc [0])
goes into detail about what it means for a language without
generics, overloading, or inheritance to implement the Beam model. One
could largely throw away static types (like Python),
but this approach rings hollow for Go. It would not do if the approach
couldn't grow and scale to the Beam model. It's also hard
to tell if an API is any good before there are users.

Further, in the early days of Portability, there wasn't a way to write
scalable DoFns, dynamically or otherwise. It's an incredible
bottleneck to need to do all initial fanout of work on a single machine
and write everything through a Reshuffle just in order to scale up.
Without being able to scale, Beam is little more than overhead.

At this point, both of these needs are met within the Go SDK for open
source.

*Background*

The Go SDK has been a part of the Beam repo for a few years now, since it
was accidentally merged into master.
Since then it's been called experimental and has not officially been part of
the releases.

Of the SDKs, it was always designed around Beam portability first. It
never had any "legacy" (SDK x runner-specific) workers.
It has always used the Beam pipeline protos and the FnAPI to execute jobs,
first with some very experimental code on Dataflow, but now
on all runners that support portability, like Flink, Spark, the Python
portable runner, and Dataflow.

*API Stability*

The Go SDK hasn't meaningfully changed its user API for DoFn and pipeline
construction since it was first merged in, and there are no
changes to that on the horizon that can't be made in a backwards-compatible
manner. Largely these are related to new features or
usability improvements enabled by the advent of Go generics (think of
"real" KV, emitter, and iterator types).

It's an open secret that the Go SDK has largely been worked on for use
within Google. Its use there is called FlumeGo, representing
the Apache Beam Go SDK running on top of Flume, Google's batch pipeline
processing engine; thus most of the focus has been on improving
batch execution. FlumeGo sees ample use today, and there hasn't been a call
for fundamental changes to the API for ergonomic or
usability concerns.

*Scalability*

Google could get away without the Go SDK having an SDK-side scalability
solution as a result of its integration with Flume.
However, those days are now past.

The Go SDK now supports SplittableDoFns along with dynamic splitting, which
supports writing scalable batch transforms natively
in the Go SDK.
The SDK also supports cross-language transforms, with Beam Schema
encodings. With these, production-hardened transforms
from Java and Python are a wrapper away.

Presently, Daniel Oliveira (who implemented the SDF side of the work and
completed the xlang work) is adding a wrapper for the
Java Kafka IO using cross-language transforms, which has often been
requested. This will also enable use of the Beam SQL
transforms that Java provides.

*Features*

The Go SDK implements the Beam core. The Go SDK implements the standard
coders, allows for user DoFns and CombineFns, and provides access
to core transforms lik

Re: [NEED HELP] PMC only finalization items for release 2.30.0

2021-06-10 Thread Ahmet Altay
I helped Heejong with this. Heejong, let us know if you need anything else
related to this.

On Wed, Jun 9, 2021 at 10:15 PM Heejong Lee  wrote:

> Hi,
>
> I'm finishing 2.30.0 release and need help doing PMC only finalization
> items in the release guide (
> https://beam.apache.org/contribute/release-guide/#10-finalize-the-release).
> Please let me know if any PMC members have some time to do these tasks :)
>
> Thanks!
>