Re: Command line to run DatastoreIO integration tests for Java

2021-08-20 Thread Miguel Anzo Palomo
Hi, thanks for the help. I'm able to run the test locally by running the
SQL postCommit task, :sdks:java:extensions:sql:postCommit. My problem comes
when I try to change something in the query in order to inject an error,
for example changing the table name to exercise the error path. In the
integration test, the query passed to the Datastore read and write
transforms is turned into a BeamRelNode with

sqlEnv.parseQuery(query)

which throws a ParseException if the query uses a nonexistent table name.
Is there a way to run the integration test with a query that will cause an
error?
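
To make the two failure points concrete, here is a minimal, self-contained sketch in plain Java with no Beam dependency. Every name in it (TwoPhaseFailureSketch, ParseFailure, StoreFailure, Store, parse, execute, run) is hypothetical and only mimics the shape of the problem, not the Beam SQL or Datastore APIs: an unknown table is rejected in the parse phase, so the read/write phase never runs, and the runtime error path is only reachable with a query that parses but whose backing store fails.

```java
import java.util.Collections;
import java.util.Set;

public class TwoPhaseFailureSketch {
  /** Hypothetical stand-in for a failure raised while parsing/validating the query. */
  static class ParseFailure extends RuntimeException {
    ParseFailure(String message) {
      super(message);
    }
  }

  /** Hypothetical stand-in for a failure raised by the storage service at run time. */
  static class StoreFailure extends RuntimeException {
    StoreFailure(String message) {
      super(message);
    }
  }

  /** Hypothetical storage client; a test can swap in an implementation that fails. */
  interface Store {
    String read(String table);
  }

  /** Tables known to the (hypothetical) catalog used during parsing. */
  static final Set<String> KNOWN_TABLES = Collections.singleton("TEST");

  /** Phase 1: validation against the catalog. Unknown tables never get past this point. */
  static String parse(String table) {
    if (!KNOWN_TABLES.contains(table)) {
      throw new ParseFailure("Unknown table: " + table);
    }
    return table;
  }

  /** Phase 2: execution. Only reached for queries that parsed successfully. */
  static String execute(String parsedTable, Store store) {
    return store.read(parsedTable);
  }

  /** Runs both phases and reports which one failed, or "ok:<result>" on success. */
  static String run(String table, Store store) {
    try {
      return "ok:" + execute(parse(table), store);
    } catch (ParseFailure e) {
      return "parse";
    } catch (StoreFailure e) {
      return "runtime";
    }
  }

  public static void main(String[] args) {
    Store healthy = table -> "rows";
    Store failing = table -> {
      throw new StoreFailure("backend rejected the request");
    };

    System.out.println(run("MISSING", failing)); // prints "parse": the store is never called
    System.out.println(run("TEST", failing));    // prints "runtime": parsed, then the store failed
    System.out.println(run("TEST", healthy));    // prints "ok:rows"
  }
}
```

In this toy model, changing the table name can only ever produce the "parse" outcome; the "runtime" outcome needs a valid query plus a failure injected below the parser. Whether the real integration test offers such an injection point is exactly the open question above.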


On Wed, Aug 18, 2021 at 1:28 PM Kyle Weaver  wrote:

> There's a hard-coded "include" clause in the source. You can change that.
> https://github.com/apache/beam/blob/a0fbe00ef12b72ec89672ab32ccc6d6331ca5edd/sdks/java/extensions/sql/build.gradle#L198
>
> On Wed, Aug 18, 2021 at 11:19 AM Andrew Pilloud 
> wrote:
>
>> You might be able to use the '--tests' flag. (This works for unit tests.)
>> For example:
>>
>> ./gradlew :sdks:java:extensions:sql:postCommit
>> --tests 
>> org.apache.beam.sdk.extensions.sql.meta.provider.datastore.DataStoreReadWriteIT.testDataStoreV1SqlWriteRead
>>
>> On Wed, Aug 18, 2021 at 11:14 AM Chamikara Jayalath 
>> wrote:
>>
>>> I don't have the Gradle command to run an individual IT unfortunately. I
>>> usually invoke the corresponding Jenkins test suite (PostCommit SQL in this
>>> case).
>>> Note that there are additional Datastore read/write IT tests that do not
>>> involve SQL (and run as a part of Java PostCommit).
>>>
>>> https://ci-beam.apache.org/job/beam_PostCommit_Java/7917/testReport/org.apache.beam.sdk.io.gcp.datastore/
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Tue, Aug 17, 2021 at 5:33 PM Alex Amato  wrote:
>>>
 +Chamikara, +Heejong,

 Do you have any ideas here? I want to verify the library does indeed
 produce exceptions as we expect and that we can catch them as done here:
 https://github.com/apache/beam/pull/15183

 On Mon, Aug 16, 2021 at 3:26 PM Miguel Anzo Palomo <
 miguel.a...@wizeline.com> wrote:

> Hi, I was able to run the integration test locally, but I'm having
> some trouble injecting an error here [1] to trigger a
> DatastoreException here [2]. The problem is that if I try to change the
> table or fields in the select query, an exception is raised in [1] when
> parsing the sql to create the BeamRelNode. Would anyone know a way to
> inject an error in the integration test to cause a DatastoreException in
> [2]?
>
> [1]
> https://github.com/apache/beam/blob/ce406c69d2d06e5c5b659fa259253d5cc64249e7/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/datastore/DataStoreReadWriteIT.java#L105
> [2]
> https://github.com/apache/beam/blob/089ca7a62be86ad7865308f2cbd676bb18d7d648/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java#L927
>
> On Tue, Aug 10, 2021 at 3:25 PM Miguel Anzo Palomo <
> miguel.a...@wizeline.com> wrote:
>
>> Hi Alex,
>> My problem trying to locally run that test was that, since I don't
>> have access to the apache-beam-testing GCP project I was having problems
>> getting a datastore instance to read/write from.
>>
>> On Tue, Aug 10, 2021 at 1:08 PM Alex Amato 
>> wrote:
>>
>>> Miguel, I believe you were having trouble running this test, you
>>> mentioned that you needed a Datastore instance.
>>> Would you mind sharing the errors or any issue you had when you
>>> tried Ke's command?
>>>
>>> Ke, Do we need to specify a datastore instance to run this test?
>>> Where will it read/write from? I assume it will default to something in
>>> the apache-beam-testing GCP project, as other tests do.
>>>
>>>
>>> On Thu, Jul 29, 2021 at 6:16 PM Ke Wu  wrote:
>>>
 "Run SQL PostCommit" is essentially running
 "./gradlew :sqlPostCommit" [1]

 In your case where you would like to run DataStoreReadWriteIT only,
 it can be simplified [2] to

 "./gradlew :sdks:java:extensions:sql:postCommit"

 Best,
 Ke

 [1]
 https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PostCommit_SQL.groovy#L41

 [2]
 https://github.com/apache/beam/blob/master/build.gradle.kts#L190


 On Jul 29, 2021, at 9:58 AM, Alex Amato  wrote:

 I was hoping for the command line to run it. So that the test could
 be tweaked to inject an err

Re: [VOTE] Release 2.32.0, release candidate #1

2021-08-20 Thread Alexey Romanenko
Yes, you are right, Ahmet.

Though, I just wanted to confirm whether the currently open P0 issues were 
introduced after the "release-2.32.0" branch cut and that we don’t have any 
regressions.


> On 20 Aug 2021, at 18:26, Ahmet Altay  wrote:
> 
> 
> 
> On Fri, Aug 20, 2021 at 8:14 AM Alexey Romanenko wrote:
> Thanks, Ankur! 
> 
> There are several P0 opened issues. They don’t affect this release?
> 
> I do not see any P0 open issues marked for 2.32.0 release [1]. Am I missing 
> something? Perhaps you are looking at the 2.33.0 release?
> 
> [1] 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22Triage%20Needed%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.32.0%20ORDER%20BY%20created%20DESC%2C%20priority%20DESC
> 
>> On 20 Aug 2021, at 06:49, Ankur Goenka wrote:
>> 
>> Hi everyone,
>> Please review and vote on the release candidate #1 for the version 2.32.0, 
>> as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>> 
>> 
>> Reviewers are encouraged to test their own use cases with the release 
>> candidate, and vote +1 if
>> no issues are found.
>> 
>> The complete staging area is available for your review, which includes:
>> * JIRA release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org [2], 
>> which is signed with the key with fingerprint A6699985948 [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.32.0-RC1" [5],
>> * website pull request listing the release [6], publishing the API reference 
>> manual [7], and the blog post [8].
>> * Java artifacts were built with Gradle 6.8.3 and OpenJDK/Oracle JDK 
>> 1.8.0_181.
>> * Python artifacts are deployed along with the source release to 
>> dist.apache.org [2].
>> * Validation sheet with a tab for 2.32.0 release to help with validation [9].
>> * Docker images published to Docker Hub [10].
>> 
>> The vote will be open for at least 72 hours. It is adopted by majority 
>> approval, with at least 3 PMC affirmative votes.
>> 
>> For guidelines on how to try the release in your projects, check out our 
>> blog post at https://beam.apache.org/blog/validate-beam-release/.
>> 
>> Thanks,
>> Release Manager
>> 
>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12349992
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.32.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4] https://repository.apache.org/content/repositories/orgapachebeam-1223/
>> [5] https://github.com/apache/beam/tree/v2.32.0-RC1
>> [6] https://github.com/apache/beam/pull/15317
>> [7] https://github.com/apache/beam-site/pull/618
>> [8] https://github.com/apache/beam/pull/15324
>> [9] https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1747256107
>> [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
> 



Flaky test issue report (31)

2021-08-20 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake)

These are P1 issues because they have a major negative impact on the community 
and make it hard to determine the quality of the software.

https://issues.apache.org/jira/browse/BEAM-12515: Python PreCommit flaking 
in PipelineOptionsTest.test_display_data (created 2021-06-18)
https://issues.apache.org/jira/browse/BEAM-12387: beam_PostCommit_Python* 
timing out (created 2021-05-21)
https://issues.apache.org/jira/browse/BEAM-12386: 
beam_PostCommit_Py_VR_Dataflow(_V2) failing metrics tests (created 2021-05-21)
https://issues.apache.org/jira/browse/BEAM-12322: 
FnApiRunnerTestWithGrpcAndMultiWorkers flaky (py precommit) (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12320: 
PubsubTableProviderIT.testSQLSelectsArrayAttributes[0] failing in SQL 
PostCommit (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12309: 
PubSubIntegrationTest.test_streaming_data_only flake (created 2021-05-07)
https://issues.apache.org/jira/browse/BEAM-12307: 
PubSubBigQueryIT.test_file_loads flake (created 2021-05-07)
https://issues.apache.org/jira/browse/BEAM-12303: Flake in 
PubSubIntegrationTest.test_streaming_with_attributes (created 2021-05-06)
https://issues.apache.org/jira/browse/BEAM-12291: 
org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: 
false] is flaky (created 2021-05-05)
https://issues.apache.org/jira/browse/BEAM-12200: 
SamzaStoreStateInternalsTest is flaky (created 2021-04-20)
https://issues.apache.org/jira/browse/BEAM-12163: Python GHA PreCommits 
flake with grpc.FutureTimeoutError on SDK harness startup (created 2021-04-13)
https://issues.apache.org/jira/browse/BEAM-12061: beam_PostCommit_SQL 
failing on KafkaTableProviderIT.testFakeNested (created 2021-03-27)
https://issues.apache.org/jira/browse/BEAM-11837: Java build flakes: 
"Memory constraints are impeding performance" (created 2021-02-18)
https://issues.apache.org/jira/browse/BEAM-11792: Python precommit failed 
(flaked?) installing package  (created 2021-02-10)
https://issues.apache.org/jira/browse/BEAM-11666: 
apache_beam.runners.interactive.recording_manager_test.RecordingManagerTest.test_basic_execution
 is flaky (created 2021-01-20)
https://issues.apache.org/jira/browse/BEAM-11661: hdfsIntegrationTest 
flake: network not found (py38 postcommit) (created 2021-01-19)
https://issues.apache.org/jira/browse/BEAM-11645: beam_PostCommit_XVR_Flink 
failing (created 2021-01-15)
https://issues.apache.org/jira/browse/BEAM-11641: Bigquery Read tests are 
flaky on Flink runner in Python PostCommit suites (created 2021-01-15)
https://issues.apache.org/jira/browse/BEAM-11541: 
testTeardownCalledAfterExceptionInProcessElement flakes on direct runner. 
(created 2020-12-30)
https://issues.apache.org/jira/browse/BEAM-10866: 
PortableRunnerTestWithSubprocesses.test_register_finalizations flaky on macOS 
(created 2020-09-09)
https://issues.apache.org/jira/browse/BEAM-10485: Failure / flake: 
ElasticsearchIOTest > testWriteWithIndexFn (created 2020-07-14)
https://issues.apache.org/jira/browse/BEAM-9649: 
beam_python_mongoio_load_test started failing due to mismatched results 
(created 2020-03-31)
https://issues.apache.org/jira/browse/BEAM-9232: 
BigQueryWriteIntegrationTests is flaky coercing to Unicode (created 2020-01-31)
https://issues.apache.org/jira/browse/BEAM-8101: Flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful for 
Direct, Spark, Flink (created 2019-08-27)
https://issues.apache.org/jira/browse/BEAM-8035: 
WatchTest.testMultiplePollsWithManyResults flake: Outputs must be in timestamp 
order (sickbayed) (created 2019-08-22)
https://issues.apache.org/jira/browse/BEAM-7992: Unhandled type_constraint 
in 
apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
 (created 2019-08-16)
https://issues.apache.org/jira/browse/BEAM-7827: 
MetricsTest$AttemptedMetricTests.testAllAttemptedMetrics is flaky on 
DirectRunner (created 2019-07-26)
https://issues.apache.org/jira/browse/BEAM-7752: Java Validates 
DirectRunner: testTeardownCalledAfterExceptionInFinishBundleStateful flaky 
(created 2019-07-16)
https://issues.apache.org/jira/browse/BEAM-6804: [beam_PostCommit_Java] 
[PubsubReadIT.testReadPublicData] Timeout waiting on Sub (created 2019-03-11)
https://issues.apache.org/jira/browse/BEAM-5286: 
[beam_PostCommit_Java_GradleBuild][org.apache.beam.examples.subprocess.ExampleEchoPipelineTest.testExampleEchoPipeline][Flake]
 .sh script: text file busy. (created 2018-09-01)
https://issues.apache.org/jira/browse/BEAM-5172: 
org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest is flaky (created 
2018-08-20)


P1 issues report (41)

2021-08-20 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake)).

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

https://issues.apache.org/jira/browse/BEAM-12712: Java Portable VR 
PostCommits failing - ParDoTest TimerTests.testEventTimeTimerLoop (created 
2021-08-02)
https://issues.apache.org/jira/browse/BEAM-12694: DICOMIoIntegrationTest 
flaky due to store ID (Python PreCommit) (created 2021-07-30)
https://issues.apache.org/jira/browse/BEAM-12678: 
beam_PreCommit_GoPortable_Phrase failing to start the local job server (created 
2021-07-29)
https://issues.apache.org/jira/browse/BEAM-12632: ElasticsearchIO: Enabling 
both User/Pass auth and SSL overwrites User/Pass (created 2021-07-16)
https://issues.apache.org/jira/browse/BEAM-12607: Copy Code Snippet copies 
html tags (created 2021-07-13)
https://issues.apache.org/jira/browse/BEAM-12603: Flaky: 
apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
 (created 2021-07-12)
https://issues.apache.org/jira/browse/BEAM-12525: SDF BoundedSource seems 
to execute significantly slower than 'normal' BoundedSource (created 2021-06-22)
https://issues.apache.org/jira/browse/BEAM-12500: Dataflow SocketException 
(SSLException) error while trying to send message from Cloud Pub/Sub to 
BigQuery (created 2021-06-16)
https://issues.apache.org/jira/browse/BEAM-12484: JdbcIO date conversion is 
sensitive to OS (created 2021-06-14)
https://issues.apache.org/jira/browse/BEAM-12467: 
java.io.InvalidClassException With Flink Kafka (created 2021-06-09)
https://issues.apache.org/jira/browse/BEAM-12436: 
[beam_PostCommit_Go_VR_flink| beam_PostCommit_Go_VR_spark] 
[:sdks:go:test:flinkValidatesRunner] Failure summary (created 2021-06-01)
https://issues.apache.org/jira/browse/BEAM-12380: Go SDK Kafka IO Transform 
implemented via XLang (created 2021-05-21)
https://issues.apache.org/jira/browse/BEAM-12374: Spark postcommit failing 
ResumeFromCheckpointStreamingTest (created 2021-05-20)
https://issues.apache.org/jira/browse/BEAM-12310: 
beam_PostCommit_Java_DataflowV2 failing (created 2021-05-07)
https://issues.apache.org/jira/browse/BEAM-12279: Implement 
destination-dependent sharding in FileIO.writeDynamic (created 2021-05-04)
https://issues.apache.org/jira/browse/BEAM-12256: 
PubsubIO.readAvroGenericRecord creates SchemaCoder that fails to decode some 
Avro logical types (created 2021-04-29)
https://issues.apache.org/jira/browse/BEAM-11959: Python Beam SDK Harness 
hangs when installing pip packages (created 2021-03-11)
https://issues.apache.org/jira/browse/BEAM-11906: No trigger early 
repeatedly for session windows (created 2021-03-01)
https://issues.apache.org/jira/browse/BEAM-11875: XmlIO.Read does not 
handle XML encoding per spec (created 2021-02-26)
https://issues.apache.org/jira/browse/BEAM-11828: JmsIO is not 
acknowledging messages correctly (created 2021-02-17)
https://issues.apache.org/jira/browse/BEAM-11755: Cross-language 
consistency (RequiresStableInputs) is quietly broken (at least on portable 
flink runner) (created 2021-02-05)
https://issues.apache.org/jira/browse/BEAM-11578: `dataflow_metrics` 
(python) fails with TypeError (when int overflowing?) (created 2021-01-06)
https://issues.apache.org/jira/browse/BEAM-11148: Kafka 
commitOffsetsInFinalize OOM on Flink (created 2020-10-28)
https://issues.apache.org/jira/browse/BEAM-11017: Timer with dataflow 
runner can be set multiple times (dataflow runner) (created 2020-10-05)
https://issues.apache.org/jira/browse/BEAM-10670: Make non-portable 
Splittable DoFn the only option when executing Java "Read" transforms (created 
2020-08-10)
https://issues.apache.org/jira/browse/BEAM-10617: python 
CombineGlobally().with_fanout() cause duplicate combine results for sliding 
windows (created 2020-07-31)
https://issues.apache.org/jira/browse/BEAM-10569: SpannerIO tests don't 
actually assert anything. (created 2020-07-23)
https://issues.apache.org/jira/browse/BEAM-10529: Kafka XLang fails for 
?empty? key/values (created 2020-07-18)
https://issues.apache.org/jira/browse/BEAM-10288: Quickstart documents are 
out of date (created 2020-06-19)
https://issues.apache.org/jira/browse/BEAM-10244: Populate requirements 
cache fails on poetry-based packages (created 2020-06-11)
https://issues.apache.org/jira/browse/BEAM-10100: FileIO writeDynamic with 
AvroIO.sink not writing all data (created 2020-05-27)
https://issues.apache.org/jira/browse/BEAM-9564: Remove insecure ssl 
options from MongoDBIO (created 2020-03-20)
https://issues.apache.org/jira/browse/BEAM-9482: 
beam_PerformanceTe

P0 (outage) report

2021-08-20 Thread Beam Jira Bot
This is your daily summary of Beam's current outages. See 
https://beam.apache.org/contribute/jira-priorities/#p0-outage for the meaning 
and expectations around P0 issues.

BEAM-12766: Already Exists: Dataset 
apache-beam-testing:python_bq_file_loads_NNN 
(https://issues.apache.org/jira/browse/BEAM-12766)
BEAM-12764: Can't get attribute 'new_block' on <module 
'pandas.core.internals.blocks' 
(https://issues.apache.org/jira/browse/BEAM-12764)
BEAM-12733: [beam_PostCommit_Java] 
org.apache.beam.sdk.extensions.ml.RecommendationAIPredictIT.predict failing  
(https://issues.apache.org/jira/browse/BEAM-12733)
BEAM-12683: PythonPostCommits RecommendationAIIT.test_create_catalog_item 
failing (https://issues.apache.org/jira/browse/BEAM-12683)


[Register] Allyship workshop for open source communities

2021-08-20 Thread Aizhamal Nurmamat kyzy
Hi everyone,

I am helping to organize an allyship training for open source contributors
sponsored by Google's OSPO. A few folks have expressed interest in this
workshop, so I am extending the invite to all of you.

Workshop date: Thursday, September 16th, 2021 at 8:30 AM PT (5:30 PM CEST).
Please register by filling out the form linked below [1].

Here are more details:

The training led by Dr. Kim Tran (https://www.kimtranphd.com/) aims to
position people of color and those in solidarity with us to develop the
necessary skills to build bridges across race, ability, gender, sexuality
and class.

Participants will leave with:
- The capacity to identify marginalization in real time in the open
source community
- Knowledge of how to address and respond to marginalization at individual
and systemic levels
- A strong, critical understanding of the allyship framework
- Intersectional lenses for examining dynamics around gender, class, ability
and race
- A toolkit for recognizing and combating marginalization in real time

Format:
- Large groups will engage in a *90-minute* remote learning session.
- Participants will be capped at 45 to enable engaging interactive
participation and responsiveness to participant questions and concerns.
- The session will include a webinar-style portion as well as breakouts for
hypothetical scenarios.


[1]
https://docs.google.com/forms/d/1DM7KNyd7QQKiCRrepQTIDAChvtC6QERCt1JUNO5R3Fc/edit


Re: P0 (outage) report

2021-08-20 Thread Kenneth Knowles
Anyone else have some time to help triage P0s and P1s? They should
represent actual outages preventing the community from making progress,
etc. And of course they are super urgent because of this.

Kenn

On Thu, Aug 19, 2021 at 11:01 AM Beam Jira Bot  wrote:

> This is your daily summary of Beam's current outages. See
> https://beam.apache.org/contribute/jira-priorities/#p0-outage for the
> meaning and expectations around P0 issues.
>
> BEAM-12766: Already Exists: Dataset
> apache-beam-testing:python_bq_file_loads_NNN (
> https://issues.apache.org/jira/browse/BEAM-12766)
> BEAM-12764: Can't get attribute 'new_block' on <module 'pandas.core.internals.blocks' (
> https://issues.apache.org/jira/browse/BEAM-12764)
> BEAM-12733: [beam_PostCommit_Java] 
> org.apache.beam.sdk.extensions.ml.RecommendationAIPredictIT.predict
> failing  (https://issues.apache.org/jira/browse/BEAM-12733)
> BEAM-12694: DICOMIoIntegrationTest flaky due to store ID (Python
> PreCommit) (https://issues.apache.org/jira/browse/BEAM-12694)
> BEAM-12683: PythonPostCommits
> RecommendationAIIT.test_create_catalog_item failing (
> https://issues.apache.org/jira/browse/BEAM-12683)
>


Re: Primitive Read not working with Flink portable runner

2021-08-20 Thread Jan Lukavský

Hi,

I've tried to build a better understanding of what is really happening 
and how. Could someone validate my line of thinking?


 a) under normal circumstances an ExecutableStage has two pieces - a 
runner side and an SDK side. Data is passed between the two over a 
gRPC channel. The runner side is not supposed to understand the data, and 
therefore runners-core-construction-java replaces the coders used for passing 
data between the SDK harness and the runner with 
LengthPrefixCoder(ByteArrayCoder) - that means that the data is passed 
as opaque bytes


 b) the proto representation of the Pipeline contains the actual 
coders, without specifying how the data should be passed between the SDK 
harness and the runner (which seems correct: only the runner knows the 
environment, and it is therefore the runner's duty to build the actual 
wire harness coders)


 c) because of that, there are utility classes that inject 
LengthPrefixCoder where appropriate - most of the code is in WireCoders, 
but unfortunately ProcessBundleDescriptors does some work in this regard 
as well


 d) the problem arises when a runner decides to inline a PTransform 
that was supposed to be an ExecutableStage and run it within the context of 
the runner - that is the case of Flink's primitive Read. In that case 
the coder the runner uses to encode the PCollection on output from Read 
does not match the coder that the (non-inlined) consuming ExecutableStage 
expects on its input.
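
As an illustration of points a) and d), here is a minimal, self-contained sketch of length-prefix framing in plain Java. It is not Beam's LengthPrefixCoder; the class and method names (LengthPrefixSketch, frame, unframe, writeVarInt, readVarInt) are hypothetical, and the varint prefix is a simplifying assumption. It shows how a reader can move an element around as opaque bytes without understanding its encoding, and what happens when the producer does not frame the bytes but the consumer expects a prefix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class LengthPrefixSketch {
  /** Writes a non-negative int as a base-128 varint (7 bits per byte, low bits first). */
  static void writeVarInt(int value, ByteArrayOutputStream out) {
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
  }

  /** Reads a varint written by writeVarInt. */
  static int readVarInt(InputStream in) throws IOException {
    int result = 0;
    int shift = 0;
    while (true) {
      int b = in.read();
      if (b < 0) {
        throw new EOFException("stream ended inside the length prefix");
      }
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
      shift += 7;
      if (shift > 28) {
        throw new IOException("length prefix too long");
      }
    }
  }

  /** Frames an already-encoded element so a reader can copy it without decoding it. */
  static byte[] frame(byte[] payload) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeVarInt(payload.length, out);
    out.write(payload, 0, payload.length);
    return out.toByteArray();
  }

  /** Reads one framed element back as opaque bytes. */
  static byte[] unframe(InputStream in) throws IOException {
    int length = readVarInt(in);
    byte[] payload = new byte[length];
    int read = 0;
    while (read < length) {
      int n = in.read(payload, read, length - read);
      if (n < 0) {
        throw new EOFException("expected " + length + " bytes, got " + read);
      }
      read += n;
    }
    return payload;
  }

  public static void main(String[] args) throws IOException {
    byte[] element = "hello".getBytes(StandardCharsets.UTF_8);

    // Matching ends: the producer frames, the consumer unframes.
    byte[] roundTripped = unframe(new ByteArrayInputStream(frame(element)));
    System.out.println(new String(roundTripped, StandardCharsets.UTF_8)); // prints "hello"

    // Mismatched ends: the producer wrote raw bytes, the consumer expects a prefix.
    // The first payload byte is misread as a length and decoding fails.
    try {
      unframe(new ByteArrayInputStream(element));
    } catch (IOException e) {
      System.out.println("mismatch: " + e.getMessage());
    }
  }
}
```

In this sketch the mismatch happens to fail loudly; with other payloads the misread length could be small enough that decoding silently returns the wrong bytes instead.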


I tried to modify the ModelCoders, or to patch 
LengthPrefixUnknownCoders.addLengthPrefixedCoder, so that it works 
when both ends (SDK and runner) are Java, but I always hit 
an issue somewhere. I think this is because the decision of which 
"wire coder" to use is in this case no longer a function of the pair 
(coderId, SDK-side or runner-side), but of the triple (coderId, producer 
side, consumer side). That is, if the collection is both produced 
and consumed in the runner environment, the coder should be different 
than if it is produced in the runner and consumed in the SDK harness.
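
The "triple" idea can be sketched as a small decision function. This is only an illustration under stated assumptions, not code from Beam: the names (WireCoderChoiceSketch, Side, wireCoderFor) and the concrete policy are hypothetical, and coders are represented as plain strings. The point is only that the result depends on both the producer side and the consumer side, not on one side alone.

```java
public class WireCoderChoiceSketch {
  /** Where a PCollection is produced or consumed (hypothetical model). */
  enum Side {
    RUNNER,
    SDK_HARNESS
  }

  /**
   * Picks a wire coder description for a PCollection.
   *
   * Assumed policy, for illustration only: if the data never leaves the runner,
   * the original coder can be used as is. Otherwise the data crosses (or passes
   * through) the runner/SDK boundary, and a coder the runner does not understand
   * is wrapped in a length prefix so it can be handled as opaque bytes.
   */
  static String wireCoderFor(
      String coderId, boolean runnerUnderstandsCoder, Side producer, Side consumer) {
    if (producer == Side.RUNNER && consumer == Side.RUNNER) {
      return coderId;
    }
    return runnerUnderstandsCoder ? coderId : "length_prefix(" + coderId + ")";
  }

  public static void main(String[] args) {
    // Same coder id, same producer, different consumer: the answer differs.
    System.out.println(wireCoderFor("custom", false, Side.RUNNER, Side.RUNNER));
    // prints "custom"
    System.out.println(wireCoderFor("custom", false, Side.RUNNER, Side.SDK_HARNESS));
    // prints "length_prefix(custom)"
  }
}
```

A function keyed only on (coderId, side) cannot return two different answers for the two calls in main, which is the limitation described above.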


Another option seems to be that when a PCollection is produced directly in a 
runner, the runner should wrap its coder in LengthPrefixCoder (unless the coder 
is already a ModelCoder), which is what I'll try next. I'd be grateful 
if someone could verify that I understand the problem correctly and that the 
solution with LengthPrefixCoder on the output of Read should work. The 
solution is somewhat suboptimal regarding performance, because it wraps 
the coder with LengthPrefixCoder even in the case where all coders on the way 
are known and therefore the length prefix should not be needed. But I 
think that we could live with this for now, at least until some finer 
control of the in-out coders of ExecutableStage is introduced.


Thanks for any thoughts on this!

 Jan

On 8/1/21 8:33 PM, Jan Lukavský wrote:

Hi,

I have figured out another way of fixing the problem without modifying 
ModelCoders. It consists of creating a JavaSDKCoderTranslatorRegistrar 
[1] and fixing LengthPrefixUnknownCoders [2]. Would this be a better 
approach?


 Jan

[1] 
https://github.com/apache/beam/pull/15181/files#diff-e4df94a4e799e14a76ada42506aacb8cb7567c84349acacd6126c64ed03de062R27


[2] 
https://github.com/apache/beam/pull/15181/files#diff-64103a1eabf2872230e5df56cf02d535c4146f5a3f67c51c261433e4caa9a972R63


On 7/29/21 7:54 PM, Jan Lukavský wrote:

On 7/29/21 6:45 PM, Robert Bradshaw wrote:


On Thu, Jul 29, 2021 at 3:04 AM Jan Lukavský  wrote:

Hi,

I'd like to move the discussion of this topic further. Because it 
seems that fixing the portable SDF is a larger work, I think there 
are two options:

+1

  a) extend the definition of model coders to include SDK coders of 
the language that implement the model (that would mean that the 
definition of model coder is not "language agnostic coders", but 
"coders that a given SDK can instantiate"), or


  b) make the model coders extensible so that a runner can modify 
it - that would make it possible for each runner to have a slightly 
different definition of these model coders


I'm strongly in favor of a), but I can live with b) as well.

We should probably just rename "ModelCoders" to
"JavaCoders[Registrar]" and stick everything there. ModelCoders is not
understood or used by anything but Java. (That or we just discard the
whole ModelCoders thing and just let Coders define their own portable
representations, possibly with a registration system.)

Coders must be Serializable, so it seems to me that all Java Coders 
are quite easily serialized, and a registration is not exactly needed 
for that. Renaming ModelCoders to Java(Portable)Coders looks good to me.




Thanks in advance for any comments on this.

  Jan

On 7/25/21 8:59 PM, Jan Lukavský wrote:

I didn't want to say that Flink should not support SDF. I only do 
not see any benefits of it for a native streaming source - like 
Kafka - without the ability to use dynamic splitting. The potential 
benefits of composability and extensibi

Re: [VOTE] Release 2.32.0, release candidate #1

2021-08-20 Thread Alexey Romanenko
Thanks, Ankur! 

There are several P0 opened issues. They don’t affect this release?
