Re: Flake trends - better?

2021-05-06 Thread Kenneth Knowles
I spoke too soon?

https://issues.apache.org/jira/secure/ConfigureReport.jspa?projectOrFilterId=project-12319527=daily=300=12319527=com.atlassian.jira.jira-core-reports-plugin%3Aaverageage-report_token=A5KQ-2QAV-T4JA-FDED_ea6ac783c727523cf6bfed04ba94ce91bb62da91_lin=Next

On Thu, May 6, 2021 at 5:54 PM Kenneth Knowles  wrote:

> I made a quick* Jira chart to see how we are doing at flakes:
>
>
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=464=BEAM=reporting=cumulativeFlowDiagram=1174=1175=2038=2039=2040
>
> Looking a lot better recently at resolving them! (whether these are new
> fixes or just resolving stale bugs, I love it)
>
> Kenn
>
> *AFAICT you need to make a saved search, then an agile board based on the
> saved search, then you can look at reports
>


Flake trends - better?

2021-05-06 Thread Kenneth Knowles
I made a quick* Jira chart to see how we are doing at flakes:

https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=464=BEAM=reporting=cumulativeFlowDiagram=1174=1175=2038=2039=2040

Looking a lot better recently at resolving them! (whether these are new
fixes or just resolving stale bugs, I love it)

Kenn

*AFAICT you need to make a saved search, then an agile board based on the
saved search, then you can look at reports
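
As a rough illustration, a similar list can be pulled programmatically with the
`jira` Python client (a minimal sketch; the JQL label filter below is an
assumption about how flaky-test issues are tagged, not the exact saved search
behind the board above):

# Sketch: list Beam's unresolved flaky-test issues from the ASF Jira.
# Requires `pip install jira`; the "flake" label is an assumed convention.
from jira import JIRA

jira = JIRA(server="https://issues.apache.org/jira")
jql = 'project = BEAM AND labels = flake AND resolution = Unresolved'
for issue in jira.search_issues(jql, maxResults=50):
    print(issue.key, issue.fields.summary)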


Flaky test issue report

2021-05-06 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests. These are P1 issues 
because they have a major negative impact on the community and make it hard to 
determine the quality of the software.

BEAM-12303: Flake in PubSubIntegrationTest.test_streaming_with_attributes 
(https://issues.apache.org/jira/browse/BEAM-12303)
BEAM-12293: FlinkSavepointTest.testSavepointRestoreLegacy flakes due to 
FlinkJobNotFoundException (https://issues.apache.org/jira/browse/BEAM-12293)
BEAM-12291: 
org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: 
false] is flaky (https://issues.apache.org/jira/browse/BEAM-12291)
BEAM-12250: Java ValidatesRunner Postcommits timing out 
(https://issues.apache.org/jira/browse/BEAM-12250)
BEAM-12200: SamzaStoreStateInternalsTest is flaky 
(https://issues.apache.org/jira/browse/BEAM-12200)
BEAM-12163: Python GHA PreCommits flake with grpc.FutureTimeoutError on SDK 
harness startup (https://issues.apache.org/jira/browse/BEAM-12163)
BEAM-12061: beam_PostCommit_SQL failing on 
KafkaTableProviderIT.testFakeNested 
(https://issues.apache.org/jira/browse/BEAM-12061)
BEAM-12020: :sdks:java:container:java8:docker failing missing licenses 
(https://issues.apache.org/jira/browse/BEAM-12020)
BEAM-12019: 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky (https://issues.apache.org/jira/browse/BEAM-12019)
BEAM-11837: Null checking causes build flakes: "Memory constraints are 
impeding performance" (https://issues.apache.org/jira/browse/BEAM-11837)
BEAM-11792: Python precommit failed (flaked?) installing package  
(https://issues.apache.org/jira/browse/BEAM-11792)
BEAM-11666: 
apache_beam.runners.interactive.recording_manager_test.RecordingManagerTest.test_basic_execution
 is flaky (https://issues.apache.org/jira/browse/BEAM-11666)
BEAM-11662: elasticsearch tests failing 
(https://issues.apache.org/jira/browse/BEAM-11662)
BEAM-11661: hdfsIntegrationTest flake: network not found (py38 postcommit) 
(https://issues.apache.org/jira/browse/BEAM-11661)
BEAM-11646: beam_PostCommit_XVR_Spark failing 
(https://issues.apache.org/jira/browse/BEAM-11646)
BEAM-11645: beam_PostCommit_XVR_Flink failing 
(https://issues.apache.org/jira/browse/BEAM-11645)
BEAM-11541: testTeardownCalledAfterExceptionInProcessElement flakes on 
direct runner. (https://issues.apache.org/jira/browse/BEAM-11541)
BEAM-11540: Linter sometimes flakes on apache_beam.dataframe.frames_test 
(https://issues.apache.org/jira/browse/BEAM-11540)
BEAM-10995: Java + Universal Local Runner: 
WindowingTest.testWindowPreservation fails 
(https://issues.apache.org/jira/browse/BEAM-10995)
BEAM-10987: stager_test.py::StagerTest::test_with_main_session flaky on 
windows py3.6,3.7 (https://issues.apache.org/jira/browse/BEAM-10987)
BEAM-10968: flaky test: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedDistributionMetrics
 (https://issues.apache.org/jira/browse/BEAM-10968)
BEAM-10955: Flink Java Runner test flake: Could not find Flink job  
(https://issues.apache.org/jira/browse/BEAM-10955)
BEAM-10923: Python requirements installation in docker container is flaky 
(https://issues.apache.org/jira/browse/BEAM-10923)
BEAM-10899: test_FhirIO_exportFhirResourcesGcs flake with OOM 
(https://issues.apache.org/jira/browse/BEAM-10899)
BEAM-10866: PortableRunnerTestWithSubprocesses.test_register_finalizations 
flaky on macOS (https://issues.apache.org/jira/browse/BEAM-10866)
BEAM-10590: BigQueryQueryToTableIT flaky: test_big_query_new_types 
(https://issues.apache.org/jira/browse/BEAM-10590)
BEAM-10519: 
MultipleInputsAndOutputTests.testParDoWithSideInputsIsCumulative flaky on Samza 
(https://issues.apache.org/jira/browse/BEAM-10519)
BEAM-10504: Failure / flake in ElasticSearchIOTest > 
testWriteFullAddressing and testWriteWithIndexFn 
(https://issues.apache.org/jira/browse/BEAM-10504)
BEAM-10501: CheckGrafanaStalenessAlerts and PingGrafanaHttpApi fail with 
Connection refused (https://issues.apache.org/jira/browse/BEAM-10501)
BEAM-10485: Failure / flake: ElasticsearchIOTest > testWriteWithIndexFn 
(https://issues.apache.org/jira/browse/BEAM-10485)
BEAM-10272: Failure in CassandraIOTest init: cannot create cluster due to 
netty link error (https://issues.apache.org/jira/browse/BEAM-10272)
BEAM-9649: beam_python_mongoio_load_test started failing due to mismatched 
results (https://issues.apache.org/jira/browse/BEAM-9649)
BEAM-9392: TestStream tests are all flaky 
(https://issues.apache.org/jira/browse/BEAM-9392)
BEAM-9232: BigQueryWriteIntegrationTests is flaky coercing to Unicode 
(https://issues.apache.org/jira/browse/BEAM-9232)
BEAM-9119: 
apache_beam.runners.portability.fn_api_runner_test.FnApiRunnerTest[...].test_large_elements
 is flaky (https://issues.apache.org/jira/browse/BEAM-9119)
BEAM-8879: IOError flake in 

P1 issues report

2021-05-06 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests.

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

BEAM-12290: TestPubsub.assertThatSubscriptionEventuallyCreated timeout does 
not work (https://issues.apache.org/jira/browse/BEAM-12290)
BEAM-12287: beam_PerformanceTests_Kafka_IO failing due to 
:sdks:java:container:pullLicenses failure 
(https://issues.apache.org/jira/browse/BEAM-12287)
BEAM-12285: Failure in beam_PostCommit_SQL 
(https://issues.apache.org/jira/browse/BEAM-12285)
BEAM-12282: Failure in ':vendor:grpc-1_26_0:validateVendoring' 
(https://issues.apache.org/jira/browse/BEAM-12282)
BEAM-12279: Implement destination-dependent sharding in FileIO.writeDynamic 
(https://issues.apache.org/jira/browse/BEAM-12279)
BEAM-12258: SQL postcommit timing out 
(https://issues.apache.org/jira/browse/BEAM-12258)
BEAM-12256: PubsubIO.readAvroGenericRecord creates SchemaCoder that fails 
to decode some Avro logical types 
(https://issues.apache.org/jira/browse/BEAM-12256)
BEAM-12231: beam_PostRelease_NightlySnapshot failing 
(https://issues.apache.org/jira/browse/BEAM-12231)
BEAM-1: Dataflow side input translation "Unknown producer for value" 
(https://issues.apache.org/jira/browse/BEAM-1)
BEAM-11959: Python Beam SDK Harness hangs when installing pip packages 
(https://issues.apache.org/jira/browse/BEAM-11959)
BEAM-11906: No trigger early repeatedly for session windows 
(https://issues.apache.org/jira/browse/BEAM-11906)
BEAM-11875: XmlIO.Read does not handle XML encoding per spec 
(https://issues.apache.org/jira/browse/BEAM-11875)
BEAM-11828: JmsIO is not acknowledging messages correctly 
(https://issues.apache.org/jira/browse/BEAM-11828)
BEAM-11755: Cross-language consistency (RequiresStableInputs) is quietly 
broken (at least on portable flink runner) 
(https://issues.apache.org/jira/browse/BEAM-11755)
BEAM-11578: `dataflow_metrics` (python) fails with TypeError (when int 
overflowing?) (https://issues.apache.org/jira/browse/BEAM-11578)
BEAM-11576: Go ValidatesRunner failure: TestFlattenDup on Dataflow Runner 
(https://issues.apache.org/jira/browse/BEAM-11576)
BEAM-11434: Expose Spanner admin/batch clients in Spanner Accessor 
(https://issues.apache.org/jira/browse/BEAM-11434)
BEAM-11227: Upgrade beam-vendor-grpc-1_26_0-0.3 to fix CVE-2020-27216 
(https://issues.apache.org/jira/browse/BEAM-11227)
BEAM-11148: Kafka commitOffsetsInFinalize OOM on Flink 
(https://issues.apache.org/jira/browse/BEAM-11148)
BEAM-11017: Timer with dataflow runner can be set multiple times (dataflow 
runner) (https://issues.apache.org/jira/browse/BEAM-11017)
BEAM-10861: Adds URNs and payloads to PubSub transforms 
(https://issues.apache.org/jira/browse/BEAM-10861)
BEAM-10670: Make non-portable Splittable DoFn the only option when 
executing Java "Read" transforms 
(https://issues.apache.org/jira/browse/BEAM-10670)
BEAM-10617: python CombineGlobally().with_fanout() cause duplicate combine 
results for sliding windows (https://issues.apache.org/jira/browse/BEAM-10617)
BEAM-10569: SpannerIO tests don't actually assert anything. 
(https://issues.apache.org/jira/browse/BEAM-10569)
BEAM-10288: Quickstart documents are out of date 
(https://issues.apache.org/jira/browse/BEAM-10288)
BEAM-10244: Populate requirements cache fails on poetry-based packages 
(https://issues.apache.org/jira/browse/BEAM-10244)
BEAM-10100: FileIO writeDynamic with AvroIO.sink not writing all data 
(https://issues.apache.org/jira/browse/BEAM-10100)
BEAM-9564: Remove insecure ssl options from MongoDBIO 
(https://issues.apache.org/jira/browse/BEAM-9564)
BEAM-9455: Environment-sensitive provisioning for Dataflow 
(https://issues.apache.org/jira/browse/BEAM-9455)
BEAM-9293: Python direct runner doesn't emit empty pane when it should 
(https://issues.apache.org/jira/browse/BEAM-9293)
BEAM-8986: SortValues may not work correct for numerical types 
(https://issues.apache.org/jira/browse/BEAM-8986)
BEAM-8985: SortValues should fail if SecondaryKey coder is not 
deterministic (https://issues.apache.org/jira/browse/BEAM-8985)
BEAM-8407: [SQL] Some Hive tests throw NullPointerException, but get marked 
as passing (Direct Runner) (https://issues.apache.org/jira/browse/BEAM-8407)
BEAM-7717: PubsubIO watermark tracking hovers near start of epoch 
(https://issues.apache.org/jira/browse/BEAM-7717)
BEAM-7716: PubsubIO returns empty message bodies for all messages read 
(https://issues.apache.org/jira/browse/BEAM-7716)
BEAM-7195: BigQuery - 404 errors for 'table not found' when using dynamic 
destinations - sometimes, new table fails to get created 
(https://issues.apache.org/jira/browse/BEAM-7195)
BEAM-6839: User reports protobuf ClassChangeError running against 2.6.0 or 
above (https://issues.apache.org/jira/browse/BEAM-6839)
  

Re: [Proposal] Support State Batching and Prefetching over FnApi

2021-05-06 Thread Rui Wang
At this moment, the third approach in the doc is preferred. To recap, the
third approach only changes the FnApi by adding a repeated field to the
state request to support batching.

This approach has the following benefits:
1. It avoids the double-request problem introduced by prefetching (prefetching
needs two requests: one to prefetch and one for the blocking fetch).
2. It does not conflict with prefetching, so there is no backward-compatibility
issue even if we later add prefetching to the FnApi. This makes it a good
starting point.

The caveat, though, is that this approach does not support smart prefetching
(which needs runner support). However, we can add that in the future if
necessary, and it won't conflict with the existing design.
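
As a rough, hypothetical illustration of what batching buys (the names
BatchStateRequest and state_keys are invented for this sketch and are not the
actual FnApi proto fields; see the linked doc for the real proposal):

# Hypothetical sketch only: BatchStateRequest and state_keys are invented
# names for illustration; they are not the actual FnApi proto fields.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BatchStateRequest:
    # The proposed idea: one request carrying many state keys (a repeated
    # field), instead of one blocking StateRequest per key.
    state_keys: List[bytes] = field(default_factory=list)


def read_batched(pending_keys: List[bytes],
                 send: Callable[[BatchStateRequest], List[bytes]]) -> Dict[bytes, bytes]:
    """Coalesce all pending state reads into a single round trip."""
    request = BatchStateRequest(state_keys=list(pending_keys))
    # `send` stands in for the SDK harness's state channel; one response
    # value comes back per requested key.
    values = send(request)
    return dict(zip(request.state_keys, values))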

Please let us know if you have any objections before we start the implementation.


-Rui

On Mon, Mar 22, 2021 at 12:27 PM Rui Wang  wrote:

> Hi Community,
>
> Andrew Crites and I drafted a document to discuss how to support state
> prefetching and batching over FnApi, which seems to be missing functionality
> in FnApi. This will help us support the Java state readLater() API over FnApi.
>
> Please see:
> https://docs.google.com/document/d/1Z3a5YOZyYsN8MeS6hRhCXX31m9bKCXSOtjKSl7wX40c/edit?usp=sharing=0-eiNl525kmb3Av2bqgCsZUA
>
>
> -Rui
>


Re: BEAM-11838: An x-lang wrapper for DebeziumIO

2021-05-06 Thread Pablo Estrada
Hi Benjamin!
That's strange; if the Java test is working well, it's surprising that the
Python/xlang test would run into issues. Let me take a look at your PR and
see if I can spot anything suspicious.
Best
-P.

On Thu, May 6, 2021 at 10:49 AM Benjamin Gonzalez Delgado <
benjamin.gonza...@wizeline.com> wrote:

> Hi team,
> I am working on BEAM-11838[1], but I am a little stuck right now[2]. My
> issue is that when I set the param maxNumberOfRecords through
> ExternalTransformBuilder (L117), it sets the value correctly in the
> constructor of KafkaSourceConsumerFn (L111), but by the time it reaches the
> tryClaim method (L307) the value of maxNumberOfRecords is null again, so it
> keeps running forever and my test never ends.
> I wonder if any of you have experience with the KafkaSourceConsumerFn class
> or with creating other cross-language transforms and could help.
> Any guidance on this would be appreciated.
>
> PS. The existing Java test DebeziumIOMySqlConnectorIT does not have the
> same problem.
>
> [1] https://issues.apache.org/jira/browse/BEAM-11838
> [2] https://github.com/benWize/beam/pull/12/files#
>


Re: Please help with the 2.29.0 release announcement! (curate the bugs)

2021-05-06 Thread Kenneth Knowles
Yes. Done!

On Thu, May 6, 2021 at 7:06 AM Alexey Romanenko 
wrote:

> Don’t we need to make 2.29.0 “released" in Jira to make it possible to
> specify it in “Affects Version/s” since the artefacts are already
> available?
>
> On 28 Apr 2021, at 19:23, Brian Hulette  wrote:
>
> Here's a query to find 2.29.0 issues where you are the author or reporter
> [1].
>
> [1]
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%202.29.0%20AND%20(assignee%20in%20(currentUser())%20OR%20reporter%20in%20(currentUser()))
>
> On Tue, Apr 27, 2021 at 9:22 PM Kenneth Knowles  wrote:
>
>> Hi all,
>>
>> I was just self-reviewing the blog post [1] for the 2.29.0 release
>> announcement.
>>
>> I have these requests, which I would appreciate if I were a user:
>>
>> * Can you browse the jiras [2] and change them between
>> Bug/Feature/Improvement and make sure they are relevant? In particular:
>>   -  "sub-task" is so generic; only use if it does not stand on its own
>>   - "feature" should be something a user might care about (IMO)
>>   - "bug" is something else a user might care about, if they are figuring
>> out how far they have to upgrade
>>   - "improvement" can be changes that aren't that interesting to users
>>   - if a test was failing for a while and we fixed it, set Fix Version =
>> Not Applicable since it is not really tied to a release
>>
>> * Can you take another pass and highlight things that should be on the
>> blog? Leave a comment on https://github.com/apache/beam/pull/14562. You
>> can highlight your own work or other people. A quick read makes me think a
>> lot is missing. The initial blog post is autogenerated from CHANGES.md but
>> we are not yet very good about making sure things are in that file.
>>
>> Kenn
>>
>> [1]
>> http://apache-beam-website-pull-requests.storage.googleapis.com/14562/blog/beam-2.29.0/index.html
>> [2]
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12349629
>>
>
>


BEAM-11838: An x-lang wrapper for DebeziumIO

2021-05-06 Thread Benjamin Gonzalez Delgado
Hi team,
I am working on BEAM-11838[1], but I am a little stuck right now[2]. My
issue is that when I set the param maxNumberOfRecords through
ExternalTransformBuilder (L117), it sets the value correctly in the
constructor of KafkaSourceConsumerFn (L111), but by the time it reaches the
tryClaim method (L307) the value of maxNumberOfRecords is null again, so it
keeps running forever and my test never ends.
I wonder if any of you have experience with the KafkaSourceConsumerFn class
or with creating other cross-language transforms and could help.
Any guidance on this would be appreciated.

PS. The existing Java test DebeziumIOMySqlConnectorIT does not have the
same problem.

[1] https://issues.apache.org/jira/browse/BEAM-11838
[2] https://github.com/benWize/beam/pull/12/files#
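
For context on the setup being debugged, here is a rough sketch of how a
Python wrapper typically invokes a Java transform through the expansion
service. The URN, configuration field names, and expansion service address
below are placeholders, not the actual BEAM-11838 wrapper:

# Sketch only: the URN, configuration fields, and expansion service address
# are placeholders and do not correspond to the final BEAM-11838 wrapper.
import apache_beam as beam
from apache_beam.transforms.external import (ExternalTransform,
                                             ImplicitSchemaPayloadBuilder)

with beam.Pipeline() as p:
    (p
     | "ReadDebezium" >> ExternalTransform(
         "beam:external:java:debezium:read:v1",  # placeholder URN
         ImplicitSchemaPayloadBuilder({
             "username": "debezium",
             "password": "dbz",
             "max_number_of_records": 100,  # the param under discussion
         }),
         "localhost:8097")  # address of a running Java expansion service
     | beam.Map(print))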



Re: Potential bug with Kafka (or other external) IO in Python Portable Runner

2021-05-06 Thread Chamikara Jayalath
On Thu, May 6, 2021 at 4:58 AM Nir Gazit  wrote:

> Hey,
> Not sure I follow - from the code you sent me, it seems that the
> environment is chosen according to the pipeline options
> (JAVA_SDK_HARNESS_ENVIRONMENT is used by createOrGetDefaultEnvironment only
> AFAIU). So if I'm passing `--environment_type=EXTERNAL`, why wouldn't I end
> up using an external env as in line 152 there?
>
>
Did you mean that you set a Python pipeline option? Note that Python
pipeline options do not automatically get converted to Java pipeline
options when using cross-language transforms.
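
To illustrate the distinction, a minimal sketch of a Python pipeline using the
cross-language Kafka read (endpoints and topic are placeholders): the
--environment_type flag below only selects the Python SDK harness environment,
while the Java transforms returned by the Kafka expansion service still get the
expansion service's default (Docker) environment.

# Sketch only; endpoints and topic are placeholders.
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    # Applies to the Python SDK harness only, not the Java Kafka transforms.
    "--environment_type=EXTERNAL",
    "--environment_config=localhost:50000",
])

with beam.Pipeline(options=options) as p:
    (p
     | "ReadKafka" >> ReadFromKafka(
         consumer_config={"bootstrap.servers": "kafka:9092"},
         topics=["my_topic"])
     | beam.Map(print))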


> Nir
>
> On Wed, May 5, 2021 at 12:07 AM Chamikara Jayalath 
> wrote:
>
>> When you use cross-language Java transforms from Python we use the
>> default environment for Java transforms which always gets set to Docker.
>>
>> https://github.com/apache/beam/blob/7f0d11e65bbcd3e9c565a50aa6a56c0631c4358b/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L447
>>
>> https://github.com/apache/beam/blob/7f0d11e65bbcd3e9c565a50aa6a56c0631c4358b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L134
>>
>> I don't think we support running Java cross-language transforms in other
>> environments yet.
>>
>> Thanks,
>> Cham
>>
>> On Tue, May 4, 2021 at 12:30 PM Nir Gazit  wrote:
>>
>>> But looking at the code of the exception, it seems that it tries to use
>>> docker only because it thinks it's in a docker environment, no? Shouldn't
>>> it use the external environment (that's available there as well) if it got
>>> that in PipelineOptions?
>>>
>>> Otherwise, how can I have docker on a k8s pod? I couldn't seem to find
>>> any examples for that and saw that it's highly unrecommended.
>>>
>>> Thanks!
>>> Nir
>>>
>>> On Tue, May 4, 2021 at 9:08 PM Chamikara Jayalath 
>>> wrote:
>>>
 Ah, I think you need the DOCKER environment to use cross-language
 transforms, not the EXTERNAL environment (agree that the terminology is
 confusing).

 On Tue, May 4, 2021 at 11:04 AM Nir Gazit  wrote:

> Yes, that’s on purpose. I’m running in Kubernetes, which makes it hard
> to install docker on the pods, so I don’t want to use the docker
> environment. That’s why I specified the EXTERNAL environment in
> PipelineOptions. However, it seems that it doesn’t get propagated.
>
> On Tue, 4 May 2021 at 20:59 Chamikara Jayalath 
> wrote:
>
>> Is it possible that you don't have the "docker" command available in
>> your system ?
>>
>> On Tue, May 4, 2021 at 10:28 AM Nir Gazit  wrote:
>>
>>> Hey,
>>> I'm trying to run a pipeline with a Kafka Source, using an EXTERNAL
>>> environment. However, when the pipeline is run, the error below is 
>>> thrown,
>>> which implies that for some reason the external environment pipeline
>>> options didn't get in. When replacing the Kafka Source with an S3 source
>>> (for example) I don't get this error so it implies that it's
>>> somewhere around the external transform / expansion service area. How 
>>> can I
>>> debug this?
>>>
>>> Caused by: java.io.IOException: Cannot run program "docker":
>>> error=2, No such file or directory
>>> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:189)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:171)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runImage(DockerCommand.java:95)
>>> at
>>> org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory.createEnvironment(DockerEnvironmentFactory.java:131)
>>> at
>>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:252)
>>> at
>>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:231)
>>>
>>


Re: Please help with the 2.29.0 release announcement! (curate the bugs)

2021-05-06 Thread Alexey Romanenko
Don’t we need to make 2.29.0 “released" in Jira to make it possible to specify 
it in “Affects Version/s” since the artefacts are already available? 

> On 28 Apr 2021, at 19:23, Brian Hulette  wrote:
> 
> Here's a query to find 2.29.0 issues where you are the author or reporter [1].
> 
> [1] 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%202.29.0%20AND%20(assignee%20in%20(currentUser())%20OR%20reporter%20in%20(currentUser()))
> 
> On Tue, Apr 27, 2021 at 9:22 PM Kenneth Knowles  wrote:
> Hi all,
> 
> I was just self-reviewing the blog post [1] for the 2.29.0 release 
> announcement.
> 
> I have these requests, which I would appreciate if I were a user:
> 
> * Can you browse the jiras [2] and change them between 
> Bug/Feature/Improvement and make sure they are relevant? In particular:
>   -  "sub-task" is so generic; only use if it does not stand on its own
>   - "feature" should be something a user might care about (IMO)
>   - "bug" is something else a user might care about, if they are figuring out 
> how far they have to upgrade
>   - "improvement" can be changes that aren't that interesting to users
>   - if a test was failing for a while and we fixed it, set Fix Version = Not 
> Applicable since it is not really tied to a release
> 
> * Can you take another pass and highlight things that should be on the blog? 
> Leave a comment on https://github.com/apache/beam/pull/14562. You can 
> highlight your own work or other people. A quick read makes me think a lot 
> is missing. The initial blog post is autogenerated from CHANGES.md but we 
> are not yet very good about making sure things are in that file.
> 
> Kenn
> 
> [1] 
> http://apache-beam-website-pull-requests.storage.googleapis.com/14562/blog/beam-2.29.0/index.html
> [2] 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12349629



Re: Potential bug with Kafka (or other external) IO in Python Portable Runner

2021-05-06 Thread Nir Gazit
Hey,
Not sure I follow - from the code you sent me, it seems that the
environment is chosen according to the pipeline options
(JAVA_SDK_HARNESS_ENVIRONMENT is used by createOrGetDefaultEnvironment only
AFAIU). So if I'm passing `--environment_type=EXTERNAL`, why wouldn't I end
up using an external env as in line 152 there?

Nir

On Wed, May 5, 2021 at 12:07 AM Chamikara Jayalath 
wrote:

> When you use cross-language Java transforms from Python we use the default
> environment for Java transforms which always gets set to Docker.
>
> https://github.com/apache/beam/blob/7f0d11e65bbcd3e9c565a50aa6a56c0631c4358b/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L447
>
> https://github.com/apache/beam/blob/7f0d11e65bbcd3e9c565a50aa6a56c0631c4358b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L134
>
> I don't think we support running Java cross-language transforms in other
> environments yet.
>
> Thanks,
> Cham
>
> On Tue, May 4, 2021 at 12:30 PM Nir Gazit  wrote:
>
>> But looking at the code of the exception, it seems that it tries to use
>> docker only because it thinks it's in a docker environment, no? Shouldn't
>> it use the external environment (that's available there as well) if it got
>> that in PipelineOptions?
>>
>> Otherwise, how can I have docker on a k8s pod? I couldn't seem to find
>> any examples for that and saw that it's highly unrecommended.
>>
>> Thanks!
>> Nir
>>
>> On Tue, May 4, 2021 at 9:08 PM Chamikara Jayalath 
>> wrote:
>>
>>> Ah, I think you need the DOCKER environment to use cross-language
>>> transforms, not the EXTERNAL environment (agree that the terminology is
>>> confusing).
>>>
>>> On Tue, May 4, 2021 at 11:04 AM Nir Gazit  wrote:
>>>
 Yes, that’s on purpose. I’m running in Kubernetes, which makes it hard to
 install docker on the pods, so I don’t want to use the docker environment.
 That’s why I specified the EXTERNAL environment in PipelineOptions. However,
 it seems that it doesn’t get propagated.

 On Tue, 4 May 2021 at 20:59 Chamikara Jayalath 
 wrote:

> Is it possible that you don't have the "docker" command available in
> your system ?
>
> On Tue, May 4, 2021 at 10:28 AM Nir Gazit  wrote:
>
>> Hey,
>> I'm trying to run a pipeline with a Kafka Source, using an EXTERNAL
>> environment. However, when the pipeline is run, the error below is 
>> thrown,
>> which implies that for some reason the external environment pipeline
>> options didn't get in. When replacing the Kafka Source with an S3 source
>> (for example) I don't get this error so it implies that it's
>> somewhere around the external transform / expansion service area. How 
>> can I
>> debug this?
>>
>> Caused by: java.io.IOException: Cannot run program "docker": error=2,
>> No such file or directory
>> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
>> at
>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:189)
>> at
>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:171)
>> at
>> org.apache.beam.runners.fnexecution.environment.DockerCommand.runImage(DockerCommand.java:95)
>> at
>> org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory.createEnvironment(DockerEnvironmentFactory.java:131)
>> at
>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:252)
>> at
>> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:231)
>>
>


Re: Window Assignment Across SplittableDoFn

2021-05-06 Thread Evan Galpin
Thanks!

On Wed, May 5, 2021 at 23:14 Boyuan Zhang  wrote:

> Hi,
> Yes, just like a normal DoFn, a Splittable DoFn preserves the window
> information as well.
>
> On Wed, May 5, 2021 at 8:04 PM Evan Galpin  wrote:
>
>> Hi folks,
>>
>> I’d just like to confirm what happens to window assignments through a
>> SplittableDoFn. Are output elements automatically assigned to the same
>> window as input elements?
>>
>> Thanks,
>> Evan
>>
>
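
For anyone who finds this thread later, a minimal runnable sketch (the
string-splitting SDF and example values are made up for illustration) showing
that elements emitted from a splittable DoFn stay in the window of the input
they came from:

# Sketch: a splittable DoFn that splits a string into characters.
# Each output character lands in the same fixed window as its input string.
import apache_beam as beam
from apache_beam.io.restriction_trackers import OffsetRange, OffsetRestrictionTracker
from apache_beam.transforms.core import RestrictionProvider
from apache_beam.transforms.window import FixedWindows, TimestampedValue


class CharRestrictionProvider(RestrictionProvider):
    """Restriction = the range of character offsets still to emit."""
    def initial_restriction(self, element):
        return OffsetRange(0, len(element))

    def create_tracker(self, restriction):
        return OffsetRestrictionTracker(restriction)

    def restriction_size(self, element, restriction):
        return restriction.size()


class SplitIntoChars(beam.DoFn):
    """Splittable DoFn: emits one output element per character of the input."""
    def process(self, element,
                tracker=beam.DoFn.RestrictionParam(CharRestrictionProvider())):
        restriction = tracker.current_restriction()
        for pos in range(restriction.start, restriction.stop):
            if not tracker.try_claim(pos):
                return
            yield element[pos]


class PrintWithWindow(beam.DoFn):
    def process(self, element, w=beam.DoFn.WindowParam):
        # Each character reports the same window as the string it came from.
        yield "%r in window %s" % (element, w)


with beam.Pipeline() as p:
    (p
     | beam.Create([TimestampedValue("abc", 1), TimestampedValue("de", 15)])
     | beam.WindowInto(FixedWindows(10))
     | beam.ParDo(SplitIntoChars())
     | beam.ParDo(PrintWithWindow())
     | beam.Map(print))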