Questions regarding contribution: Support for reading Kafka topics from any startReadTime in Java

2022-05-25 Thread Balázs Németh
https://issues.apache.org/jira/browse/BEAM-14518

https://github.com/apache/beam/blob/fd8546355523f67eaddc22249606fdb982fe4938/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConsumerSpEL.java#L180-L198

Right now the 'startReadTime' config for KafkaIO.Read looks up, in every
topic partition, the offset of the first message whose timestamp is newer
than or equal to that timestamp. The problem arises when the timestamp is
so recent that a partition has no message at or after it: in that case the
code fails with an exception. In certain cases that failure is unnecessary,
because we could actually make it work.

If we don't get an offset from calling `consumer.offsetsForTimes`, we
should call `endOffsets`, and use the returned offset + 1. That is actually
the offset we will have to read next time.

Even if `endOffsets` can't return an offset we could use 0 as the offset to
read from.
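
The fallback order described above can be sketched as follows. This is a
language-neutral illustration in Python, not the Java code in ConsumerSpEL:
the function name and its two parameters are hypothetical stand-ins for the
per-partition results of `consumer.offsetsForTimes` and `endOffsets`. Kafka's
consumer documentation describes the end offset as the offset of the last
record plus one, so the sketch uses it unchanged rather than adding 1; treat
that as an assumption to verify before relying on it.

```python
from typing import Optional


def resolve_start_offset(
    offset_for_time: Optional[int],
    end_offset: Optional[int],
) -> int:
    """Pick the offset to start reading from for one topic partition.

    offset_for_time: stand-in for the offsetsForTimes result; None when no
        record in the partition is at or after startReadTime.
    end_offset: stand-in for the endOffsets result; None when unavailable.
    """
    if offset_for_time is not None:
        # A record at or after startReadTime exists: start there.
        return offset_for_time
    if end_offset is not None:
        # Assumption: the end offset already points at the next record to be
        # written, so it is used as-is (no +1).
        return end_offset
    # Last resort from the proposal: read from the beginning.
    return 0
```

Note that the checks compare against None explicitly, because offset 0 is a
valid lookup result and must not be treated as "missing".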



Am I missing something here? Is it okay to contribute this?


Re: GSoC idea: mypyc as an alternative to cython

2022-05-25 Thread Chad Dombrova
>
> - What does the new prototype code look like (hopefully much cleaner)?
>

Instead of a separate pxd file, you just have the existing .py file with
standard typing annotations.
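
As a rough illustration of what such a file might look like, here is a
minimal sketch. The class `RunningTotal` is hypothetical and not taken from
Beam; it only shows ordinary annotations, including the ClassVar requirement
for class attributes mentioned in the original proposal.

```python
from typing import ClassVar, List


class RunningTotal:
    # Class-level attribute: annotated with ClassVar so it is not mistaken
    # for a per-instance attribute.
    instances_created: ClassVar[int] = 0

    def __init__(self, name: str) -> None:
        self.name: str = name
        self.values: List[int] = []
        RunningTotal.instances_created += 1

    def add(self, value: int) -> int:
        """Record a value and return the total of everything seen so far."""
        self.values.append(value)
        return sum(self.values)
```

The same file runs unchanged under the regular interpreter. Assuming mypy is
installed and the file is saved as `running_total.py` (a hypothetical name),
`mypyc running_total.py` should build it as a C extension.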


> - How does performance compare to the Cython approach?
>

Good question.  I've not been able to find any posts with comparisons.
mypyc maintains a benchmark repo with results compared to standard
cpython:
https://github.com/mypyc/mypyc-benchmark-results/blob/master/reports/summary-main.md

Running these benchmarks against cython could be a good first task.

Unlike Cython, mypyc doesn’t natively support numpy, but IIRC beam is not
using that in its cythonized modules.

-chad


Re: GSoC idea: mypyc as an alternative to cython

2022-05-25 Thread Sam Bourne
Is there any interest in this? There is a lot of promise in only needing to
maintain a single well typed implementation.

On Fri, Feb 11, 2022 at 6:35 PM Chad Dombrova wrote:

> Hi all,
> At work, I recently started playing around with mypyc[1] as a means to
> compile our python code to C extensions, and I'm pretty impressed so far.
>
> Pros
>
>    - write normal python code with annotations: we're already doing this!
>    - no need for cython-specific header files that can get out of sync
>      with the pure python version
>    - support for dataclasses and Generics
>    - one less tool in the toolchain: mypyc is part of mypy
>    - opens the door to more easily converting additional modules in the
>      future
>
> Cons
>
>    - less mature than Cython
>    - build errors are not very informative
>
> Neutral
>
>    - requires more detailed annotations. For example, you must annotate
>      class attributes with ClassVar
>
> I thought it would be an interesting and relatively accessible project to
> try to convert the current modules that use cython over to mypyc and see
> how it goes.  Just a thought: take it or leave it!
>
> -chad
>
> [1] https://mypyc.readthedocs.io/en/latest/introduction.html
>
>


[GitHub] [beam-site] y1chi merged pull request #629: Publish 2.39.0 release

2022-05-25 Thread GitBox


y1chi merged PR #629:
URL: https://github.com/apache/beam-site/pull/629





Flaky test issue report (58)

2022-05-25 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake)

These are P1 issues because they have a major negative impact on the community 
and make it hard to determine the quality of the software.

https://issues.apache.org/jira/browse/BEAM-14459: Docker Snapshots failing 
to be published since April 14th (created 2022-05-11)
https://issues.apache.org/jira/browse/BEAM-14410: FnRunnerTest with 
non-trivial (order 1000 elements) numpy input flakes in non-cython environment 
(created 2022-05-04)
https://issues.apache.org/jira/browse/BEAM-14407: Jenkins worker sometimes 
crashes while running Python Flink pipeline (created 2022-05-04)
https://issues.apache.org/jira/browse/BEAM-14367: Flaky timeout in github 
Python unit test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer (created 
2022-04-26)
https://issues.apache.org/jira/browse/BEAM-14349: GroupByKeyTest BasicTests 
testLargeKeys100MB flake (on ULR) (created 2022-04-21)
https://issues.apache.org/jira/browse/BEAM-14276: 
beam_PostCommit_Java_DataflowV2 failures parent bug (created 2022-04-07)
https://issues.apache.org/jira/browse/BEAM-14269: 
PulsarIOTest.testReadFromSimpleTopic is very flaky (created 2022-04-06)
https://issues.apache.org/jira/browse/BEAM-14263: 
beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing 
consistently (created 2022-04-05)
https://issues.apache.org/jira/browse/BEAM-14252: 
beam_PostCommit_Java_DataflowV1 failing with a variety of flakes and errors 
(created 2022-04-05)
https://issues.apache.org/jira/browse/BEAM-14216: Multiple XVR Suites 
having similar flakes simultaneously (created 2022-03-31)
https://issues.apache.org/jira/browse/BEAM-14174: Flink Tests failure :  
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.beam.runners.core.construction.SerializablePipelineOptions  (created 
2022-03-24)
https://issues.apache.org/jira/browse/BEAM-14172: beam_PreCommit_PythonDocs 
failing (jinja2) (created 2022-03-24)
https://issues.apache.org/jira/browse/BEAM-13952: Dataflow streaming tests 
failing new AfterSynchronizedProcessingTime test (created 2022-02-15)
https://issues.apache.org/jira/browse/BEAM-13859: Test flake: 
test_split_half_sdf (created 2022-02-09)
https://issues.apache.org/jira/browse/BEAM-13850: 
beam_PostCommit_Python_Examples_Dataflow failing (created 2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13822: GBK and CoGBK streaming 
Java load tests failing (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13810: Flaky tests: Gradle build 
daemon disappeared unexpectedly (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13809: beam_PostCommit_XVR_Flink 
flaky: Connection refused (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13797: Flakes: Failed to load 
cache entry (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13708: flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored (created 2022-01-20)
https://issues.apache.org/jira/browse/BEAM-13575: Flink 
testParDoRequiresStableInput flaky (created 2021-12-28)
https://issues.apache.org/jira/browse/BEAM-13500: NPE in Flink Portable 
ValidatesRunner streaming suite (created 2021-12-21)
https://issues.apache.org/jira/browse/BEAM-13453: Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use 
(created 2021-12-13)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is 
failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13367: 
[beam_PostCommit_Python36] [ 
apache_beam.io.gcp.experimental.spannerio_read_it_test] Failure summary 
(created 2021-12-01)
https://issues.apache.org/jira/browse/BEAM-13312: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle 
is flaky in Java Spark ValidatesRunner suite (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13311: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful 
is flaky in Java ValidatesRunner Flink suite. (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13237: 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView 
flaky on Dataflow Runner V2 (created 2021-11-12)
https://issues.apache.org/jira/browse/BEAM-13025: pubsublite.ReadWriteIT 
flaky in beam_PostCommit_Java_DataflowV2   (created 2021-10-08)
https://issues.apache.org/jira/browse/BEAM-12928: beam_PostCommit_Python36 
- CrossLanguageSpannerIOTest - flakey failing (created 2021-09-21)
https://issues.apache.org/jira/browse/BEAM-12859: 

P1 issues report (77)

2022-05-25 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake).

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

https://issues.apache.org/jira/browse/BEAM-14483: Add Java cross-language 
transforms for invoking Python Map and FlatMap (created 2022-05-17)
https://issues.apache.org/jira/browse/BEAM-14481: OOM regression caused by 
Batched DoFn worker changes (created 2022-05-17)
https://issues.apache.org/jira/browse/BEAM-14459: Docker Snapshots failing 
to be published since April 14th (created 2022-05-11)
https://issues.apache.org/jira/browse/BEAM-14434: 
beam_LoadTests_Python_GBK_reiterate_Dataflow_Streaming failure (created 
2022-05-06)
https://issues.apache.org/jira/browse/BEAM-14431: Handle nulls using 
SnowflakeIO (created 2022-05-06)
https://issues.apache.org/jira/browse/BEAM-14421: 
--dataflowServiceOptions=use_runner_v2 is broken (created 2022-05-05)
https://issues.apache.org/jira/browse/BEAM-14411: TypeCodersTest is never 
executed (created 2022-05-04)
https://issues.apache.org/jira/browse/BEAM-14390: Java license check is 
broken (created 2022-05-02)
https://issues.apache.org/jira/browse/BEAM-14364: 404s in BigQueryIO don't 
get output to Failed Inserts PCollection (created 2022-04-25)
https://issues.apache.org/jira/browse/BEAM-14291: DataflowPipelineResult 
does not raise exception for unsuccessful states. (created 2022-04-11)
https://issues.apache.org/jira/browse/BEAM-14276: 
beam_PostCommit_Java_DataflowV2 failures parent bug (created 2022-04-07)
https://issues.apache.org/jira/browse/BEAM-14275: SpannerWriteIT failing in 
beam PostCommit Java V1 (created 2022-04-07)
https://issues.apache.org/jira/browse/BEAM-14265: Flink should hold the 
watermark at the output timestamp for processing time timers (created 
2022-04-06)
https://issues.apache.org/jira/browse/BEAM-14263: 
beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing 
consistently (created 2022-04-05)
https://issues.apache.org/jira/browse/BEAM-14253: pubsublite.ReadWriteIT 
failing in beam_PostCommit_Java_DataflowV1 and V2 (created 2022-04-05)
https://issues.apache.org/jira/browse/BEAM-14239: Changing the output 
timestamp of a timer does not clear the previously set timer (created 
2022-04-04)
https://issues.apache.org/jira/browse/BEAM-14174: Flink Tests failure :  
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.beam.runners.core.construction.SerializablePipelineOptions  (created 
2022-03-24)
https://issues.apache.org/jira/browse/BEAM-14135: BigQuery Storage API 
insert with writeResult retry and write to error table (created 2022-03-20)
https://issues.apache.org/jira/browse/BEAM-13952: Dataflow streaming tests 
failing new AfterSynchronizedProcessingTime test (created 2022-02-15)
https://issues.apache.org/jira/browse/BEAM-13950: PVR_Spark2_Streaming 
perma-red (created 2022-02-15)
https://issues.apache.org/jira/browse/BEAM-13920: Beam x-lang Dataflow 
tests failing due to _InactiveRpcError (created 2022-02-10)
https://issues.apache.org/jira/browse/BEAM-13852: 
KafkaIO.read.withDynamicRead() doesn't pick up new TopicPartitions (created 
2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13850: 
beam_PostCommit_Python_Examples_Dataflow failing (created 2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13822: GBK and CoGBK streaming 
Java load tests failing (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13805: Simplify version override 
for Dev versions of the Go SDK. (created 2022-02-02)
https://issues.apache.org/jira/browse/BEAM-13747: Add integration testing 
for BQ Storage API  write modes (created 2022-01-26)
https://issues.apache.org/jira/browse/BEAM-13715: Kafka commit offset drop 
data on failure for runners that have non-checkpointing shuffle (created 
2022-01-21)
https://issues.apache.org/jira/browse/BEAM-13669: Install Python wheel and 
dependencies to local venv in SDK harness (created 2022-01-17)
https://issues.apache.org/jira/browse/BEAM-13487: WriteToBigQuery Dynamic 
table destinations returns wrong tableId (created 2021-12-17)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is 
failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13164: Race between member 
variable being accessed due to leaking uninitialized state via 
OutboundObserverFactory (created 2021-11-01)
https://issues.apache.org/jira/browse/BEAM-13132: WriteToBigQuery submits a 
duplicate BQ load job if a 503 error code is returned from googleapi (created 
2021-10-27)
https://issues.apache.org/jira/browse/BEAM-13087: 

Re: Failing Java precommit

2022-05-25 Thread Jan Lukavský

Yes, the precommit is passing now. +1

On 5/24/22 18:00, Brian Hulette wrote:
It looks like this change [1] from +Yi Hu should fix it.


[1] https://github.com/apache/beam/pull/17734

On Tue, May 24, 2022 at 2:15 AM Jan Lukavský  wrote:

Hi,

I'm seeing consistent failures in Java Precommit checks. E.g.

*16:52:31* * What went wrong:
*16:52:31* Execution failed for task ':sdks:java:io:hcatalog:compileJava'.
*16:52:31* > Could not resolve all files for configuration 
':sdks:java:io:hcatalog:compileClasspath'.
*16:52:31* > Could not resolve 
org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde.
*16:52:31*   Required by:
*16:52:31*   project :sdks:java:io:hcatalog > 
org.apache.hive:hive-exec:2.1.0 > org.apache.calcite:calcite-core:1.6.0
*16:52:31*> Could not resolve 
org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde.
*16:52:31*   > Could not get resource 
'https://public.nexus.pentaho.org/repository/proxy-public-3rd-party-release/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom'.
*16:52:31*  > Could not GET 
'https://public.nexus.pentaho.org/repository/proxy-public-3rd-party-release/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom'.
 Received status code 504 from server: Gateway Time-out


or similar

*07:58:12* Execution failed for task ':sdks:java:io:hcatalog:compileJava'.
*07:58:12* > Could not resolve all files for configuration 
':sdks:java:io:hcatalog:compileClasspath'.
*07:58:12* > Could not find 
org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde.

It looks like this is affecting all builds. Do we know what is happening?
  Jan