Re: Why is Avro Date field using InstantCoder?

2021-10-17 Thread Cristian Constantinescu
If I had to change things, I would:

1. When deriving the schema, add a few new types (JAVA_TIME, JAVA_DATE, or
something along those lines).
2. RowCoderGenerator, around line 159, calls
"SchemaCoder.coderForFieldType(schema.getField(rowIndex).getType().withNullable(false));"
which eventually reaches SchemaCoderHelpers.coderForFieldType. There,
CODER_MAP has a hard-coded reference to InstantCoder for DATETIME. Maybe that
map can be extended (possibly dynamically) with new field-type-to-coder
combinations to cover the new types from #1.
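
To make #2 a bit more concrete, here is a minimal sketch of a map that can be
extended at runtime. It is not Beam's actual CODER_MAP: FieldCoderRegistry,
its TypeName enum, and the register/encode methods are hypothetical names, and
a plain function to long stands in for a real Coder. It only assumes that a
date would be stored as days since the epoch (as the Avro date logical type
does) while DATETIME keeps its millisecond encoding.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for a field-type-to-coder map; not Beam's actual CODER_MAP. */
final class FieldCoderRegistry {
  // Illustrative type names: DATETIME mirrors the existing type, JAVA_DATE is the proposed one.
  enum TypeName { DATETIME, JAVA_DATE }

  // Each "coder" is reduced to a function that turns a value into a long.
  private final Map<TypeName, Function<Object, Long>> encoders = new ConcurrentHashMap<>();

  FieldCoderRegistry() {
    // Default entry: an instant is encoded as milliseconds since the epoch.
    register(TypeName.DATETIME, v -> ((Instant) v).toEpochMilli());
  }

  /** Adds or replaces the encoder for a field type. */
  void register(TypeName type, Function<Object, Long> encoder) {
    encoders.put(type, encoder);
  }

  /** Looks up the encoder for the field type and applies it. */
  long encode(TypeName type, Object value) {
    Function<Object, Long> encoder = encoders.get(type);
    if (encoder == null) {
      throw new IllegalArgumentException("No coder registered for " + type);
    }
    return encoder.apply(value);
  }

  public static void main(String[] args) {
    FieldCoderRegistry registry = new FieldCoderRegistry();
    // The new type is plugged in without touching the default entries.
    registry.register(TypeName.JAVA_DATE, v -> ((LocalDate) v).toEpochDay());
    System.out.println(registry.encode(TypeName.JAVA_DATE, LocalDate.of(1970, 1, 2)));
    System.out.println(registry.encode(TypeName.DATETIME, Instant.ofEpochMilli(86_400_000L)));
  }
}
```

With this shape, a date field would no longer have to be forced through an
instant-based coder; whether the real map can be made mutable like this is
something I have not checked.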

I would also like to ask: looking through the Beam code, I see a lot of
static calls, and I'm wondering why it's done this way. I'm used to projects
having some form of dependency injection, with static calls being frowned
upon (lack of mockability, hidden dependencies, tight coupling, etc.). The
only reason I can think of is serializability, given Beam's multi-node
processing?
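
On the question quoted below about telling at runtime which Avro version is
running: one rough option is to probe the classpath by reflection and look at
the type a conversion class reports. The sketch below is only an
illustration. TimeFlavorProbe and FakeDateConversion are hypothetical names,
and the Avro class name in main, along with the assumption that it has a
public no-arg constructor and a getConvertedType() method, should be checked
against the Avro versions in question.

```java
import java.util.Optional;

/** Sketch: probe the classpath to decide which date/time flavor a conversion class exposes. */
final class TimeFlavorProbe {
  /** Returns the class if it can be loaded (without initializing it), otherwise empty. */
  static Optional<Class<?>> tryLoad(String className) {
    try {
      return Optional.of(Class.forName(className, false, TimeFlavorProbe.class.getClassLoader()));
    } catch (ClassNotFoundException | LinkageError e) {
      return Optional.empty();
    }
  }

  /**
   * Instantiates the named class reflectively and asks it for its converted type.
   * Assumes a public no-arg constructor and a public getConvertedType() method.
   */
  static Optional<String> convertedTypeName(String conversionClassName) {
    return tryLoad(conversionClassName).flatMap(c -> {
      try {
        Object conversion = c.getDeclaredConstructor().newInstance();
        Object type = c.getMethod("getConvertedType").invoke(conversion);
        return Optional.of(((Class<?>) type).getName());
      } catch (ReflectiveOperationException | RuntimeException e) {
        return Optional.empty();
      }
    });
  }

  /** Maps the converted type's package to a short label; "absent" if nothing could be probed. */
  static String flavor(String conversionClassName) {
    return convertedTypeName(conversionClassName)
        .map(n -> n.startsWith("org.joda.time.") ? "joda"
            : n.startsWith("java.time.") ? "java.time" : "unknown")
        .orElse("absent");
  }

  /** Stand-in used only to exercise the probe without Avro on the classpath. */
  public static final class FakeDateConversion {
    public Class<?> getConvertedType() {
      return java.time.LocalDate.class;
    }
  }

  public static void main(String[] args) {
    // Assumed Avro class name; verify it against the Avro version actually on the classpath.
    System.out.println(flavor("org.apache.avro.data.TimeConversions$DateConversion"));
    System.out.println(flavor(FakeDateConversion.class.getName()));
  }
}
```

The generated conversion code could then branch on the returned label. This
avoids parsing version strings, but it depends on class and method names
staying stable across Avro releases.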


On Sat, Oct 16, 2021 at 3:11 AM Reuven Lax wrote:

> Is the Schema inference the only reason we can't upgrade Avro, or are
> there other blockers? Is there any way we can tell at runtime which version
> of Avro is running? Since we generate the conversion code at runtime with
> ByteBuddy, we could potentially just generate different conversions
> depending on the Avro version.
>
> On Fri, Oct 15, 2021 at 11:56 PM Cristian Constantinescu 
> wrote:
>
>> Those are fair points. However please consider that there might be new
>> users who will decide that Beam isn't suitable because of things like
>> requiring Avro 1.8, Joda time, old Confluent libraries, and, when I started
>> using Beam about a year ago, Java 8 (I think we're okay with Java 11 now).
>>
>> I guess what I'm saying is that there's definitely a non-negligible cost
>> associated with old 3rd party libs in Beam's code (even if efforts are put
>> in to minimize them).
>>
>> On Sat, Oct 16, 2021 at 2:33 AM Reuven Lax wrote:
>>
>>>
>>>
>>> On Fri, Oct 15, 2021 at 11:13 PM Cristian Constantinescu <
>>> zei...@gmail.com> wrote:
>>>
>>>> To my knowledge and reading through AVRO's Jira[1], it does not support
>>>> jodatime anymore.
>>>>
>>>> It seems everything related to this Avro 1.8 dependency is tricky. If
>>>> you recall, it also prevents us from upgrading to the latest Confluent
>>>> libs... for enabling Beam to use protobufs schemas with the schema
>>>> registry. (I was also the one who brought that issue up, also made an
>>>> exploratory PR to move AVRO outside of Beam core.)
>>>>
>>>> I understand that Beam tries to maintain public APIs stable, but I'd
>>>> like to put forward two points:
>>>> 1) Schemas are experimental, hence there shouldn't be any API
>>>> stability guarantees there.

>>>
>>> Unfortunately at this point, they aren't really. As a community we've
>>> been bad about removing the Experimental label - many core, core parts of
>>> Beam are still labeled experimental (sources, triggering, state, timers).
>>> Realistically they are no longer experimental.
>>>
>>>> 2) Maybe this is the perfect opportunity for the Beam community to think
>>>> about the long term effects of old dependencies within Beam's codebase, and
>>>> especially how to deal with them. Perhaps starting/maintaining an
>>>> "experimental" branch/maven-published-artifacts where Beam does not
>>>> guarantee backwards compatibility (or maintains it for a shorter period of
>>>> time) is something to think about.

>>>
>>> This is why we usually try to prevent third-party libraries from being
>>> in our public API. However in this case, that's tricky.
>>>
>>> The beam community can of course decide to break backwards
>>> compatibility. However as stated today, it is maintained. The last time we
>>> broke backwards compatibility was when the old Dataflow API was
>>> transitioned to Beam, and it was very painful. It took multiple years to
>>> get some users onto the Beam API due to the code changes required to
>>> migrate (and those required code changes weren't terribly invasive).
>>>
>>>

>>>> [1] https://issues.apache.org/jira/browse/AVRO-2335
>>>>
>>>> On Sat, Oct 16, 2021 at 12:40 AM Reuven Lax wrote:
>>>>
>>>>> Does this mean more recent versions of avro aren't backwards
>>>>> compatible with avro 1.8? If so, this might be tricky to fix, since Beam
>>>>> maintains backwards compatibility on its public API.
>>>>>
>>>>> On Fri, Oct 15, 2021 at 5:38 PM Cristian Constantinescu <
>>>>> zei...@gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I've created a small demo project to show the issue[1].
>>>>>>
>>>>>> I've looked at the beam code and all the avro tests use avro 1.8...
>>>>>> which is hardcoded to Joda, hence why all the tests pass. Avro changed to
>>>>>> java.time (as most other recent java projects) after 1.8. Is there a plan
>>>>>> for Beam to do the same?
>>>>>>
>>>>>> Does anyone use Avro with java.time instead of joda.time that could
>>>>>> give me ideas how to make it work?
>>>>>>
>>>>>> Thank you,
>>>>>> Cristian
>>>>>>
>>>>>> [1] 

Performance of Apache Beam

2021-10-17 Thread azhar mirza
Hi Team,
Could you please help me with the following questions?

How does the performance of Apache Beam compare with native Flink when Flink
is used as the runner for Beam? What additional overhead does translating a
Beam pipeline to Flink introduce?

How are fault tolerance and resiliency handled in Apache Beam?
How does Apache Beam handle backpressure?

Thanks
Azhar


Flaky test issue report (30)

2021-10-17 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake)

These are P1 issues because they have a major negative impact on the community 
and make it hard to determine the quality of the software.

https://issues.apache.org/jira/browse/BEAM-13025: 
beam_PostCommit_Java_DataflowV2 failing pubsublite.ReadWriteIT (created 
2021-10-08)
https://issues.apache.org/jira/browse/BEAM-12928: beam_PostCommit_Python36 
- CrossLanguageSpannerIOTest - flakey failing (created 2021-09-21)
https://issues.apache.org/jira/browse/BEAM-12859: 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12809: 
testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky (created 2021-08-26)
https://issues.apache.org/jira/browse/BEAM-12794: 
PortableRunnerTestWithExternalEnv.test_pardo_timers flaky (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12766: Already Exists: Dataset 
apache-beam-testing:python_bq_file_loads_NNN (created 2021-08-16)
https://issues.apache.org/jira/browse/BEAM-12540: 
beam_PostRelease_NightlySnapshot - Task 
:runners:direct-java:runMobileGamingJavaDirect FAILED (created 2021-06-25)
https://issues.apache.org/jira/browse/BEAM-12515: Python PreCommit flaking 
in PipelineOptionsTest.test_display_data (created 2021-06-18)
https://issues.apache.org/jira/browse/BEAM-12322: Python precommit flaky: 
Failed to read inputs in the data plane (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12320: 
PubsubTableProviderIT.testSQLSelectsArrayAttributes[0] failing in SQL 
PostCommit (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12291: 
org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: 
false] is flaky (created 2021-05-05)
https://issues.apache.org/jira/browse/BEAM-12200: 
SamzaStoreStateInternalsTest is flaky (created 2021-04-20)
https://issues.apache.org/jira/browse/BEAM-12163: Python GHA PreCommits 
flake with grpc.FutureTimeoutError on SDK harness startup (created 2021-04-13)
https://issues.apache.org/jira/browse/BEAM-12061: beam_PostCommit_SQL 
failing on KafkaTableProviderIT.testFakeNested (created 2021-03-27)
https://issues.apache.org/jira/browse/BEAM-11837: Java build flakes: 
"Memory constraints are impeding performance" (created 2021-02-18)
https://issues.apache.org/jira/browse/BEAM-11661: hdfsIntegrationTest 
flake: network not found (py38 postcommit) (created 2021-01-19)
https://issues.apache.org/jira/browse/BEAM-11645: beam_PostCommit_XVR_Flink 
failing (created 2021-01-15)
https://issues.apache.org/jira/browse/BEAM-11641: Bigquery Read tests are 
flaky on Flink runner in Python PostCommit suites (created 2021-01-15)
https://issues.apache.org/jira/browse/BEAM-11541: 
testTeardownCalledAfterExceptionInProcessElement flakes on direct runner. 
(created 2020-12-30)
https://issues.apache.org/jira/browse/BEAM-10955: Flink Java Runner test 
flake: Could not find Flink job (FlinkJobNotFoundException) (created 2020-09-23)
https://issues.apache.org/jira/browse/BEAM-10866: 
PortableRunnerTestWithSubprocesses.test_register_finalizations flaky on macOS 
(created 2020-09-09)
https://issues.apache.org/jira/browse/BEAM-10485: Failure / flake: 
ElasticsearchIOTest > testWriteWithIndexFn (created 2020-07-14)
https://issues.apache.org/jira/browse/BEAM-9649: 
beam_python_mongoio_load_test started failing due to mismatched results 
(created 2020-03-31)
https://issues.apache.org/jira/browse/BEAM-8101: Flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful for 
Direct, Spark, Flink (created 2019-08-27)
https://issues.apache.org/jira/browse/BEAM-8035: 
WatchTest.testMultiplePollsWithManyResults flake: Outputs must be in timestamp 
order (sickbayed) (created 2019-08-22)
https://issues.apache.org/jira/browse/BEAM-7827: 
MetricsTest$AttemptedMetricTests.testAllAttemptedMetrics is flaky on 
DirectRunner (created 2019-07-26)
https://issues.apache.org/jira/browse/BEAM-7752: Java Validates 
DirectRunner: testTeardownCalledAfterExceptionInFinishBundleStateful flaky 
(created 2019-07-16)
https://issues.apache.org/jira/browse/BEAM-6804: [beam_PostCommit_Java] 
[PubsubReadIT.testReadPublicData] Timeout waiting on Sub (created 2019-03-11)
https://issues.apache.org/jira/browse/BEAM-5286: 
[beam_PostCommit_Java_GradleBuild][org.apache.beam.examples.subprocess.ExampleEchoPipelineTest.testExampleEchoPipeline][Flake]
 .sh script: text file busy. (created 2018-09-01)
https://issues.apache.org/jira/browse/BEAM-5172: 
org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest is flaky (created 
2018-08-20)


P1 issues report (49)

2021-10-17 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake)).

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

https://issues.apache.org/jira/browse/BEAM-13060: Daily Python SDK build is 
not publicly accessible (created 2021-10-15)
https://issues.apache.org/jira/browse/BEAM-13059: Migrate GKE workloads to 
Containerd (created 2021-10-15)
https://issues.apache.org/jira/browse/BEAM-13058: Upgrade Kubernetes APIs 
(created 2021-10-15)
https://issues.apache.org/jira/browse/BEAM-13056: 
DoFnSignature.fieldAccessDeclarations missing implicit accesses (created 
2021-10-14)
https://issues.apache.org/jira/browse/BEAM-13053: Avoid runner v2 when 
streaming engine explicitly disabled. (created 2021-10-14)
https://issues.apache.org/jira/browse/BEAM-13025: 
beam_PostCommit_Java_DataflowV2 failing pubsublite.ReadWriteIT (created 
2021-10-08)
https://issues.apache.org/jira/browse/BEAM-13010: Delete orphaned files 
(created 2021-10-06)
https://issues.apache.org/jira/browse/BEAM-12995: Consumer group with 
random prefix (created 2021-10-04)
https://issues.apache.org/jira/browse/BEAM-12959: Dataflow error in 
CombinePerKey operation (created 2021-09-26)
https://issues.apache.org/jira/browse/BEAM-12867: Either Create or 
DirectRunner fails to produce all elements to the following transform (created 
2021-09-09)
https://issues.apache.org/jira/browse/BEAM-12843: (Broken Pipe induced) 
Bricked Dataflow Pipeline  (created 2021-09-06)
https://issues.apache.org/jira/browse/BEAM-12818: When writing to GCS, 
spread prefix of temporary files and reuse autoscaling of the temporary 
directory (created 2021-08-30)
https://issues.apache.org/jira/browse/BEAM-12807: Java creates an incorrect 
pipeline proto when core-construction-java jar is not in the CLASSPATH (created 
2021-08-26)
https://issues.apache.org/jira/browse/BEAM-12792: Beam worker only installs 
--extra_package once (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12766: Already Exists: Dataset 
apache-beam-testing:python_bq_file_loads_NNN (created 2021-08-16)
https://issues.apache.org/jira/browse/BEAM-12632: ElasticsearchIO: Enabling 
both User/Pass auth and SSL overwrites User/Pass (created 2021-07-16)
https://issues.apache.org/jira/browse/BEAM-12540: 
beam_PostRelease_NightlySnapshot - Task 
:runners:direct-java:runMobileGamingJavaDirect FAILED (created 2021-06-25)
https://issues.apache.org/jira/browse/BEAM-12525: SDF BoundedSource seems 
to execute significantly slower than 'normal' BoundedSource (created 2021-06-22)
https://issues.apache.org/jira/browse/BEAM-12505: codecov/patch has poor 
behavior (created 2021-06-17)
https://issues.apache.org/jira/browse/BEAM-12500: Dataflow SocketException 
(SSLException) error while trying to send message from Cloud Pub/Sub to 
BigQuery (created 2021-06-16)
https://issues.apache.org/jira/browse/BEAM-12484: JdbcIO date conversion is 
sensitive to OS (created 2021-06-14)
https://issues.apache.org/jira/browse/BEAM-12467: 
java.io.InvalidClassException With Flink Kafka (created 2021-06-09)
https://issues.apache.org/jira/browse/BEAM-12279: Implement 
destination-dependent sharding in FileIO.writeDynamic (created 2021-05-04)
https://issues.apache.org/jira/browse/BEAM-12256: 
PubsubIO.readAvroGenericRecord creates SchemaCoder that fails to decode some 
Avro logical types (created 2021-04-29)
https://issues.apache.org/jira/browse/BEAM-11959: Python Beam SDK Harness 
hangs when installing pip packages (created 2021-03-11)
https://issues.apache.org/jira/browse/BEAM-11906: No trigger early 
repeatedly for session windows (created 2021-03-01)
https://issues.apache.org/jira/browse/BEAM-11875: XmlIO.Read does not 
handle XML encoding per spec (created 2021-02-26)
https://issues.apache.org/jira/browse/BEAM-11828: JmsIO is not 
acknowledging messages correctly (created 2021-02-17)
https://issues.apache.org/jira/browse/BEAM-11755: Cross-language 
consistency (RequiresStableInputs) is quietly broken (at least on portable 
flink runner) (created 2021-02-05)
https://issues.apache.org/jira/browse/BEAM-11578: `dataflow_metrics` 
(python) fails with TypeError (when int overflowing?) (created 2021-01-06)
https://issues.apache.org/jira/browse/BEAM-11148: Kafka 
commitOffsetsInFinalize OOM on Flink (created 2020-10-28)
https://issues.apache.org/jira/browse/BEAM-11017: Timer with dataflow 
runner can be set multiple times (dataflow runner) (created 2020-10-05)
https://issues.apache.org/jira/browse/BEAM-10670: Make non-portable 
Splittable DoFn the only option when executing Java "Read" transforms (created 
2020-08-10)