Beam Website Feedback

2021-09-15 Thread Vivek Rao
These docs are really useful if I’m playing around with primitive data types. 
There’s no examples of how to transform the data I get from Snowflake into more 
complicated data types. For instance, if I'm reading an employee table, how do 
I serialize my results into a Person pojo? Ideally I’d like to know how I can 
execute a query and serialize the results into a PCollection. The only 
hint I have is I would have to write a custom coder for this.



Re: Possible bug in GetterBasedSchemaProvider.fromRowFunction

2021-09-15 Thread Cristian Constantinescu
I'll give it another shot and let you know if it works. I could update the
documentation accordingly afterwards.

On Wed, Sep 15, 2021 at 6:08 PM Reuven Lax  wrote:

> Yes you would - that's the rationale for adding support for generic types
> in schema inference.
>
> On Wed, Sep 15, 2021 at 3:06 PM Cristian Constantinescu 
> wrote:
>
>> I think I tried that, but can't remember for sure (I'm like 80% sure,
>> sorry for the uncertainty, I've been trying many things for various
>> problems). And it didn't work. However, if I understand this solution
>> correctly, that means that I would have to create these join classes for
>> every type I want to join. Is that right?
>>
>> On Wed, Sep 15, 2021 at 5:56 PM Reuven Lax  wrote:
>>
>>> Could you actually fill in the generic type for Iterable? e.g.
>>> Iterable lhs; I think without that, the schema won't match.
>>>
>>> On Wed, Sep 15, 2021 at 2:52 PM Cristian Constantinescu <
>>> zei...@gmail.com> wrote:
>>>
 Hi Reuven,

 Thanks for getting back to me.

 To answer your question my initial Joined pojo is:

 @DefaultSchema(JavaFieldSchema.class)
   public class JoinedValue {
 public JoinedKey key;

 public Iterable lhs;
 public Iterable rhs;
   }


 Which is exactly the same as the documentation page, minus the field
 names. This is my concern mainly, following the steps documentation does
 not work when running the pipeline. I'll try to set up a sample project to
 illustrate this if you think it would be helpful.

 On Wed, Sep 15, 2021 at 1:08 PM Reuven Lax  wrote:

>
>
> On Sun, Sep 12, 2021 at 7:01 PM Cristian Constantinescu <
> zei...@gmail.com> wrote:
>
>> Hello everyone,
>>
>> As I'm continuing to remove my usage of Row and replacing it with
>> Pojos, I'm following the documentation for the CoGroup transform [1].
>>
>> As per the documentation, I have created a JoinedKey and a
>> JoinedValue, exactly as the examples given in the documentation except 
>> that
>> the key has propA. B and C.
>>
>> I then execute this code:
>> PCollectionTyple.of("lhs", lhs).and("rhs",
>> rhs).apply(Cogroupby...).apply(Convert.to(Joined.class))
>>
>> And I get this:
>> Exception in thread "main" java.lang.RuntimeException: Cannot convert
>> between types that don't have equivalent schemas. input schema: Fields:
>> Field{name=key, description=, type=ROW> propC STRING> NOT NULL, options={{}}}
>> Field{name=lhs, description=, type=ITERABLE NOT NULL, options={{}}}
>> Field{name=rhs, description=, type=ITERABLE NOT NULL, options={{}}}
>> Encoding positions:
>> {lhs=1, rhs=2, key=0}
>> Options:{{}}UUID: 40cef697-152b-4f3b-9110-e32a921915ca output schema:
>> Fields:
>> Field{name=key, description=, type=ROW> propC STRING> NOT NULL, options={{}}}
>> Field{name=lhs, description=, type=ITERABLE NOT NULL, options={{}}}
>> Field{name=rhs, description=, type=ITERABLE NOT NULL, options={{}}}
>> Encoding positions:
>> {lhs=1, rhs=2, key=0}
>>
>> This is probably because lhs and rhs are Iterable and it's trying to
>> compare the schemas from the CoGroup Rows for lhs and rhs and the 
>> Iterable
>> properties from the Joined pojo. We should update the documentation as it
>> doesn't reflect how the code actually behaves (Unless I missed 
>> something).
>>
>
> I'm not sure I  understand the issue here. What exactly does your
> Joined pojo look like?
>
>>
>> My next step was to try to make the Joined Pojo generic. Like this:
>> @DefaultSchema(JavaFieldSchema.class)
>> public class Joined {
>> public JoinedKey key;
>> public Iterable lhs;
>> public Iterable rhs;
>> }
>>
>
> Unfortunately schema inference doesn't work today with generic
> classes. I believe that it's possible to fix this (e.g. we do support 
> Coder
> inference in such cases), but today this won't work.
>
>
>>
>> And then execute this code:
>>
>> var joinedTypeDescriptor = new TypeDescriptor> MyRhsPojo>>(){};
>>
>>var keyCoder = SchemaCoder.of(keySchema,
>> TypeDescriptor.of(JoinedKey.class), new
>> JavaFieldSchema().toRowFunction(TypeDescriptor.of(JoinedKey.class)), new
>> JavaFieldSchema().fromRowFunction(TypeDescriptor.of(JoinedKey.class)));
>> var valueCoder = SchemaCoder.of(keySchema,
>> joinedTypeDescriptor, new
>> JavaFieldSchema().toRowFunction(joinedTypeDescriptor), new
>> JavaFieldSchema().fromRowFunction(joinedTypeDescriptor));
>> var mixed = PCollectionTyple.of("lhs", lhs).and("rhs", rhs)
>> .apply(Cogroupby...)
>> .apply(Convert.to(joinedTypeDescriptor))
>>
>> But this give me:
>> Exception in thread "main" java.lang.ClassCastExcept

Re: Possible bug in GetterBasedSchemaProvider.fromRowFunction

2021-09-15 Thread Cristian Constantinescu
I think I tried that, but can't remember for sure (I'm like 80% sure, sorry
for the uncertainty, I've been trying many things for various problems).
And it didn't work. However, if I understand this solution correctly, that
means that I would have to create these join classes for every type I want
to join. Is that right?

On Wed, Sep 15, 2021 at 5:56 PM Reuven Lax  wrote:

> Could you actually fill in the generic type for Iterable? e.g.
> Iterable lhs; I think without that, the schema won't match.
>
> On Wed, Sep 15, 2021 at 2:52 PM Cristian Constantinescu 
> wrote:
>
>> Hi Reuven,
>>
>> Thanks for getting back to me.
>>
>> To answer your question my initial Joined pojo is:
>>
>> @DefaultSchema(JavaFieldSchema.class)
>>   public class JoinedValue {
>> public JoinedKey key;
>>
>> public Iterable lhs;
>> public Iterable rhs;
>>   }
>>
>>
>> Which is exactly the same as the documentation page, minus the field
>> names. This is my concern mainly, following the steps documentation does
>> not work when running the pipeline. I'll try to set up a sample project to
>> illustrate this if you think it would be helpful.
>>
>> On Wed, Sep 15, 2021 at 1:08 PM Reuven Lax  wrote:
>>
>>>
>>>
>>> On Sun, Sep 12, 2021 at 7:01 PM Cristian Constantinescu <
>>> zei...@gmail.com> wrote:
>>>
 Hello everyone,

 As I'm continuing to remove my usage of Row and replacing it with
 Pojos, I'm following the documentation for the CoGroup transform [1].

 As per the documentation, I have created a JoinedKey and a JoinedValue,
 exactly as the examples given in the documentation except that the key has
 propA. B and C.

 I then execute this code:
 PCollectionTyple.of("lhs", lhs).and("rhs",
 rhs).apply(Cogroupby...).apply(Convert.to(Joined.class))

 And I get this:
 Exception in thread "main" java.lang.RuntimeException: Cannot convert
 between types that don't have equivalent schemas. input schema: Fields:
 Field{name=key, description=, type=ROW>>> propC STRING> NOT NULL, options={{}}}
 Field{name=lhs, description=, type=ITERABLE NOT NULL, options={{}}}
 Field{name=rhs, description=, type=ITERABLE NOT NULL, options={{}}}
 Encoding positions:
 {lhs=1, rhs=2, key=0}
 Options:{{}}UUID: 40cef697-152b-4f3b-9110-e32a921915ca output schema:
 Fields:
 Field{name=key, description=, type=ROW>>> propC STRING> NOT NULL, options={{}}}
 Field{name=lhs, description=, type=ITERABLE NOT NULL, options={{}}}
 Field{name=rhs, description=, type=ITERABLE NOT NULL, options={{}}}
 Encoding positions:
 {lhs=1, rhs=2, key=0}

 This is probably because lhs and rhs are Iterable and it's trying to
 compare the schemas from the CoGroup Rows for lhs and rhs and the Iterable
 properties from the Joined pojo. We should update the documentation as it
 doesn't reflect how the code actually behaves (Unless I missed something).

>>>
>>> I'm not sure I  understand the issue here. What exactly does your Joined
>>> pojo look like?
>>>

 My next step was to try to make the Joined Pojo generic. Like this:
 @DefaultSchema(JavaFieldSchema.class)
 public class Joined {
 public JoinedKey key;
 public Iterable lhs;
 public Iterable rhs;
 }

>>>
>>> Unfortunately schema inference doesn't work today with generic classes.
>>> I believe that it's possible to fix this (e.g. we do support Coder
>>> inference in such cases), but today this won't work.
>>>
>>>

 And then execute this code:

 var joinedTypeDescriptor = new TypeDescriptor>>> MyRhsPojo>>(){};

var keyCoder = SchemaCoder.of(keySchema,
 TypeDescriptor.of(JoinedKey.class), new
 JavaFieldSchema().toRowFunction(TypeDescriptor.of(JoinedKey.class)), new
 JavaFieldSchema().fromRowFunction(TypeDescriptor.of(JoinedKey.class)));
 var valueCoder = SchemaCoder.of(keySchema,
 joinedTypeDescriptor, new
 JavaFieldSchema().toRowFunction(joinedTypeDescriptor), new
 JavaFieldSchema().fromRowFunction(joinedTypeDescriptor));
 var mixed = PCollectionTyple.of("lhs", lhs).and("rhs", rhs)
 .apply(Cogroupby...)
 .apply(Convert.to(joinedTypeDescriptor))

 But this give me:
 Exception in thread "main" java.lang.ClassCastException: class
 org.apache.beam.vendor.guava.v26_0_jre.com.google.common.reflect.Types$ParameterizedTypeImpl
 cannot be cast to class java.lang.Class
 (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.reflect.Types$ParameterizedTypeImpl
 is in unnamed module of loader 'app'; java.lang.Class is in module
 java.base of loader 'bootstrap')
 at
 org.apache.beam.sdk.schemas.GetterBasedSchemaProvider.fromRowFunction(GetterBasedSchemaProvider.java:105)
 (irrelevant stacktrace omitted for brevity)

 It looks like GetterBasedSchemaProvider.fromRowFunction has an explicit
 cast to

Re: Possible bug in GetterBasedSchemaProvider.fromRowFunction

2021-09-15 Thread Cristian Constantinescu
Hi Reuven,

Thanks for getting back to me.

To answer your question my initial Joined pojo is:

@DefaultSchema(JavaFieldSchema.class)
  public class JoinedValue {
public JoinedKey key;

public Iterable lhs;
public Iterable rhs;
  }


Which is exactly the same as the documentation page, minus the field names.
This is my concern mainly, following the steps documentation does not work
when running the pipeline. I'll try to set up a sample project to
illustrate this if you think it would be helpful.

On Wed, Sep 15, 2021 at 1:08 PM Reuven Lax  wrote:

>
>
> On Sun, Sep 12, 2021 at 7:01 PM Cristian Constantinescu 
> wrote:
>
>> Hello everyone,
>>
>> As I'm continuing to remove my usage of Row and replacing it with Pojos,
>> I'm following the documentation for the CoGroup transform [1].
>>
>> As per the documentation, I have created a JoinedKey and a JoinedValue,
>> exactly as the examples given in the documentation except that the key has
>> propA. B and C.
>>
>> I then execute this code:
>> PCollectionTyple.of("lhs", lhs).and("rhs",
>> rhs).apply(Cogroupby...).apply(Convert.to(Joined.class))
>>
>> And I get this:
>> Exception in thread "main" java.lang.RuntimeException: Cannot convert
>> between types that don't have equivalent schemas. input schema: Fields:
>> Field{name=key, description=, type=ROW> STRING> NOT NULL, options={{}}}
>> Field{name=lhs, description=, type=ITERABLE NOT NULL, options={{}}}
>> Field{name=rhs, description=, type=ITERABLE NOT NULL, options={{}}}
>> Encoding positions:
>> {lhs=1, rhs=2, key=0}
>> Options:{{}}UUID: 40cef697-152b-4f3b-9110-e32a921915ca output schema:
>> Fields:
>> Field{name=key, description=, type=ROW> propC STRING> NOT NULL, options={{}}}
>> Field{name=lhs, description=, type=ITERABLE NOT NULL, options={{}}}
>> Field{name=rhs, description=, type=ITERABLE NOT NULL, options={{}}}
>> Encoding positions:
>> {lhs=1, rhs=2, key=0}
>>
>> This is probably because lhs and rhs are Iterable and it's trying to
>> compare the schemas from the CoGroup Rows for lhs and rhs and the Iterable
>> properties from the Joined pojo. We should update the documentation as it
>> doesn't reflect how the code actually behaves (Unless I missed something).
>>
>
> I'm not sure I  understand the issue here. What exactly does your Joined
> pojo look like?
>
>>
>> My next step was to try to make the Joined Pojo generic. Like this:
>> @DefaultSchema(JavaFieldSchema.class)
>> public class Joined {
>> public JoinedKey key;
>> public Iterable lhs;
>> public Iterable rhs;
>> }
>>
>
> Unfortunately schema inference doesn't work today with generic classes. I
> believe that it's possible to fix this (e.g. we do support Coder inference
> in such cases), but today this won't work.
>
>
>>
>> And then execute this code:
>>
>> var joinedTypeDescriptor = new TypeDescriptor> MyRhsPojo>>(){};
>>
>>var keyCoder = SchemaCoder.of(keySchema,
>> TypeDescriptor.of(JoinedKey.class), new
>> JavaFieldSchema().toRowFunction(TypeDescriptor.of(JoinedKey.class)), new
>> JavaFieldSchema().fromRowFunction(TypeDescriptor.of(JoinedKey.class)));
>> var valueCoder = SchemaCoder.of(keySchema, joinedTypeDescriptor,
>> new JavaFieldSchema().toRowFunction(joinedTypeDescriptor), new
>> JavaFieldSchema().fromRowFunction(joinedTypeDescriptor));
>> var mixed = PCollectionTyple.of("lhs", lhs).and("rhs", rhs)
>> .apply(Cogroupby...)
>> .apply(Convert.to(joinedTypeDescriptor))
>>
>> But this give me:
>> Exception in thread "main" java.lang.ClassCastException: class
>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.reflect.Types$ParameterizedTypeImpl
>> cannot be cast to class java.lang.Class
>> (org.apache.beam.vendor.guava.v26_0_jre.com.google.common.reflect.Types$ParameterizedTypeImpl
>> is in unnamed module of loader 'app'; java.lang.Class is in module
>> java.base of loader 'bootstrap')
>> at
>> org.apache.beam.sdk.schemas.GetterBasedSchemaProvider.fromRowFunction(GetterBasedSchemaProvider.java:105)
>> (irrelevant stacktrace omitted for brevity)
>>
>> It looks like GetterBasedSchemaProvider.fromRowFunction has an explicit
>> cast to "Class" but there could be instances where a guava type is passed
>> in.
>>
>> So my workaround for now, as elegant as a roadkill, is to do things
>> manually as below. (actual class names replaced with RhsPojo)
>>
>>  var mixed = PCollectionTyple.of("lhs", lhs).and("rhs", rhs)
>> .apply(Cogroupby...)
>> //.apply(Convert.to(joinedTypeDescriptor))
>> .apply(MapElements.into(joinedTypeDescriptor).via(x -> {
>> var key = new
>> JavaFieldSchema().fromRowFunction(TypeDescriptor.of(JoinedKey.class)).apply(x.getRow("key"));
>>
>> var lhsList = new ArrayList();
>> var rowJoinedSerializableFunction = new
>> JavaFieldSchema().fromRowFunction(lhsTypeDescriptor);
>> for (var item : x.getIterable("lhs")) {
>> 

Flaky test issue report (35)

2021-09-15 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake)

These are P1 issues because they have a major negative impact on the community 
and make it hard to determine the quality of the software.

https://issues.apache.org/jira/browse/BEAM-12882: 
pubsublite.SubscriptionPartitionLoaderTest.addedResults flaky (created 
2021-09-14)
https://issues.apache.org/jira/browse/BEAM-12861: 
apache_beam.ml.gcp.recommendations_ai_test_it.RecommendationAIIT.test_create_catalog_item
  is flaky (created 2021-09-09)
https://issues.apache.org/jira/browse/BEAM-12860: 
apache_beam.examples.dataframe.taxiride_it_test.TaxirideIT.test_aggregation  is 
flaky (created 2021-09-09)
https://issues.apache.org/jira/browse/BEAM-12859: 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12858: 
org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest.testRampupThrottler 
is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12842: 
StreamingDataflowWorkerTest.testHotKeyLogging test flake (created 2021-09-03)
https://issues.apache.org/jira/browse/BEAM-12809: 
testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky (created 2021-08-26)
https://issues.apache.org/jira/browse/BEAM-12694: DICOMIoIntegrationTest 
flaky due to store ID (Python PreCommit) (created 2021-07-30)
https://issues.apache.org/jira/browse/BEAM-12603: Flaky: 
apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
 (created 2021-07-12)
https://issues.apache.org/jira/browse/BEAM-12515: Python PreCommit flaking 
in PipelineOptionsTest.test_display_data (created 2021-06-18)
https://issues.apache.org/jira/browse/BEAM-12322: Python precommit flaky: 
Failed to read inputs in the data plane (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12320: 
PubsubTableProviderIT.testSQLSelectsArrayAttributes[0] failing in SQL 
PostCommit (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12291: 
org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: 
false] is flaky (created 2021-05-05)
https://issues.apache.org/jira/browse/BEAM-12200: 
SamzaStoreStateInternalsTest is flaky (created 2021-04-20)
https://issues.apache.org/jira/browse/BEAM-12163: Python GHA PreCommits 
flake with grpc.FutureTimeoutError on SDK harness startup (created 2021-04-13)
https://issues.apache.org/jira/browse/BEAM-12061: beam_PostCommit_SQL 
failing on KafkaTableProviderIT.testFakeNested (created 2021-03-27)
https://issues.apache.org/jira/browse/BEAM-11837: Java build flakes: 
"Memory constraints are impeding performance" (created 2021-02-18)
https://issues.apache.org/jira/browse/BEAM-11792: Python precommit failed 
(flaked?) installing package  (created 2021-02-10)
https://issues.apache.org/jira/browse/BEAM-11666: 
apache_beam.runners.interactive.recording_manager_test.RecordingManagerTest.test_basic_execution
 is flaky (created 2021-01-20)
https://issues.apache.org/jira/browse/BEAM-11661: hdfsIntegrationTest 
flake: network not found (py38 postcommit) (created 2021-01-19)
https://issues.apache.org/jira/browse/BEAM-11645: beam_PostCommit_XVR_Flink 
failing (created 2021-01-15)
https://issues.apache.org/jira/browse/BEAM-11641: Bigquery Read tests are 
flaky on Flink runner in Python PostCommit suites (created 2021-01-15)
https://issues.apache.org/jira/browse/BEAM-11541: 
testTeardownCalledAfterExceptionInProcessElement flakes on direct runner. 
(created 2020-12-30)
https://issues.apache.org/jira/browse/BEAM-10955: Flink Java Runner test 
flake: Could not find Flink job (FlinkJobNotFoundException) (created 2020-09-23)
https://issues.apache.org/jira/browse/BEAM-10866: 
PortableRunnerTestWithSubprocesses.test_register_finalizations flaky on macOS 
(created 2020-09-09)
https://issues.apache.org/jira/browse/BEAM-10485: Failure / flake: 
ElasticsearchIOTest > testWriteWithIndexFn (created 2020-07-14)
https://issues.apache.org/jira/browse/BEAM-9649: 
beam_python_mongoio_load_test started failing due to mismatched results 
(created 2020-03-31)
https://issues.apache.org/jira/browse/BEAM-8453: Failure in 
org.apache.beam.sdk.io.jms.JmsIOTest.testCheckpointMarkSafety (created 
2019-10-21)
https://issues.apache.org/jira/browse/BEAM-8101: Flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful for 
Direct, Spark, Flink (created 2019-08-27)
https://issues.apache.org/jira/browse/BEAM-8035: 
WatchTest.testMultiplePollsWithManyResults flake: Outputs must be in timestamp 
order (sickbayed) (created 2019-08-22)
https://issues.apache.org/jira/browse/BEAM-7827: 
MetricsTest$AttemptedMetricT

P1 issues report (44)

2021-09-15 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake).

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

https://issues.apache.org/jira/browse/BEAM-12867: Either Create or 
DirectRunner fails to produce all elements to the following transform (created 
2021-09-09)
https://issues.apache.org/jira/browse/BEAM-12843: (Broken Pipe induced) 
Bricked Dataflow Pipeline  (created 2021-09-06)
https://issues.apache.org/jira/browse/BEAM-12794: 
PortableRunnerTestWithExternalEnv.test_pardo_timers flaky (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12792: Beam worker only installs 
--extra_package once (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12781: SDFBoundedSourceReader 
behaves much slower compared with the original behavior of BoundedSource 
(created 2021-08-20)
https://issues.apache.org/jira/browse/BEAM-12766: Already Exists: Dataset 
apache-beam-testing:python_bq_file_loads_NNN (created 2021-08-16)
https://issues.apache.org/jira/browse/BEAM-12632: ElasticsearchIO: Enabling 
both User/Pass auth and SSL overwrites User/Pass (created 2021-07-16)
https://issues.apache.org/jira/browse/BEAM-12628: AvroCoder changed 
underlying String class for SpecificRecords (created 2021-07-16)
https://issues.apache.org/jira/browse/BEAM-12607: Copy Code Snippet copies 
html tags (created 2021-07-13)
https://issues.apache.org/jira/browse/BEAM-12540: 
beam_PostRelease_NightlySnapshot - Task 
:runners:direct-java:runMobileGamingJavaDirect FAILED (created 2021-06-25)
https://issues.apache.org/jira/browse/BEAM-12525: SDF BoundedSource seems 
to execute significantly slower than 'normal' BoundedSource (created 2021-06-22)
https://issues.apache.org/jira/browse/BEAM-12505: codecov/patch has poor 
behavior (created 2021-06-17)
https://issues.apache.org/jira/browse/BEAM-12500: Dataflow SocketException 
(SSLException) error while trying to send message from Cloud Pub/Sub to 
BigQuery (created 2021-06-16)
https://issues.apache.org/jira/browse/BEAM-12484: JdbcIO date conversion is 
sensitive to OS (created 2021-06-14)
https://issues.apache.org/jira/browse/BEAM-12467: 
java.io.InvalidClassException With Flink Kafka (created 2021-06-09)
https://issues.apache.org/jira/browse/BEAM-12380: Go SDK Kafka IO Transform 
implemented via XLang (created 2021-05-21)
https://issues.apache.org/jira/browse/BEAM-12310: 
beam_PostCommit_Java_DataflowV2 failing (created 2021-05-07)
https://issues.apache.org/jira/browse/BEAM-12279: Implement 
destination-dependent sharding in FileIO.writeDynamic (created 2021-05-04)
https://issues.apache.org/jira/browse/BEAM-12256: 
PubsubIO.readAvroGenericRecord creates SchemaCoder that fails to decode some 
Avro logical types (created 2021-04-29)
https://issues.apache.org/jira/browse/BEAM-11959: Python Beam SDK Harness 
hangs when installing pip packages (created 2021-03-11)
https://issues.apache.org/jira/browse/BEAM-11906: No trigger early 
repeatedly for session windows (created 2021-03-01)
https://issues.apache.org/jira/browse/BEAM-11875: XmlIO.Read does not 
handle XML encoding per spec (created 2021-02-26)
https://issues.apache.org/jira/browse/BEAM-11828: JmsIO is not 
acknowledging messages correctly (created 2021-02-17)
https://issues.apache.org/jira/browse/BEAM-11755: Cross-language 
consistency (RequiresStableInputs) is quietly broken (at least on portable 
flink runner) (created 2021-02-05)
https://issues.apache.org/jira/browse/BEAM-11578: `dataflow_metrics` 
(python) fails with TypeError (when int overflowing?) (created 2021-01-06)
https://issues.apache.org/jira/browse/BEAM-11148: Kafka 
commitOffsetsInFinalize OOM on Flink (created 2020-10-28)
https://issues.apache.org/jira/browse/BEAM-11017: Timer with dataflow 
runner can be set multiple times (dataflow runner) (created 2020-10-05)
https://issues.apache.org/jira/browse/BEAM-10670: Make non-portable 
Splittable DoFn the only option when executing Java "Read" transforms (created 
2020-08-10)
https://issues.apache.org/jira/browse/BEAM-10617: python 
CombineGlobally().with_fanout() cause duplicate combine results for sliding 
windows (created 2020-07-31)
https://issues.apache.org/jira/browse/BEAM-10569: SpannerIO tests don't 
actually assert anything. (created 2020-07-23)
https://issues.apache.org/jira/browse/BEAM-10529: Kafka XLang fails for 
?empty? key/values (created 2020-07-18)
https://issues.apache.org/jira/browse/BEAM-10288: Quickstart documents are 
out of date (created 2020-06-19)
https://issues.apache.org/jira/browse/BEAM-10244: Populate requirements 
cache fails on poetry-based packages (created 2020-06-11)

Re: Percentile metrics in Beam

2021-09-15 Thread Ajo Thomas
Thanks for the response, Alexey and Ke.
Agree with your point to introduce a new metric type (say Percentiles)
instead of altering the Distribution metric type to ensure compatibility
across runners and sdks.
I am currently working on a prototype to add this new metric type to the
metrics API and testing it with samza runner. I can share a design doc with
the community with possible solutions very soon.

Thanks
Ajo

On Wed, Sep 15, 2021 at 9:26 AM Alexey Romanenko 
wrote:

> I agree with Ke Wu in the way that we need to keep compatibility across
> all runners and the same metrics. So, it seems that it would be better to
> create another metric type in this case.
>
> Also, to discuss it in details, I’d recommend to create a design document
> with possible solutions and examples.
>
> —
> Alexey
>
> On 14 Sep 2021, at 19:04, Ke Wu  wrote:
>
> I prefer adding a new metrics type instead of enhancing the existing
> Distribution [1] to support percentiles etc in order to ensure better
> compatibility.
>
> @Luke @Kyle what are your thoughts on this?
>
> Best,
> Ke
>
> [1]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/Distribution.java
>
>
> On Sep 7, 2021, at 1:28 PM, Ajo Thomas  wrote:
>
> Hi All,
>
> I am working on adding support for some additional distribution metrics
> like std dev, percentiles to the Metrics API. The runner of interest here
> is Samza runner. I wanted to get the opinion of fellow beam devs on this.
>
> One way to do this would be to make changes to the existing Distribution
> metric:
> - Add additional metrics to Distribution metric- custom percentiles, std
> dev, mean. Use Dropwizard Histogram under the hood in DistributionData to
> track the distribution of the data.
> - This also means changes to accompanying classes like DistributionData,
> DistributionResult which might involve runner specific changes.
>
> Is this an acceptable change or would you suggest something else? Is the
> Distribution metric only intended to track the metrics that it is currently
> tracking- sum, min, max, count?
>
> Thanks
> Ajo
>
>
>
>


Re: Percentile metrics in Beam

2021-09-15 Thread Alexey Romanenko
I agree with Ke Wu in the way that we need to keep compatibility across all 
runners and the same metrics. So, it seems that it would be better to create 
another metric type in this case.

Also, to discuss it in details, I’d recommend to create a design document with 
possible solutions and examples.

—
Alexey

> On 14 Sep 2021, at 19:04, Ke Wu  wrote:
> 
> I prefer adding a new metrics type instead of enhancing the existing 
> Distribution [1] to support percentiles etc in order to ensure better 
> compatibility. 
> 
> @Luke @Kyle what are your thoughts on this? 
> 
> Best,
> Ke
> 
> [1] 
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/Distribution.java
>  
> 
>  
> 
>> On Sep 7, 2021, at 1:28 PM, Ajo Thomas > > wrote:
>> 
>> Hi All,
>> 
>> I am working on adding support for some additional distribution metrics like 
>> std dev, percentiles to the Metrics API. The runner of interest here is 
>> Samza runner. I wanted to get the opinion of fellow beam devs on this.
>> 
>> One way to do this would be to make changes to the existing Distribution 
>> metric:
>> - Add additional metrics to Distribution metric- custom percentiles, std 
>> dev, mean. Use Dropwizard Histogram under the hood in DistributionData to 
>> track the distribution of the data.
>> - This also means changes to accompanying classes like DistributionData, 
>> DistributionResult which might involve runner specific changes.
>> 
>> Is this an acceptable change or would you suggest something else? Is the 
>> Distribution metric only intended to track the metrics that it is currently 
>> tracking- sum, min, max, count?
>> 
>> Thanks
>> Ajo
>> 
>