Re: Empty projects defined in settings.xml

2019-02-10 Thread Kenneth Knowles
I think :beam-runners-gcp-gcsproxy would be an implementation of the
artifact API [1] on top of GCS. Something fitting that description does
exist [2].

Since settings.gradle basically renames all our folders, I think each line
deserves some commentary. And it seems like a smoke test in the root project
might catch when things get stale.
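For illustration, a minimal sketch of such a smoke test (purely hypothetical,
not existing Beam code; it assumes the test runs from the repository root and
only checks that every quoted relative path in settings.gradle points at a
directory containing a build.gradle):

import static org.junit.Assert.assertTrue;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.junit.Test;

public class SettingsGradleSmokeTest {
  // Quoted strings that look like relative directory paths, e.g. "runners/direct-java".
  private static final Pattern PATH =
      Pattern.compile("\"([A-Za-z0-9_.-]+(?:/[A-Za-z0-9_.-]+)+)\"");

  @Test
  public void everyReferencedProjectDirExists() throws IOException {
    Path root = Paths.get(".").toAbsolutePath().normalize();
    for (String line : Files.readAllLines(root.resolve("settings.gradle"))) {
      Matcher m = PATH.matcher(line);
      while (m.find()) {
        Path dir = root.resolve(m.group(1));
        assertTrue(
            "Stale entry in settings.gradle: " + m.group(1),
            Files.isDirectory(dir) && Files.exists(dir.resolve("build.gradle")));
      }
    }
  }
}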

Kenn

[1]
https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_artifact_api.proto
[2]
https://github.com/apache/beam/tree/master/sdks/go/pkg/beam/artifact/gcsproxy

On Sun, Feb 10, 2019 at 4:23 PM Michael Luckey  wrote:

> Hi,
>
> while looking into settings.gradle I stumbled upon 2 project definitions
> [1],
>
> - beam-runners-gcp-gcemd
> - beam-runners-gcp-gcsproxy
>
> which I cannot make any sense of. Any insights on why these exist or what
> they are supposed to be?
>
> They were added with HadoopFormatIO.Write [2].
>
> Thanks,
>
> michel
>
> [1] https://github.com/apache/beam/blob/master/settings.gradle#L64-L67
> [2]
> https://github.com/apache/beam/commit/757b71e749ab8a9f0a08e3669596ce69920acbac
>


Re: Schemas for classes with type parameters

2019-02-10 Thread Reuven Lax
Sounds reasonable. I'm still not sure that we will get good TypeDescriptors
for nested types and generic container fields, but we might at least be
able to make the simple generic-type cases work.

On Sun, Feb 10, 2019 at 8:34 PM Kenneth Knowles  wrote:

> Registration/definition and use are different sites. The TypeDescriptor
> always comes from the user fn or the transform. For Jeff's example, the
> AutoValueSchema provider is registered with MyClass.class which is fine.
> Then when a user writes a DoFn that accepts or returns a
> MyClass<ActualType>, first you infer a schema ActualSchema for ActualType,
> then you pass it along, and the invocation is along the lines of
> AutoValueSchemaProvider(MyClass.class, ImmutableList.of(ActualSchema)), and
> you'd get a legitimate schema for MyClass<ActualType>.
>
> I expect this would be a decent amount of work in the schema machinery and
> also the AutoValueSchemaProvider would need to be type-variable aware.
>
> Kenn
>
> On Sun, Feb 10, 2019 at 8:24 PM Reuven Lax  wrote:
>
>> Ok, so actually SchemaRegistry is based on TypeDescriptors, so it does
>> not have this limitation (I was wrong about that).
>>
>> However, I'm still not sure that the @DefaultSchema annotation-based
>> registration would work here. Right now it tries to infer a schema eagerly,
>> which clearly would not work. I guess we could create a SchemaProvider that
>> lazily resolved the schema only upon use, when we should have a good
>> TypeDescriptor. However, I'm still worried that we often won't have a good
>> type descriptor. It works well for DoFn, because usually the user's DoFn is
>> a concrete class with resolved types. I'm not sure that this is easy to do
>> with AutoValue; the user can't create a concrete subclass of their
>> AutoValue class, as that won't work with the generated code AutoValue does.
>>
>> Reuven
>>
>> On Sun, Feb 10, 2019 at 8:00 PM Kenneth Knowles  wrote:
>>
>>> Hmm, this is a huge limitation relative to the CoderRegistry, which very
>>> explicitly does support constructing parameterized coders via
>>> CoderProvider. The root CoderProvider is still keyed on rawtype but the
>>> CoderProvider is passed inferred coders for the concrete parameters. Here's
>>> how List.class is registered:
>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/CoderRegistry.java#L116
>>>
>>> The one thing that _is_ required for this is that at the call site a
>>> good TypeDescriptor is captured. That is mostly automatic for DoFns, hence
>>> the CoderRegistry works fairly well. There are special methods in various
>>> user fns and boilerplate in transforms like MapElements to provide a good
>>> TypeDescriptor.
>>>
>>> Kenn
>>>
>>> On Sun, Feb 10, 2019 at 5:11 PM Reuven Lax  wrote:
>>>
 This is an interesting question.

 In general, I don't think schema inference can handle these generics
 today. Right now the SchemaRegistry is keyed off of Java class, and due to
 type erasure all different instances of MyClass<T> will look the same.

 Now it might be possible to include generic type parameters in the
 registry. You would not be able to use the @DefaultSchema annotation to
 infer a schema, but you might be able to dynamically register a schema
 using a TypeDescriptor. Unfortunately I think this would only sometimes
 work. E.g. my experience has been that given a type parameter T you can often
 figure out T using reflection, but if there are nested types (e.g. List<T>)
 then Java doesn't always preserve these types for introspection.

 In sum, I think we could do a bit better for these types of classes,
 but not a whole lot better.

 Reuven

 On Mon, Feb 4, 2019 at 6:02 AM Jeff Klukas  wrote:

> I've started experimenting with Beam schemas in the context of
> creating custom AutoValue-based classes and using AutoValueSchema to
> generate schemas and thus coders.
>
> AFAICT, schemas need to have types fully specified, so it doesn't
> appear to be possible to define an AutoValue class with a type parameter
> and then create a schema for it. Basically, I want to confirm whether the
> following type would ever be possible to create a schema for:
>
> @DefaultSchema(AutoValueSchema.class)
> @AutoValue
> public abstract class MyClass<T> {
>   public abstract T getField1();
>   public abstract String getField2();
>   public static <T> MyClass<T> of(T field1, String field2) {
>     return new AutoValue_MyClass<T>(field1, field2);
>   }
> }
>
> This may be an entirely reasonable limitation of the schema machinery,
> but I want to make sure I'm not missing something.
>



Re: Schemas for classes with type parameters

2019-02-10 Thread Kenneth Knowles
Registration/definition and use are different sites. The TypeDescriptor
always comes from the user fn or the transform. For Jeff's example, the
AutoValueSchema provider is registered with MyClass.class which is fine.
Then when a user writes a DoFn that accepts or returns a
MyClass<ActualType>, first you infer a schema ActualSchema for ActualType,
then you pass it along, and the invocation is along the lines of
AutoValueSchemaProvider(MyClass.class, ImmutableList.of(ActualSchema)), and
you'd get a legitimate schema for MyClass<ActualType>.
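For illustration, a rough sketch of the hook such an invocation implies
(nothing like this exists in Beam today; the interface and method names are
invented):

import java.util.List;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.values.TypeDescriptor;

public interface ParameterizedSchemaProvider {
  // rawType: the class the provider was registered for, e.g. MyClass.class.
  // typeArgumentSchemas: schemas already inferred for the concrete type
  // arguments, e.g. [ActualSchema] when the use site is MyClass<ActualType>.
  Schema schemaFor(TypeDescriptor<?> rawType, List<Schema> typeArgumentSchemas);
}

// Registry side, roughly:
//   Schema actualSchema = ...;  // inferred for ActualType
//   Schema result =
//       provider.schemaFor(TypeDescriptor.of(MyClass.class), ImmutableList.of(actualSchema));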

I expect this would be a decent amount of work in the schema machinery and
also the AutoValueSchemaProvider would need to be type-variable aware.

Kenn

On Sun, Feb 10, 2019 at 8:24 PM Reuven Lax  wrote:

> Ok, so actually SchemaRegistry is based on TypeDescriptors, so it does not
> have this limitation (I was wrong about that).
>
> However, I'm still not sure that the @DefaultSchema annotation-based
> registration would work here. Right now it tries to infer a schema eagerly,
> which clearly would not work. I guess we could create a SchemaProvider that
> lazily resolved the schema only upon use, when we should have a good
> TypeDescriptor. However, I'm still worried that we often won't have a good
> type descriptor. It works well for DoFn, because usually the user's DoFn is
> a concrete class with resolved types. I'm not sure that this is easy to do
> with AutoValue; the user can't create a concrete subclass of their
> AutoValue class, as that won't work with the generated code AutoValue does.
>
> Reuven
>
> On Sun, Feb 10, 2019 at 8:00 PM Kenneth Knowles  wrote:
>
>> Hmm, this is a huge limitation relative to the CoderRegistry, which very
>> explicitly does support constructing parameterized coders via
>> CoderProvider. The root CoderProvider is still keyed on rawtype but the
>> CoderProvider is passed inferred coders for the concrete parameters. Here's
>> how List.class is registered:
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/CoderRegistry.java#L116
>>
>> The one thing that _is_ required for this is that at the call site a good
>> TypeDescriptor is captured. That is mostly automatic for DoFns, hence the
>> CoderRegistry works fairly well. There are special methods in various user
>> fns and boilerplate in transforms like MapElements to provide a good
>> TypeDescriptor.
>>
>> Kenn
>>
>> On Sun, Feb 10, 2019 at 5:11 PM Reuven Lax  wrote:
>>
>>> This is an interesting question.
>>>
>>> In general, I don't think schema inference can handle these generics
>>> today. Right now the SchemaRegistry is keyed off of Java class, and due to
>>> type erasure all different instances of MyClass<T> will look the same.
>>>
>>> Now it might be possible to include generic type parameters in the
>>> registry. You would not be able to use the @DefaultSchema annotation to
>>> infer a schema, but you might be able to dynamically register a schema
>>> using a TypeDescriptor. Unfortunately I think this would only sometimes
>>> work. E.g. my experience has been that given a type parameter T you can
>>> often figure out T using reflection, but if there are nested types (e.g.
>>> List<T>) then Java doesn't always preserve these types for introspection.
>>>
>>> In sum, I think we could do a bit better for these types of classes, but
>>> not a whole lot better.
>>>
>>> Reuven
>>>
>>> On Mon, Feb 4, 2019 at 6:02 AM Jeff Klukas  wrote:
>>>
 I've started experimenting with Beam schemas in the context of creating
 custom AutoValue-based classes and using AutoValueSchema to generate
 schemas and thus coders.

 AFAICT, schemas need to have types fully specified, so it doesn't
 appear to be possible to define an AutoValue class with a type parameter
 and then create a schema for it. Basically, I want to confirm whether the
 following type would ever be possible to create a schema for:

 @DefaultSchema(AutoValueSchema.class)
 @AutoValue
 public abstract class MyClass<T> {
   public abstract T getField1();
   public abstract String getField2();
   public static <T> MyClass<T> of(T field1, String field2) {
     return new AutoValue_MyClass<T>(field1, field2);
   }
 }

 This may be an entirely reasonable limitation of the schema machinery,
 but I want to make sure I'm not missing something.

>>>


Re: Schemas for classes with type parameters

2019-02-10 Thread Reuven Lax
Ok, so actually SchemaRegistry is based on TypeDescriptors, so it does not
have this limitation (I was wrong about that).

However, I'm still not sure that the @DefaultSchema annotation-based
registration would work here. Right now it tries to infer a schema eagerly,
which clearly would not work. I guess we could create a SchemaProvider that
lazily resolved the schema only upon use, when we should have a good
TypeDescriptor. However, I'm still worried that we often won't have a good
type descriptor. It works well for DoFn, because usually the user's DoFn is
a concrete class with resolved types. I'm not sure that this is easy to do
with AutoValue; the user can't create a concrete subclass of their
AutoValue class, as that won't work with the generated code AutoValue does.

Reuven

On Sun, Feb 10, 2019 at 8:00 PM Kenneth Knowles  wrote:

> Hmm, this is a huge limitation relative to the CoderRegistry, which very
> explicitly does support constructing parameterized coders via
> CoderProvider. The root CoderProvider is still keyed on rawtype but the
> CoderProvider is passed inferred coders for the concrete parameters. Here's
> how List.class is registered:
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/CoderRegistry.java#L116
>
> The one thing that _is_ required for this is that at the call site a good
> TypeDescriptor is captured. That is mostly automatic for DoFns, hence the
> CoderRegistry works fairly well. There are special methods in various user
> fns and boilerplate in transforms like MapElements to provide a good
> TypeDescriptor.
>
> Kenn
>
> On Sun, Feb 10, 2019 at 5:11 PM Reuven Lax  wrote:
>
>> This is an interesting question.
>>
>> In general, I don't think schema inference can handle these generics
>> today. Right now the SchemaRegistry is keyed off of Java class, and due to
>> type erasure all different instances of MyClass<T> will look the same.
>>
>> Now it might be possible to include generic type parameters in the
>> registry. You would not be able to use the @DefaultSchema annotation to
>> infer a schema, but you might be able to dynamically register a schema
>> using a TypeDescriptor. Unfortunately I think this would only sometimes
>> work. E.g. my experience has been that given a type parameter T you can
>> often figure out T using reflection, but if there are nested types (e.g.
>> List<T>) then Java doesn't always preserve these types for introspection.
>>
>> In sum, I think we could do a bit better for these types of classes, but
>> not a whole lot better.
>>
>> Reuven
>>
>> On Mon, Feb 4, 2019 at 6:02 AM Jeff Klukas  wrote:
>>
>>> I've started experimenting with Beam schemas in the context of creating
>>> custom AutoValue-based classes and using AutoValueSchema to generate
>>> schemas and thus coders.
>>>
>>> AFAICT, schemas need to have types fully specified, so it doesn't appear
>>> to be possible to define an AutoValue class with a type parameter and then
>>> create a schema for it. Basically, I want to confirm whether the following
>>> type would ever be possible to create a schema for:
>>>
>>> @DefaultSchema(AutoValueSchema.class)
>>> @AutoValue
>>> public abstract class MyClass<T> {
>>>   public abstract T getField1();
>>>   public abstract String getField2();
>>>   public static <T> MyClass<T> of(T field1, String field2) {
>>>     return new AutoValue_MyClass<T>(field1, field2);
>>>   }
>>> }
>>>
>>> This may be an entirely reasonable limitation of the schema machinery,
>>> but I want to make sure I'm not missing something.
>>>
>>


Re: Adding kotlin samples to apache beam

2019-02-10 Thread Kenneth Knowles
I commented on the Jira but I want to also say it here: Cool! It is great
that the Java SDK "just works" with another JVM-based language. I think
this would be a great addition.

Kenn

On Sun, Feb 10, 2019 at 8:10 PM hars...@pitech.app 
wrote:

> I have been using Apache Beam for a few of my projects in production over
> the past 6 months, and apart from Java, Kotlin also seems to work with no
> issues whatsoever.
>
> But currently, the GitHub repository of Apache Beam contains examples only
> in Java, which might be an issue for other developers who want to use the
> Apache Beam SDK with Kotlin, as there are no sample resources available.
>
> That said, I would love to go ahead and add Kotlin examples alongside the
> current Java examples in the Beam repository.
>
> Please let me know if that sounds reasonable.
>
> Best.
>


Adding kotlin samples to apache beam

2019-02-10 Thread harshit
I have been using Apache Beam for a few of my projects in production over the
past 6 months, and apart from Java, Kotlin also seems to work with no issues
whatsoever.

But currently, the GitHub repository of Apache Beam contains examples only in
Java, which might be an issue for other developers who want to use the Apache
Beam SDK with Kotlin, as there are no sample resources available.

That said, I would love to go ahead and add Kotlin examples alongside the
current Java examples in the Beam repository.

Please let me know if that sounds reasonable.

Best.


Re: Schemas for classes with type parameters

2019-02-10 Thread Kenneth Knowles
Hmm, this is a huge limitation relative to the CoderRegistry, which very
explicitly does support constructing parameterized coders via
CoderProvider. The root CoderProvider is still keyed on rawtype but the
CoderProvider is passed inferred coders for the concrete parameters. Here's
how List.class is registered:
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/CoderRegistry.java#L116

The one thing that _is_ required for this is that at the call site a good
TypeDescriptor is captured. That is mostly automatic for DoFns, hence the
CoderRegistry works fairly well. There are special methods in various user
fns and boilerplate in transforms like MapElements to provide a good
TypeDescriptor.
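For concreteness, a small sketch of that pattern (List is already registered
this way by default; this just mirrors the registration behind the link):

import java.util.List;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.CoderProviders;
import org.apache.beam.sdk.coders.CoderRegistry;
import org.apache.beam.sdk.coders.ListCoder;
import org.apache.beam.sdk.values.TypeDescriptor;

public class ListCoderLookup {
  public static void main(String[] args) throws Exception {
    CoderRegistry registry = CoderRegistry.createDefault();

    // Keyed on the raw List class; ListCoder.of(...) is handed the coder
    // inferred for the element type.
    registry.registerCoderProvider(
        CoderProviders.fromStaticMethods(List.class, ListCoder.class));

    // The call site supplies a full TypeDescriptor, so the registry can infer
    // a String coder and build the parameterized ListCoder from it.
    Coder<List<String>> coder = registry.getCoder(new TypeDescriptor<List<String>>() {});
    System.out.println(coder);
  }
}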

Kenn

On Sun, Feb 10, 2019 at 5:11 PM Reuven Lax  wrote:

> This is an interesting question.
>
> In general, I don't think schema inference can handle these generics
> today. Right now the SchemaRegistry is keyed off of Java class, and due to
> type erasure all different instances of MyClass<T> will look the same.
>
> Now it might be possible to include generic type parameters in the
> registry. You would not be able to use the @DefaultSchema annotation to
> infer a schema, but you might be able to dynamically register a schema
> using a TypeDescriptor. Unfortunately I think this would only sometimes
> work. E.g. my experience has been that given a type parameter T you can
> often figure out T using reflection, but if there are nested types (e.g.
> List<T>) then Java doesn't always preserve these types for introspection.
>
> In sum, I think we could do a bit better for these types of classes, but
> not a whole lot better.
>
> Reuven
>
> On Mon, Feb 4, 2019 at 6:02 AM Jeff Klukas  wrote:
>
>> I've started experimenting with Beam schemas in the context of creating
>> custom AutoValue-based classes and using AutoValueSchema to generate
>> schemas and thus coders.
>>
>> AFAICT, schemas need to have types fully specified, so it doesn't appear
>> to be possible to define an AutoValue class with a type parameter and then
>> create a schema for it. Basically, I want to confirm whether the following
>> type would ever be possible to create a schema for:
>>
>> @DefaultSchema(AutoValueSchema.class)
>> @AutoValue
>> public abstract class MyClass<T> {
>>   public abstract T getField1();
>>   public abstract String getField2();
>>   public static <T> MyClass<T> of(T field1, String field2) {
>>     return new AutoValue_MyClass<T>(field1, field2);
>>   }
>> }
>>
>> This may be an entirely reasonable limitation of the schema machinery,
>> but I want to make sure I'm not missing something.
>>
>


Re: JIRA priorities explaination

2019-02-10 Thread Kenneth Knowles
The content that Alex posted* is the definition from our Jira installation
anyhow.

I just searched around, and there's
https://community.atlassian.com/t5/Jira-questions/According-to-Jira-What-is-Blocker-Critical-Major-Minor-and/qaq-p/668774
which makes clear that this is really user-defined, since Jira has many
deployments with their own configs.

I guess what I want to know about this thread is what action is being
proposed?

Previously, there was a thread that resulted in
https://beam.apache.org/contribute/precommit-policies/ and
https://beam.apache.org/contribute/postcommits-policies/. These have test
failures and flakes as Critical. I agree with Alex that these should be
Blocker. They disrupt the work of the entire community, so we need to drop
everything and get green again.

Other than that, I think what you - Daniel - are suggesting is that the
definition might be best expressed as SLOs. I asked on u...@infra.apache.org
about how we could have those and the answer is the homebrew
https://svn.apache.org/repos/infra/infrastructure/trunk/projects/status/sla/jira/.
If anyone has time to dig into that and see if it can work for us, that
would be cool.

Kenn

*Blocker: Blocks development and/or testing work, production could not run
Critical: Crashes, loss of data, severe memory leak.
Major (Default): Major loss of function.
Minor: Minor loss of function, or other problem where easy workaround is
present.
Trivial: Cosmetic problem like misspelt words or misaligned text.


On Sun, Feb 10, 2019 at 7:20 PM Daniel Oliveira 
wrote:

> Are there existing meanings for the priorities in Jira already? I wasn't
> able to find any info on either the Beam website or wiki about it, so I've
> just been prioritizing issues based on gut feeling. If not, I think having
> some well-defined priorities would be nice, at least for our test-failures,
> and especially if we wanna have some SLOs like I've seen being thrown about.
>
> On Fri, Feb 8, 2019 at 3:06 PM Kenneth Knowles  wrote:
>
>> I've been thinking about this since working on the release. If I ignore
>> the names I think:
>>
>> P0: get paged, stop whatever you planned on doing, work late to fix
>> P1: continually update everyone on status and shouldn't sit around
>> unassigned
>> P2: most things here; they can be planned or picked up by whomever
>> P3: nice-to-have things, maybe starter tasks or lesser cleanup, but no
>> driving need
>> Sometimes there's P4 but I don't value it. Often P3 is a deprioritized
>> thing from P2, so more involved and complex, while P4 is something easy and
>> not important filed just as a reminder. Either way, they are both not on
>> the main path of work.
>>
>> I looked into it and the Jira priority scheme determines the set of
>> priorities as well as the default. Ours is shared by 635 projects. Probably
>> worth keeping. The default priority is Major which would correspond with
>> P2. We can expect the default to be where most issues end up.
>>
>> P0 == Blocker: get paged, stop whatever you planned on doing, work late
>> to fix
>> P1 == Critical: continually update everyone on status and shouldn't sit
>> around unassigned
>> P2 == Major (default): most things here; they can be planned or picked up
>> by whomever
>> P3 == Minor: nice-to-have things, maybe starter tasks or lesser cleanup,
>> but no driving need
>> Trivial: Maybe this is attractive to newcomers as it makes it sound easy.
>>
>> Kenn
>>
>> On Thu, Feb 7, 2019 at 4:08 PM Alex Amato  wrote:
>>
>>> Hello Beam community, I was thinking about this and found some
>>> information to share/discuss. Would it be possible to confirm my thinking
>>> on this:
>>>
>>>- There are 5 priorities in the JIRA system today (tooltip link):
>>>   - *Blocker* Blocks development and/or testing work, production
>>>   could not run
>>>   - *Critical* Crashes, loss of data, severe memory leak.
>>>   - *Major* Major loss of function.
>>>   - *Minor* Minor loss of function, or other problem where easy
>>>   workaround is present.
>>>   - *Trivial* Cosmetic problem like misspelt words or misaligned
>>>   text.
>>>- How should JIRA issues be prioritized for pre/post commit test
>>>failures?
>>>   - I think *Blocker*
>>>- What about the flakey failures?
>>>   - *Blocker* as well?
>>>- How should non test issues be prioritized? (E.g. feature to
>>>implement or bugs not regularly breaking tests).
>>>   - I suggest *Minor*, but it's not clear how to distinguish between
>>>   these.
>>>
>>> Below is my thinking: But I wanted to know what the Apache/Beam
>>> community generally thinks about these priorities.
>>>
>>>- *Blocker*: Expect to be paged. Production systems are down.
>>>- *Critical*: Expect to be contacted by email or a bot to fix this.
>>>- *Major*: Some loss of function in the repository; issues that need to be
>>>addressed soon go here.

Re: Is it possible to gracefully close GrpcDataService? [was Re: [BEAM-6594] Flakey GrpcDataServiceTest.testMessageReceivedBySingleClientWhenThereAreMultipleClients - failing in precommit]

2019-02-10 Thread Daniel Oliveira
This is something I've run into while working on the reference runner and
it's bugged me too. I've tried looking into what the issue was but usually
hit dead ends. Your post is really helpful; I might use it to take another
look when I have the time.

On Fri, Feb 8, 2019 at 5:26 PM Alex Amato  wrote:

> I think graceful shutdown has been historically overlooked; it would not
> surprise me if there are a few things accidentally left out to gracefully
> shut down the runner harness/SDK.
>
> IIRC there was also some discussion around starting up incorrectly
> (requiring a certain order of SDK process startup and runner harness
> startup, which may have had races as well).
>
> On Fri, Feb 8, 2019 at 4:49 PM Brian Hulette  wrote:
>
>> I think I've finally got a handle on this flake, and a possible solution
>> [1]. One thing that's still bothering me though is that the "CANCELLED:
>> Multiplexer hanging up" errors seem to be unavoidable.
>>
>> They occur when the GrpcDataService is closed [2] and it closes all of
>> its multiplexers, which just send an error to their outbound observers
>> [3]. It seems to me that there should be a more graceful way to shut
>> everything down, but I'm not seeing it. Am I missing something?
>>
>> grpc-java suggests using GrpcCleanupRule to gracefully shut down
>> in-process servers and clients [4]; should we be utilizing that somehow?
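For reference, a minimal sketch of the GrpcCleanupRule pattern from the
grpc-java examples (not how GrpcDataServiceTest is written today; the service
under test is left as a placeholder):

import io.grpc.ManagedChannel;
import io.grpc.inprocess.InProcessChannelBuilder;
import io.grpc.inprocess.InProcessServerBuilder;
import io.grpc.testing.GrpcCleanupRule;
import org.junit.Rule;
import org.junit.Test;

public class GrpcCleanupExampleTest {
  // Registered servers and channels are shut down gracefully after each test.
  @Rule public final GrpcCleanupRule grpcCleanup = new GrpcCleanupRule();

  @Test
  public void serverAndChannelAreCleanedUp() throws Exception {
    String serverName = InProcessServerBuilder.generateName();

    grpcCleanup.register(
        InProcessServerBuilder.forName(serverName)
            .directExecutor()
            // .addService(serviceUnderTest)  // e.g. the data service being tested
            .build()
            .start());

    ManagedChannel channel =
        grpcCleanup.register(
            InProcessChannelBuilder.forName(serverName).directExecutor().build());

    // ... exercise the service over 'channel'; no manual shutdown needed.
  }
}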
>>
>> Brian
>>
>> [1] https://github.com/apache/beam/pull/7794
>> [2]
>> https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/data/GrpcDataService.java#L117
>> [3]
>> https://github.com/apache/beam/tree/master/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataGrpcMultiplexer.java#L112
>> [4]
>> https://github.com/grpc/grpc-java/blob/master/examples/README.md#unit-test-examples
>>
>> On Thu, Feb 7, 2019 at 11:49 AM Brian Hulette 
>> wrote:
>>
>>> This was already reported in BEAM-6512 [1], which Scott gave me as a
>>> starter bug. I haven't been able to reproduce locally, so I'm trying to see
>>> if I can get it to fail on Jenkins again with some additional logging [2].
>>>
>>> Definitely interested in other's thoughts on this, I only vaguely
>>> understand what's going on. So far the only headway I've made is noticing
>>> that the "CANCELLED: Multiplexer hanging up" error seems to always occur
>>> exactly three times in failing tests. Successful runs may have one or two
>>> of these messages but never three.
>>>
>>> [1] https://issues.apache.org/jira/browse/BEAM-6512
>>> [2] https://github.com/apache/beam/pull/7767
>>>
>>> On Tue, Feb 5, 2019 at 9:50 AM Alex Amato  wrote:
>>>

 org.apache.beam.runners.fnexecution.data.GrpcDataServiceTest.testMessageReceivedBySingleClientWhenThereAreMultipleClients

 I keep seeing this test failing in my PRs

 https://builds.apache.org/job/beam_PreCommit_Java_Commit/4018/


 https://builds.apache.org/job/beam_PreCommit_Java_Commit/4018/testReport/junit/org.apache.beam.runners.fnexecution.data/GrpcDataServiceTest/testMessageReceivedBySingleClientWhenThereAreMultipleClients/


 I've seen this one come and go for a few weeks or so. I am unsure
 exactly when it first occurred.

>>>


Re: Jira contributor permission request for Beam project

2019-02-10 Thread Kenneth Knowles
Done & welcome!

On Sun, Feb 10, 2019 at 3:59 PM Tony  wrote:

> Hi everyone,
>
> My name is Tony, I was using Beam at $previous_dayjob and I'd like to
> contribute back to the project.  I'm interested in helping expand IO
> capabilities and supporting the transition from the Source API to SDF
> starting with ElasticsearchIO.  Can someone grant me permissions to
> assign tickets I've created to myself?  I'd like to track my progress
> there.
>
> My ASF Jira username is: tmoulton
>
> Thanks,
> Tony Moulton
>


Re: JIRA priorities explaination

2019-02-10 Thread Daniel Oliveira
Are there existing meanings for the priorities in Jira already? I wasn't
able to find any info on either the Beam website or wiki about it, so I've
just been prioritizing issues based on gut feeling. If not, I think having
some well-defined priorities would be nice, at least for our test-failures,
and especially if we wanna have some SLOs like I've seen being thrown about.

On Fri, Feb 8, 2019 at 3:06 PM Kenneth Knowles  wrote:

> I've been thinking about this since working on the release. If I ignore
> the names I think:
>
> P0: get paged, stop whatever you planned on doing, work late to fix
> P1: continually update everyone on status and shouldn't sit around
> unassigned
> P2: most things here; they can be planned or picked up by whomever
> P3: nice-to-have things, maybe starter tasks or lesser cleanup, but no
> driving need
> Sometimes there's P4 but I don't value it. Often P3 is a deprioritized
> thing from P2, so more involved and complex, while P4 is something easy and
> not important filed just as a reminder. Either way, they are both not on
> the main path of work.
>
> I looked into it and the Jira priority scheme determines the set of
> priorities as well as the default. Ours is shared by 635 projects. Probably
> worth keeping. The default priority is Major which would correspond with
> P2. We can expect the default to be where most issues end up.
>
> P0 == Blocker: get paged, stop whatever you planned on doing, work late to
> fix
> P1 == Critical: continually update everyone on status and shouldn't sit
> around unassigned
> P2 == Major (default): most things here; they can be planned or picked up
> by whomever
> P3 == Minor: nice-to-have things, maybe starter tasks or lesser cleanup,
> but no driving need
> Trivial: Maybe this is attractive to newcomers as it makes it sound easy.
>
> Kenn
>
> On Thu, Feb 7, 2019 at 4:08 PM Alex Amato  wrote:
>
>> Hello Beam community, I was thinking about this and found some
>> information to share/discuss. Would it be possible to confirm my thinking
>> on this:
>>
>>- There are 5 priorities in the JIRA system today (tooltip link):
>>   - *Blocker* Blocks development and/or testing work, production
>>   could not run
>>   - *Critical* Crashes, loss of data, severe memory leak.
>>   - *Major* Major loss of function.
>>   - *Minor* Minor loss of function, or other problem where easy
>>   workaround is present.
>>   - *Trivial* Cosmetic problem like misspelt words or misaligned
>>   text.
>>- How should JIRA issues be prioritized for pre/post commit test
>>failures?
>>   - I think *Blocker*
>>- What about the flakey failures?
>>   - *Blocker* as well?
>>- How should non test issues be prioritized? (E.g. feature to
>>implement or bugs not regularly breaking tests).
>>   - I suggest *Minor*, but it's not clear how to distinguish between
>>   these.
>>
>> Below is my thinking: But I wanted to know what the Apache/Beam community
>> generally thinks about these priorities.
>>
>>- *Blocker*: Expect to be paged. Production systems are down.
>>- *Critical*: Expect to be contacted by email or a bot to fix this.
>>- *Major*: Some loss of function in the repository; issues that need to be
>>addressed soon go here.
>>- *Minor*: Most issues will be here, important issues within this
>>will get picked up and completed. FRs, bugs.
>>- *Trivial*: Unlikely to be implemented, far too many issues in this
>>category. FRs, bugs.
>>
>> Thanks for helping to clear this up
>> Alex
>>
>


Re: Schemas for classes with type parameters

2019-02-10 Thread Reuven Lax
This is an interesting question.

In general, I don't think schema inference can handle these generics today.
Right now the SchemaRegistry is keyed off of Java class, and due to type
erasure all different instances of MyClass<T> will look the same.

Now it might be possible to include generic type parameters in the
registry. You would not be able to use the @DefaultSchema annotation to
infer a schema, but you might be able to dynamically register a schema
using a TypeDescriptor. Unfortunately I think this would only sometimes
work. E.g. my experience has been that given a type parameter T you can often
figure out T using reflection, but if there are nested types (e.g. List<T>)
then Java doesn't always preserve these types for introspection.
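As a small illustration of the capture problem with Beam's TypeDescriptor
(getting the call site to produce the first form is exactly the hard part):

import java.util.List;
import org.apache.beam.sdk.values.TypeDescriptor;

public class TypeCaptureExample {
  public static void main(String[] args) {
    // Captured via an anonymous subclass: the type argument survives erasure.
    TypeDescriptor<List<String>> captured = new TypeDescriptor<List<String>>() {};
    System.out.println(captured.getType()); // java.util.List<java.lang.String>

    // Built from the raw class token: the type argument is already gone.
    TypeDescriptor<?> erased = TypeDescriptor.of(List.class);
    System.out.println(erased.getType()); // interface java.util.List
  }
}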

In sum, I think we could do a bit better for these types of classes, but
not a whole lot better.

Reuven

On Mon, Feb 4, 2019 at 6:02 AM Jeff Klukas  wrote:

> I've started experimenting with Beam schemas in the context of creating
> custom AutoValue-based classes and using AutoValueSchema to generate
> schemas and thus coders.
>
> AFAICT, schemas need to have types fully specified, so it doesn't appear
> to be possible to define an AutoValue class with a type parameter and then
> create a schema for it. Basically, I want to confirm whether the following
> type would ever be possible to create a schema for:
>
> @DefaultSchema(AutoValueSchema.class)
> @AutoValue
> public abstract class MyClass<T> {
>   public abstract T getField1();
>   public abstract String getField2();
>   public static <T> MyClass<T> of(T field1, String field2) {
>     return new AutoValue_MyClass<T>(field1, field2);
>   }
> }
>
> This may be an entirely reasonable limitation of the schema machinery, but
> I want to make sure I'm not missing something.
>


Empty projects defined in settings.xml

2019-02-10 Thread Michael Luckey
Hi,

while looking into settings.gradle I stumbled upon 2 project definitions
[1],

- beam-runners-gcp-gcemd
- beam-runners-gcp-gcsproxy

which I cannot make any sense of. Any insights on why these exist or what
they are supposed to be?

They were added with HadoopFormatIO.Write [2].

Thanks,

michel

[1] https://github.com/apache/beam/blob/master/settings.gradle#L64-L67
[2]
https://github.com/apache/beam/commit/757b71e749ab8a9f0a08e3669596ce69920acbac


Re: pipeline steps

2019-02-10 Thread Niel Markwick
This would have to flow through to the other IO wrappers as well, perhaps
outputting a KV

I recently wrote an AvroIO parseAllGenericRecord() equivalent transform,
because I was reading files of various schemas and needed the parseFn
to know both the filename currently being read and use some side-input...

It ended up being quite complex - especially as I wanted to shard the file
read, like AvroIO already does - and I basically re-implemented part of
AvroIO for my use-case...

@Chaim, one simpler option could be to use parseGenericRecord and use the
*name* of the Avro schema in the GenericRecord as a way to determine the
table name - this may mean that you have to change the way your Avro files
are being written.
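For illustration, a rough, untested sketch of that suggestion (the schema name
is used as the table name, the record is carried along as its string form just
to keep the example self-contained, and the actual BigQuery write is left out):

import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.AvroIO;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class ParseBySchemaName {
  public static PCollection<KV<String, String>> read(Pipeline p, String filepattern) {
    return p.apply(
        "ParseAvroBySchemaName",
        AvroIO.parseGenericRecords(
                (GenericRecord record) ->
                    // The Avro schema name doubles as the destination table name.
                    KV.of(record.getSchema().getName(), record.toString()))
            .from(filepattern)
            .withCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())));
  }
}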




On Sun, 10 Feb 2019, 07:03 Reuven Lax,  wrote:

> I think we could definitely add an option to FileIO to add the filename to
> every record. It would come at a (performance) cost - often the filename is
> much larger than the actual record.
>
> On Thu, Feb 7, 2019 at 6:29 AM Kenneth Knowles  wrote:
>
>> This comes up a lot, wanting file names alongside the data that came from
>> the file. It is a historical quirk that none of our connectors used to have
>> the file names. What is the change needed for FileIO + parse Avro to be
>> really easy to use?
>>
>> Kenn
>>
>> On Thu, Feb 7, 2019 at 6:18 AM Jeff Klukas  wrote:
>>
>>> I haven't needed to do this with Beam before, but I've definitely had
>>> similar needs in the past. Spark, for example, provides an input_file_name
>>> function that can be applied to a dataframe to add the input file as an
>>> additional column. It's not clear to me how that's implemented, though.
>>>
>>> Perhaps others have suggestions, but I'm not aware of a way to do this
>>> conveniently in Beam today. To my knowledge, today you would have to use
>>> FileIO.match() and FileIO.readMatches() to get a collection of
>>> ReadableFile. You'd then have to FlatMapElements to pull out the metadata
>>> and the bytes of the file, and you'd be responsible for parsing those bytes
>>> into Avro records. You'd be able to output something like a KV
>>> that groups the file name together with the parsed avro record.
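For concreteness, a rough sketch of that route (ParDo instead of FlatMapElements
for brevity; error handling is elided, and the caller would still need to set a
coder for GenericRecord, e.g. AvroCoder, before this can run):

import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.SeekableByteArrayInput;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;

public class ReadAvroWithFilenames {
  public static PCollection<KV<String, GenericRecord>> read(PBegin input, String pattern) {
    return input
        .apply(FileIO.match().filepattern(pattern))
        .apply(FileIO.readMatches())
        .apply(
            ParDo.of(
                new DoFn<FileIO.ReadableFile, KV<String, GenericRecord>>() {
                  @ProcessElement
                  public void process(ProcessContext c) throws Exception {
                    FileIO.ReadableFile file = c.element();
                    // Keep the file name alongside every record parsed from it.
                    String filename = file.getMetadata().resourceId().toString();
                    try (DataFileReader<GenericRecord> reader =
                        new DataFileReader<>(
                            new SeekableByteArrayInput(file.readFullyAsBytes()),
                            new GenericDatumReader<>())) {
                      for (GenericRecord record : reader) {
                        c.output(KV.of(filename, record));
                      }
                    }
                  }
                }));
  }
}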
>>>
>>> Seems like something worth providing better support for in Beam itself
>>> if this indeed doesn't already exist.
>>>
>>> On Thu, Feb 7, 2019 at 7:29 AM Chaim Turkel  wrote:
>>>
 Hi,
   I am working on a pipeline that listens to a topic on pubsub to get
 files that have changes in the storage. Then I read Avro files, and
 would like to write them to BigQuery based on the file name (to
 different tables).
   My problem is that the transformer that reads the Avro does not give
 me back the file's name (like a tuple or something like that). I seem
 to have this pattern come back a lot.
 Can you think of any solutions?

 Chaim


>>>