Re: More metadata in Coder Proto

2020-05-20 Thread Sam Rohde
+Robert Bradshaw  who is the reviewer on
https://github.com/apache/beam/pull/11503. How does that sound to you? Skip
the "is input deterministic" check for GBKs embedded in x-lang transforms?

On Wed, May 20, 2020 at 10:56 AM Sam Rohde  wrote:

> Thanks for your comments, here's a little more to the problem I'm working
> on: I have a PR to make GBK a primitive
>  and the aforementioned
> test_combine_globally was failing a check in the run_pipeline method of the
> DataflowRunner.
> Specifically, the failure occurs when the DataflowRunner visits each
> transform and checks whether the GBK has a deterministic input coder. The
> check fails when the GBK was expanded by the expansion service, because the
> resulting ExternalCoder doesn't override the is_deterministic method.
>
> This wasn't being hit before because this deterministic input check only
> occurred during the apply_GroupByKey method. However, I moved it to when
> the DataflowRunner is creating a V1B3 pipeline during the run_pipeline
> stage.
>
>
> On Wed, May 20, 2020 at 10:13 AM Luke Cwik  wrote:
>
>> If the CombineGlobally is being returned by the expansion service, the
>> expansion service is on the hook for ensuring that intermediate
>> PCollections/PTransforms/... are constructed correctly.
>>
> Okay, this was kind of my hunch. If the DataflowRunner is making sure that
> the input coder to a GBK is deterministic, then we should skip the check if
> we receive an x-lang transform (seen in the Python SDK as a
> RunnerAPITransformHolder).
>
>
>>
>> I thought this question was about what to do if you want to take the
>> output of an XLang pipeline and process it through some generic transform
>> that doesn't care about the types and treats it like an opaque blob (like
>> the Count transform) and how to make that work when you don't know the
>> output properties. I don't think anyone has shared a design doc for this
>> problem that covered the different approaches.
>>
> Aside from the DataflowRunner GBK problem, I was also curious if there was
> any need for metadata around the Coder proto and why there currently is no
> metadata. If there was more metadata, like an is_deterministic field, then
> the GBK deterministic input check could also work.
>
>
>
>>
>> On Tue, May 19, 2020 at 9:47 PM Chamikara Jayalath 
>> wrote:
>>
>>> I think you are hitting GroupByKey [1] that is internal to the Java
>>> CombineGlobally implementation that takes a KV with a Void type (with
>>> VoidCoder) [2] as input.
>>>
>>> ExternalCoder was added to Python SDK to represent coders within
>>> external transforms that are not standard coders (in this case the
>>> VoidCoder). This is needed to perform the "pipeline proto -> Python object
>>> graph -> Dataflow job request" conversion.
>>>
>>> Seems like today, a runner is unable to perform this particular
>>> validation (and maybe others ?) for pipeline segments received through a
>>> cross-language transform expansion with or without the ExternalCoder. Note
>>> that a runner is not involved during cross-language transform expansion, so
>>> pipeline submission is the only location where a runner would get a chance
>>> to perform this kind of validation for cross-language transforms.
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/2967e3ae513a9bdb13c2da8ffa306fdc092370f0/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java#L1596
>>> [2]
>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java#L1172
>>>
>>> On Tue, May 19, 2020 at 8:31 PM Luke Cwik  wrote:
>>>
 Combine globally is a case where you don't need to know what the
 key or value is and could treat them as bytes, allowing you to build and
 execute this pipeline (assuming you ignored properties such as
 is_deterministic).

 Regardless, I still think it makes sense to provide criteria on what
 your output shape must be during xlang pipeline expansion which is yet to
 be defined to support such a case. Your suggested solution of adding
 properties to coders is one possible solution but I think we have to take a
 step back and consider xlang as a whole since there are still several yet
 to be solved issues within it.


 On Tue, May 19, 2020 at 4:56 PM Sam Rohde  wrote:

> I have a PR that makes GBK a primitive in which the
> test_combine_globally
> 
> is failing on the DataflowRunner. In particular, the DataflowRunner runs
> over the transforms in the run_pipeline method. I moved the method that
> verifies that the input coders to GBKs are deterministic into this
> run_pipeline step. Previously, this check happened during apply_GroupByKey.
>
> On Tue, May 19, 2020 at 4:48 PM Brian Hulette 
> wrote:
>
>> Yes I'm 

Re: More metadata in Coder Proto

2020-05-20 Thread Luke Cwik
On Wed, May 20, 2020 at 11:09 AM Sam Rohde  wrote:

> +Robert Bradshaw  who is the reviewer on
> https://github.com/apache/beam/pull/11503. How does that sound to you?
> Skip the "is input deterministic" check for GBKs embedded in x-lang
> transforms?
>
> On Wed, May 20, 2020 at 10:56 AM Sam Rohde  wrote:
>
>> Thanks for your comments, here's a little more to the problem I'm working
>> on: I have a PR to make GBK a primitive
>>  and the aforementioned
>> test_combine_globally was failing a check in the run_pipeline method of the
>> DataflowRunner.
>> Specifically, the failure occurs when the DataflowRunner visits each
>> transform and checks whether the GBK has a deterministic input coder. The
>> check fails when the GBK was expanded by the expansion service, because the
>> resulting ExternalCoder doesn't override the is_deterministic method.
>>
>> This wasn't being hit before because this deterministic input check only
>> occurred during the apply_GroupByKey method. However, I moved it to when
>> the DataflowRunner is creating a V1B3 pipeline during the run_pipeline
>> stage.
>>
>>
>> On Wed, May 20, 2020 at 10:13 AM Luke Cwik  wrote:
>>
>>> If the CombineGlobally is being returned by the expansion service, the
>>> expansion service is on the hook for ensuring that intermediate
>>> PCollections/PTransforms/... are constructed correctly.
>>>
>> Okay, this was kind of my hunch. If the DataflowRunner is making sure
>> that the input coder to a GBK is deterministic, then we should skip the
>> check if we receive an x-lang transform (seen in the Python SDK as a
>> RunnerAPITransformHolder).
>>
>>
>>>
>>> I thought this question was about what to do if you want to take the
>>> output of an XLang pipeline and process it through some generic transform
>>> that doesn't care about the types and treats it like an opaque blob (like
>>> the Count transform) and how to make that work when you don't know the
>>> output properties. I don't think anyone has shared a design doc for this
>>> problem that covered the different approaches.
>>>
>> Aside from the DataflowRunner GBK problem, I was also curious if there
>> was any need for metadata around the Coder proto and why there currently is
>> no metadata. If there was more metadata, like an is_deterministic field,
>> then the GBK deterministic input check could also work.
>>
>>
It doesn't exist because there was no reason for those properties to be
exposed: it was all at pipeline construction time, and all these details
could be held within the SDK. Once the pipeline is converted to proto, the
contract for using the beam:transform:group_by_key:v1 transform is that the
key encoding is deterministic, and it was up to SDKs to perform this
validation. Since pipeline construction has now spilled over to include
transmitting parts of the pipeline in proto form (because of how XLang
expansion works), it might be necessary to expose more of these properties,
but this is yet to be designed.
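The construction-time contract described above can be sketched as a small validation pass over the pipeline proto. This is an illustrative Python model using plain dicts in place of the runner API protos; the URN set and field layout are simplified assumptions, not the actual SDK code.

```python
# Sketch of the construction-time contract: before submitting a pipeline,
# the constructing SDK must verify that every
# beam:transform:group_by_key:v1 has a deterministically-encoded key.
# Plain dicts stand in for the runner API protos; names are illustrative.

GBK_URN = 'beam:transform:group_by_key:v1'

# Illustrative subset of coder URNs the SDK knows to be deterministic.
# The real knowledge lives in the SDK, not in the proto itself -- which is
# exactly why an unrecognized (external) coder cannot pass this check.
DETERMINISTIC_CODER_URNS = {'beam:coder:bytes:v1', 'beam:coder:varint:v1'}

def validate_gbk_key_coders(pipeline):
    """Raise if any GBK key coder is not known to be deterministic."""
    comps = pipeline['components']
    for name, transform in comps['transforms'].items():
        if transform['urn'] != GBK_URN:
            continue
        (pc_id,) = transform['inputs'].values()  # a GBK has one input
        kv_coder = comps['coders'][comps['pcollections'][pc_id]['coder_id']]
        key_coder = comps['coders'][kv_coder['component_coder_ids'][0]]
        if key_coder['urn'] not in DETERMINISTIC_CODER_URNS:
            raise ValueError('%s: key coder %s is not known to be '
                             'deterministic' % (name, key_coder['urn']))
```

A pipeline whose GBK key is, say, bytes-coded passes; swapping in an unrecognized coder URN makes the same pass reject it.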


>
>>
>>>
>>> On Tue, May 19, 2020 at 9:47 PM Chamikara Jayalath 
>>> wrote:
>>>
 I think you are hitting GroupByKey [1] that is internal to the Java
 CombineGlobally implementation that takes a KV with a Void type (with
 VoidCoder) [2] as input.

 ExternalCoder was added to Python SDK to represent coders within
 external transforms that are not standard coders (in this case the
 VoidCoder). This is needed to perform the "pipeline proto -> Python object
 graph -> Dataflow job request" conversion.

 Seems like today, a runner is unable to perform this particular
 validation (and maybe others ?) for pipeline segments received through a
 cross-language transform expansion with or without the ExternalCoder. Note
 that a runner is not involved during cross-language transform expansion, so
 pipeline submission is the only location where a runner would get a chance
 to perform this kind of validation for cross-language transforms.

 [1]
 https://github.com/apache/beam/blob/2967e3ae513a9bdb13c2da8ffa306fdc092370f0/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java#L1596
 [2]
 https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java#L1172

 On Tue, May 19, 2020 at 8:31 PM Luke Cwik  wrote:

> Combine globally is a case where you don't need to know what the
> key or value is and could treat them as bytes, allowing you to build and
> execute this pipeline (assuming you ignored properties such as
> is_deterministic).
>
> Regardless, I still think it makes sense to provide criteria on what
> your output shape must be during xlang pipeline expansion which is yet to
> be defined to support such a case. Your suggested solution of adding
> properties to coders is one possible solution but I think we have to take a
> step back and consider xlang as a whole 

Re: [ANNOUNCE] New committer: Robin Qiu

2020-05-20 Thread Austin Bennett
Congrats!

On Tue, May 19, 2020, 8:32 PM Chamikara Jayalath 
wrote:

> Congrats Robin!
>
> On Tue, May 19, 2020 at 2:39 PM Rui Wang  wrote:
>
>> Nice! Congrats!
>>
>>
>>
>> -Rui
>>
>> On Tue, May 19, 2020 at 11:13 AM Pablo Estrada 
>> wrote:
>>
>>> yoohoo : )
>>>
>>> On Tue, May 19, 2020 at 11:03 AM Yifan Zou  wrote:
>>>
 Congratulations, Robin!

 On Tue, May 19, 2020 at 10:53 AM Udi Meiri  wrote:

> Congratulations Robin!
>
> On Tue, May 19, 2020, 10:15 Valentyn Tymofieiev 
> wrote:
>
>> Congratulations, Robin!
>>
>> On Tue, May 19, 2020 at 9:10 AM Yichi Zhang 
>> wrote:
>>
>>> Congrats Robin!
>>>
>>> On Tue, May 19, 2020 at 8:56 AM Kamil Wasilewski <
>>> kamil.wasilew...@polidea.com> wrote:
>>>
 Congrats!

 On Tue, May 19, 2020 at 5:33 PM Jan Lukavský 
 wrote:

> Congrats Robin!
> On 5/19/20 5:01 PM, Tyson Hamilton wrote:
>
> Congratulations!
>
> On Tue, May 19, 2020 at 6:10 AM Omar Ismail 
> wrote:
>
>> Congrats!
>>
>> On Tue, May 19, 2020 at 5:00 AM Gleb Kanterov 
>> wrote:
>>
>>> Congratulations!
>>>
>>> On Tue, May 19, 2020 at 7:31 AM Aizhamal Nurmamat kyzy <
>>> aizha...@apache.org> wrote:
>>>
 Congratulations, Robin! Thank you for your contributions!

 On Mon, May 18, 2020, 7:18 PM Boyuan Zhang 
 wrote:

> Congrats~~
>
> On Mon, May 18, 2020 at 7:17 PM Reza Rokni 
> wrote:
>
>> Congratulations!
>>
>> On Tue, May 19, 2020 at 10:06 AM Ahmet Altay <
>> al...@google.com> wrote:
>>
>>> Hi everyone,
>>>
>>> Please join me and the rest of the Beam PMC in welcoming a
>>> new committer: Robin Qiu .
>>>
>>> Robin has been active in the community for close to 2 years,
>>> worked on HyperLogLog++ [1], SQL [2], improved documentation, 
>>> and helped
>>> with releases(*).
>>>
>>> In consideration of his contributions, the Beam PMC trusts
>>> him with the responsibilities of a Beam committer [3].
>>>
>>> Thank you for your contributions Robin!
>>>
>>> -Ahmet, on behalf of the Apache Beam PMC
>>>
>>> [1]
>>> https://www.meetup.com/Zurich-Apache-Beam-Meetup/events/265529665/
>>> [2]
>>> https://www.meetup.com/Belgium-Apache-Beam-Meetup/events/264933301/
>>> [3] https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>> (*) And maybe he will be a release manager soon :)
>>>
>>> --
>>
>> Omar Ismail |  Technical Solutions Engineer |
>> omarism...@google.com |
>>
>


Re: Event Calendar?

2020-05-20 Thread Tyson Hamilton
+1 a calendar would be nice.

On Tue, May 19, 2020 at 3:51 PM Austin Bennett 
wrote:

> Hi All,
>
> As we have events more often that are more accessible (digital), I'm
> wondering whether others see value in adding a calendar to the website?
>
> Perhaps related, is it worth updating
> https://beam.apache.org/community/in-person/ <- to something that isn't
> 'in-person' since doing things in-person is perhaps (hopefully not
> completely) a vestige of the past.
>
> Cheers,
> Austin
>


Re: More metadata in Coder Proto

2020-05-20 Thread Luke Cwik
If the CombineGlobally is being returned by the expansion service, the
expansion service is on the hook for ensuring that intermediate
PCollections/PTransforms/... are constructed correctly.

I thought this question was about what to do if you want to take the output
of an XLang pipeline and process it through some generic transform that
doesn't care about the types and treats it like an opaque blob (like the
Count transform) and how to make that work when you don't know the output
properties. I don't think anyone has shared a design doc for this problem
that covered the different approaches.

On Tue, May 19, 2020 at 9:47 PM Chamikara Jayalath 
wrote:

> I think you are hitting GroupByKey [1] that is internal to the Java
> CombineGlobally implementation that takes a KV with a Void type (with
> VoidCoder) [2] as input.
>
> ExternalCoder was added to Python SDK to represent coders within external
> transforms that are not standard coders (in this case the VoidCoder). This
> is needed to perform the "pipeline proto -> Python object graph -> Dataflow
> job request" conversion.
>
> Seems like today, a runner is unable to perform this particular validation
> (and maybe others ?) for pipeline segments received through a
> cross-language transform expansion with or without the ExternalCoder. Note
> that a runner is not involved during cross-language transform expansion, so
> pipeline submission is the only location where a runner would get a chance
> to perform this kind of validation for cross-language transforms.
>
> [1]
> https://github.com/apache/beam/blob/2967e3ae513a9bdb13c2da8ffa306fdc092370f0/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java#L1596
> [2]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java#L1172
>
> On Tue, May 19, 2020 at 8:31 PM Luke Cwik  wrote:
>
>> Combine globally is a case where you don't need to know what the
>> key or value is and could treat them as bytes, allowing you to build and
>> execute this pipeline (assuming you ignored properties such as
>> is_deterministic).
>>
>> Regardless, I still think it makes sense to provide criteria on what your
>> output shape must be during xlang pipeline expansion which is yet to be
>> defined to support such a case. Your suggested solution of adding
>> properties to coders is one possible solution but I think we have to take a
>> step back and consider xlang as a whole since there are still several yet
>> to be solved issues within it.
>>
>>
>> On Tue, May 19, 2020 at 4:56 PM Sam Rohde  wrote:
>>
>>> I have a PR that makes GBK a primitive in which the
>>> test_combine_globally
>>> 
>>> is failing on the DataflowRunner. In particular, the DataflowRunner runs
>>> over the transforms in the run_pipeline method. I moved the method that
>>> verifies that the input coders to GBKs are deterministic into this
>>> run_pipeline step. Previously, this check happened during apply_GroupByKey.
>>>
>>> On Tue, May 19, 2020 at 4:48 PM Brian Hulette 
>>> wrote:
>>>
 Yes I'm unclear on how a PCollection with ExternalCoder made it into a
 downstream transform that enforces is_deterministic. My understanding of
 ExternalCoder (admittedly just based on a quick look at commit history) is
 that it's a shim added so the Python SDK can handle coders that are
 internal to cross-language transforms.
 I think that if the Python SDK is trying to introspect an ExternalCoder
 instance then something is wrong.

 Brian

 On Tue, May 19, 2020 at 4:01 PM Luke Cwik  wrote:

> I see. The problem is that you are trying to know certain properties
> of the coder to use in a downstream transform which enforces that it is
> deterministic like GroupByKey.
>
> In all the scenarios I have seen so far, we have required both
> SDKs to understand the coder. How are you building a cross-language
> pipeline where the downstream SDK doesn't understand the coder and it
> still works?
>
> Also, an alternative strategy would be to tell the expansion service
> that you need to choose a coder that is deterministic on the output. This
> would require building the pipeline and, before submission to the job
> server, performing the expansion, telling it all the limitations that the
> SDK has imposed on it.
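A toy sketch of what such an expansion-time contract might look like: the calling SDK passes along the constraints that downstream transforms will impose, and the expansion service picks its output coder accordingly. The real ExpansionRequest proto has no such constraints field today; every name below is invented for illustration.

```python
# Hypothetical: an expansion handler that honors a coder constraint passed
# by the calling SDK. Not part of the real expansion protocol.

def expand(payload, constraints):
    """Toy expansion-service handler.

    constraints is a set of requirement names the caller imposes, e.g.
    'deterministic_output_coder' when the output feeds a GroupByKey.
    """
    if 'deterministic_output_coder' in constraints:
        # Fall back to a canonical, deterministic encoding for the output.
        output_coder = 'beam:coder:bytes:v1'
    else:
        output_coder = 'custom:fast_nondeterministic_coder'
    return {'payload': payload, 'output_coder_urn': output_coder}
```

With this shape, the determinism decision is made where the coder is chosen (inside the service), instead of being re-derived by an SDK that cannot introspect the coder.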
>
>
>
>
> On Tue, May 19, 2020 at 3:45 PM Sam Rohde  wrote:
>
>> Hi all,
>>
>> Should there be more metadata in the Coder Proto? For example, adding
>> an "is_deterministic" boolean field. This will allow for a
>> language-agnostic way for SDKs to infer properties about a coder received
>> from the expansion service.
>>
>> My motivation for this is that I recently ran into a problem in which
>> an "ExternalCoder" in the 

Re: More metadata in Coder Proto

2020-05-20 Thread Sam Rohde
Thanks for your comments, here's a little more to the problem I'm working
on: I have a PR to make GBK a primitive
 and the aforementioned
test_combine_globally was failing a check in the run_pipeline method of the
DataflowRunner.
Specifically, the failure occurs when the DataflowRunner visits each
transform and checks whether the GBK has a deterministic input coder. The
check fails when the GBK was expanded by the expansion service, because the
resulting ExternalCoder doesn't override the is_deterministic method.

This wasn't being hit before because this deterministic input check only
occurred during the apply_GroupByKey method. However, I moved it to when
the DataflowRunner is creating a V1B3 pipeline during the run_pipeline
stage.
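A minimal Python sketch of the failure mode described above. Class and method names are simplified; this is not the actual Beam implementation, whose coders live in the SDK's coders module.

```python
class Coder:
    """Simplified stand-in for the SDK's Coder base class."""
    def is_deterministic(self):
        # Conservative default: a coder is assumed non-deterministic
        # unless it explicitly says otherwise.
        return False

class BytesCoder(Coder):
    def is_deterministic(self):
        return True  # equal values always encode to equal bytes

class ExternalCoder(Coder):
    """Stand-in for a non-standard coder received from the expansion service.

    It carries only an opaque proto payload, so it inherits the conservative
    default above -- which is why the GBK deterministic-input check fails.
    """
    def __init__(self, proto_payload):
        self.proto_payload = proto_payload
    # Note: no is_deterministic override.

def check_gbk_input(key_coder, label):
    """Roughly what the deterministic-input check does for a GBK."""
    if not key_coder.is_deterministic():
        raise ValueError(
            'GroupByKey %r requires a deterministic key coder' % label)

check_gbk_input(BytesCoder(), 'MyGBK')  # passes
```

Calling `check_gbk_input(ExternalCoder(b'...'), 'MyGBK')` raises, even though the Java-side coder may well be deterministic: the information simply is not available to the Python SDK.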


On Wed, May 20, 2020 at 10:13 AM Luke Cwik  wrote:

> If the CombineGlobally is being returned by the expansion service, the
> expansion service is on the hook for ensuring that intermediate
> PCollections/PTransforms/... are constructed correctly.
>
Okay, this was kind of my hunch. If the DataflowRunner is making sure that
the input coder to a GBK is deterministic, then we should skip the check if
we receive an x-lang transform (seen in the Python SDK as a
RunnerAPITransformHolder).


>
> I thought this question was about what to do if you want to take the
> output of an XLang pipeline and process it through some generic transform
> that doesn't care about the types and treats it like an opaque blob (like
> the Count transform) and how to make that work when you don't know the
> output properties. I don't think anyone has shared a design doc for this
> problem that covered the different approaches.
>
Aside from the DataflowRunner GBK problem, I was also curious if there was
any need for metadata around the Coder proto and why there currently is no
metadata. If there was more metadata, like an is_deterministic field, then
the GBK deterministic input check could also work.
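A hypothetical sketch of that proposal: carry an is_deterministic flag on the Coder proto itself, so an SDK can validate a coder it cannot instantiate. The real Coder proto has no such field; the dataclass below merely models the proposed shape.

```python
# Hypothetical: if the Coder proto carried an is_deterministic flag, the
# GBK check could pass for coders the SDK cannot otherwise understand.
# Field names and defaults here are purely illustrative.
from dataclasses import dataclass, field
from typing import List

@dataclass
class CoderProto:
    urn: str
    component_coder_ids: List[str] = field(default_factory=list)
    is_deterministic: bool = False  # the proposed metadata field

def gbk_key_check(key_coder: CoderProto):
    # The check reads the flag from the proto, so it no longer depends on
    # the SDK recognizing the coder's URN.
    if not key_coder.is_deterministic:
        raise ValueError('key coder %s not marked deterministic' % key_coder.urn)

external = CoderProto(urn='beam:coders:javasdk:0.1', is_deterministic=True)
gbk_key_check(external)  # passes even though the coder is opaque to Python
```

The trade-off, as noted elsewhere in the thread, is that this bakes construction-time-only properties into the portable proto, which so far has been deliberately avoided.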



>
> On Tue, May 19, 2020 at 9:47 PM Chamikara Jayalath 
> wrote:
>
>> I think you are hitting GroupByKey [1] that is internal to the Java
>> CombineGlobally implementation that takes a KV with a Void type (with
>> VoidCoder) [2] as input.
>>
>> ExternalCoder was added to Python SDK to represent coders within external
>> transforms that are not standard coders (in this case the VoidCoder). This
>> is needed to perform the "pipeline proto -> Python object graph -> Dataflow
>> job request" conversion.
>>
>> Seems like today, a runner is unable to perform this particular
>> validation (and maybe others ?) for pipeline segments received through a
>> cross-language transform expansion with or without the ExternalCoder. Note
>> that a runner is not involved during cross-language transform expansion, so
>> pipeline submission is the only location where a runner would get a chance
>> to perform this kind of validation for cross-language transforms.
>>
>> [1]
>> https://github.com/apache/beam/blob/2967e3ae513a9bdb13c2da8ffa306fdc092370f0/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java#L1596
>> [2]
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java#L1172
>>
>> On Tue, May 19, 2020 at 8:31 PM Luke Cwik  wrote:
>>
>>> Combine globally is a case where you don't need to know what the
>>> key or value is and could treat them as bytes, allowing you to build and
>>> execute this pipeline (assuming you ignored properties such as
>>> is_deterministic).
>>>
>>> Regardless, I still think it makes sense to provide criteria on what
>>> your output shape must be during xlang pipeline expansion which is yet to
>>> be defined to support such a case. Your suggested solution of adding
>>> properties to coders is one possible solution but I think we have to take a
>>> step back and consider xlang as a whole since there are still several yet
>>> to be solved issues within it.
>>>
>>>
>>> On Tue, May 19, 2020 at 4:56 PM Sam Rohde  wrote:
>>>
 I have a PR that makes GBK a primitive in which the
 test_combine_globally
 
 is failing on the DataflowRunner. In particular, the DataflowRunner runs
 over the transforms in the run_pipeline method. I moved the method that
 verifies that the input coders to GBKs are deterministic into this
 run_pipeline step. Previously, this check happened during apply_GroupByKey.

 On Tue, May 19, 2020 at 4:48 PM Brian Hulette 
 wrote:

> Yes I'm unclear on how a PCollection with ExternalCoder made it into a
> downstream transform that enforces is_deterministic. My understanding of
> ExternalCoder (admittedly just based on a quick look at commit history) is
> that it's a shim added so the Python SDK can handle coders that are
> internal to cross-language transforms.
> I think that if 

Re: More metadata in Coder Proto

2020-05-20 Thread Robert Bradshaw
On Wed, May 20, 2020 at 11:09 AM Sam Rohde  wrote:

> +Robert Bradshaw  who is the reviewer on
> https://github.com/apache/beam/pull/11503. How does that sound to you?
> Skip the "is input deterministic" check for GBKs embedded in x-lang
> transforms?
>

Yes, I think this is the right solution in this case. Longer-term, we may
want to handle cases like

[java produces KVs]
[python performs GBK]
[java consumes GBK results]

where properties like this may need to be exposed, but this may also be
ruled out by rejecting "unknown" coders at the boundaries (rather than ones
that are entirely internal).
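The boundary rule suggested above can be sketched as follows: coders that stay entirely internal to an expanded cross-language transform may remain opaque, but any coder on a PCollection the expansion exposes to the calling SDK must be a standard ("known") coder. Data shapes are illustrative, not the actual runner API protos.

```python
# Illustrative subset of standard coder URNs; the real list is larger.
KNOWN_CODER_URNS = {'beam:coder:bytes:v1', 'beam:coder:kv:v1',
                    'beam:coder:varint:v1'}

def reject_unknown_boundary_coders(expanded):
    """Raise if an output-boundary PCollection uses a non-standard coder.

    expanded models an expansion result: 'outputs' maps output tags to
    PCollection ids, 'pcollections' maps ids to {'coder_id': ...}, and
    'coders' maps coder ids to URNs.
    """
    boundary = set(expanded['outputs'].values())
    for pc_id, pcoll in expanded['pcollections'].items():
        urn = expanded['coders'][pcoll['coder_id']]
        # Internal PCollections may carry opaque coders; only the ones
        # crossing back into the calling SDK are constrained.
        if pc_id in boundary and urn not in KNOWN_CODER_URNS:
            raise ValueError(
                'boundary PCollection %s uses unknown coder %s' % (pc_id, urn))
```

Under this rule, the VoidCoder-style coders buried inside CombineGlobally's internal GBK would never need to be understood by the Python SDK at all.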


> On Wed, May 20, 2020 at 10:56 AM Sam Rohde  wrote:
>
>> Thanks for your comments, here's a little more to the problem I'm working
>> on: I have a PR to make GBK a primitive
>>  and the aforementioned
>> test_combine_globally was failing a check in the run_pipeline method of the
>> DataflowRunner.
>> Specifically, the failure occurs when the DataflowRunner visits each
>> transform and checks whether the GBK has a deterministic input coder. The
>> check fails when the GBK was expanded by the expansion service, because the
>> resulting ExternalCoder doesn't override the is_deterministic method.
>>
>> This wasn't being hit before because this deterministic input check only
>> occurred during the apply_GroupByKey method. However, I moved it to when
>> the DataflowRunner is creating a V1B3 pipeline during the run_pipeline
>> stage.
>>
>>
>> On Wed, May 20, 2020 at 10:13 AM Luke Cwik  wrote:
>>
>>> If the CombineGlobally is being returned by the expansion service, the
>>> expansion service is on the hook for ensuring that intermediate
>>> PCollections/PTransforms/... are constructed correctly.
>>>
>> Okay, this was kind of my hunch. If the DataflowRunner is making sure
>> that the input coder to a GBK is deterministic, then we should skip the
>> check if we receive an x-lang transform (seen in the Python SDK as a
>> RunnerAPITransformHolder).
>>
>>
>>>
>>> I thought this question was about what to do if you want to take the
>>> output of an XLang pipeline and process it through some generic transform
>>> that doesn't care about the types and treats it like an opaque blob (like
>>> the Count transform) and how to make that work when you don't know the
>>> output properties. I don't think anyone has shared a design doc for this
>>> problem that covered the different approaches.
>>>
>> Aside from the DataflowRunner GBK problem, I was also curious if there
>> was any need for metadata around the Coder proto and why there currently is
>> no metadata. If there was more metadata, like an is_deterministic field,
>> then the GBK deterministic input check could also work.
>>
>>
>>
>>>
>>> On Tue, May 19, 2020 at 9:47 PM Chamikara Jayalath 
>>> wrote:
>>>
 I think you are hitting GroupByKey [1] that is internal to the Java
 CombineGlobally implementation that takes a KV with a Void type (with
 VoidCoder) [2] as input.

 ExternalCoder was added to Python SDK to represent coders within
 external transforms that are not standard coders (in this case the
 VoidCoder). This is needed to perform the "pipeline proto -> Python object
 graph -> Dataflow job request" conversion.

 Seems like today, a runner is unable to perform this particular
 validation (and maybe others ?) for pipeline segments received through a
 cross-language transform expansion with or without the ExternalCoder. Note
 that a runner is not involved during cross-language transform expansion, so
 pipeline submission is the only location where a runner would get a chance
 to perform this kind of validation for cross-language transforms.

 [1]
 https://github.com/apache/beam/blob/2967e3ae513a9bdb13c2da8ffa306fdc092370f0/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java#L1596
 [2]
 https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java#L1172

 On Tue, May 19, 2020 at 8:31 PM Luke Cwik  wrote:

> Combine globally is a case where you don't need to know what the
> key or value is and could treat them as bytes, allowing you to build and
> execute this pipeline (assuming you ignored properties such as
> is_deterministic).
>
> Regardless, I still think it makes sense to provide criteria on what
> your output shape must be during xlang pipeline expansion which is yet to
> be defined to support such a case. Your suggested solution of adding
> properties to coders is one possible solution but I think we have to take a
> step back and consider xlang as a whole since there are still several yet
> to be solved issues within it.
>
>
> On Tue, May 19, 2020 at 4:56 PM Sam Rohde  wrote:
>
>> I have a PR that makes GBK a primitive in which the
>> test_combine_globally
>> 

Code Coverage Tracking

2020-05-20 Thread Tyson Hamilton
Hello,

I noticed on the GitHub page there is a badge that reports 100% coverage.
This seemed suspect, and sure enough, after a couple of clicks Coveralls
shows that only one file is tracked. A more interesting page is the Builds
page [1], which shows the impact of specific PRs.

It would be really nice if there were a way to get a coverage breakdown for
Beam by directory, and even nicer if it could be displayed like the
post-commit test coverage table (though that may be a bit much). I'd also
love to see these per-build coverage metrics in the actual PRs if possible.

I'm not familiar with Coveralls, didn't find any information on the cwiki
regarding its configuration for Beam, and was wondering if anyone has more
information.


[1]: https://coveralls.io/repos/140391/builds


Re: Transparency to Beam Digital Summit Planning

2020-05-20 Thread Austin Bennett
Should the link/meeting notes be publicly available?  Not just available to
individuals plus all of @google?



On Wed, May 20, 2020 at 2:06 PM Brittany Hermann 
wrote:

> Hi folks,
>
> I wanted to provide a few different ways of transparency to you during the
> planning of the Beam Digital Summit.
>
> 1) *Beam Summit Status Reports:* I will be sending out weekly Beam Summit
> Status Reports which will include the goals, attendees, topics discussed,
> and decisions made every Wednesday.
>
> 2) *Community Guests on Committee Planning Calls:* We would like to
> invite you to join as a guest to these planning calls. This would allow
> for observation of the planning process and to see if there are ways for
> future collaboration on promotions, etc. for the event. If you are
> interested in joining the first bi-weekly meeting starting next week,
> please reach out to me and I will send the invite with call-in information
> directly to you.
>
> In the meantime, I have attached this week's Beam Summit Status report
> below.
>
>
> https://docs.google.com/document/d/1_jLhKvW5MTtkHOZDJyzCTSLUDiD4RjlJmU35rXV-3n0/edit?usp=sharing
>
> Have a great rest of your week!
>
> --
>
> Brittany Hermann
>
> Open Source Program Manager (Provided by Adecco Staffing)
>
> 1190 Bordeaux Drive , Building 4, Sunnyvale, CA 94089
> 
>
>
>


Re: [DISCUSS] Dealing with @Ignored tests

2020-05-20 Thread Kyle Weaver
> I think that after 2-3 years being ignored the tests might have already
lost their relevance.

This is true of all jira issues, and I think consensus on another thread
was that we don't want to auto-close issues.
https://lists.apache.org/thread.html/rb51dfffbc8caf40efe7e1d137402438a05d0375fd945bda8fd7e33d2%40%3Cdev.beam.apache.org%3E

I think the best suggestion from that thread was that we do "spring
cleaning" on open JIRA issues. I know some contributors are doing that
already, but it'd be great if we could coordinate a wider-scale effort as
opposed to just a handful of dedicated contributors trying to do everything.

On Mon, May 18, 2020 at 4:21 AM Jan Lukavský  wrote:

> Hi,
>
> +1 for creating a checkstyle validation that we have associated JIRA with
> each Ignored test. But it seems to me we might need something more, because
> some of the associated JIRAs are open for years. I think that after 2-3
> years being ignored the tests might have already lost their relevance.
>
> Jan
> On 5/15/20 10:20 PM, Luke Cwik wrote:
>
> For the ones without the label, someone would need to use blame and track
> back to why it was sickbayed.
>
> On Fri, May 15, 2020 at 1:08 PM Kenneth Knowles  wrote:
>
>> There are 101 instances of @Ignore, and I've listed them below. A few
>> takeaways:
>>
>>  - highly concentrated in ZetaSQL, and then second tier in various state
>> tests specific to a runner
>>  - there are not that many overall, so I'm not sure a report will add much
>>  - they do not all have Jiras
>>  - they do not even all have any explanation at all (some don't leave out
>> the string parameter, but have an empty string!)
>>
>> Having a checkstyle that there is a Jira attached seems nice. Then we
>> could easily grep out the Jiras and not depend on the "sickbay" label.
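A minimal sketch of such a check. A real implementation would be a Checkstyle rule (and would need to handle multi-line annotations); this regex-based Python script is only illustrative.

```python
# Flag @Ignore annotations whose message does not reference a JIRA issue.
# Illustrative only: real @Ignore messages can span lines and contain
# parentheses, which this simple regex does not handle.
import re

IGNORE_RE = re.compile(r'@Ignore(?:\((?P<msg>[^)]*)\))?')
JIRA_RE = re.compile(r'BEAM-\d+')

def missing_jira(java_source):
    """Yield the index of each @Ignore occurrence lacking a JIRA id."""
    for i, match in enumerate(IGNORE_RE.finditer(java_source)):
        message = match.group('msg') or ''
        if not JIRA_RE.search(message):
            yield i

src = '''\
@Ignore("[BEAM-7794] flaky on Jenkins")
@Ignore("")
@Ignore
'''
flagged = list(missing_jira(src))  # the empty-string and bare @Ignore cases
```

Run over the repository, this would catch both the Jira-less annotations and the ones with an empty string message mentioned above.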
>>
>> Raw data (to see the individual items, just do the grep and not the
>> processing)
>>
>>   % grep --recursive --exclude-dir build '@Ignore' . | cut -d ' ' -f 1 |
>> sort | uniq -c | sort -r
>>   27
>> ./sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java:
>>   11
>> ./runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/FlinkBroadcastStateInternalsTest.java:
>>7
>> ./runners/spark/src/test/java/org/apache/beam/runners/spark/stateful/SparkStateInternalsTest.java:
>>7
>> ./runners/apex/src/test/java/org/apache/beam/runners/apex/translation/utils/ApexStateInternalsTest.java:
>>4
>> ./sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/queries/QueryTest.java:
>>4
>> ./runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/StructuredStreamingPipelineStateTest.java:
>>2
>> ./sdks/java/io/xml/src/test/java/org/apache/beam/sdk/io/xml/XmlSourceTest.java:
>>2
>> ./sdks/java/io/mqtt/src/test/java/org/apache/beam/sdk/io/mqtt/MqttIOTest.java:
>>2
>> ./sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubJsonIT.java:
>>2
>> ./sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/BeamSqlLineIT.java:
>>2
>> ./sdks/java/extensions/euphoria/src/test/java/org/apache/beam/sdk/extensions/euphoria/core/testkit/ReduceByKeyTest.java:
>>2
>> ./sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java:
>>2
>> ./sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java:
>>2
>> ./runners/direct-java/src/test/java/org/apache/beam/runners/direct/ExecutorServiceParallelExecutorTest.java:
>>1
>> ./sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/sources/UnboundedEventSourceTest.java:
>>1
>> ./sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/queries/sql/SqlQuery5Test.java:
>>1
>> ./sdks/java/io/kudu/src/test/java/org/apache/beam/sdk/io/kudu/KuduIOTest.java:
>>1
>> ./sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOTest.java:
>>1
>> ./sdks/java/io/hadoop-file-system/src/test/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemTest.java:
>>1
>> ./sdks/java/io/clickhouse/src/test/java/org/apache/beam/sdk/io/clickhouse/ClickHouseIOTest.java:
>>1
>> ./sdks/java/io/amazon-web-services2/src/test/java/org/apache/beam/sdk/io/aws2/dynamodb/DynamoDBIOTest.java:@Ignore
>> ("[BEAM-7794]
>>1
>> ./sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIOTest.java:@Ignore
>> ("[BEAM-7794]
>>1
>> ./sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaCSVTableIT.java:@Ignore
>> ("https://issues.apache.org/jira/projects/BEAM/issues/BEAM-7523")
>>1
>> ./sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/JdbcDriverTest.java:
>>1
>> ./sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlExplainTest.java:
>>1
>> 
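Kenneth's "checkstyle that there is a Jira attached" idea can be approximated with the same grep pipeline. A hedged sketch (not an actual checkstyle rule; demonstrated against a throwaway directory so it is self-contained — in the real repo you would run it from the repository root instead):

```shell
# Sketch: flag @Ignore annotations that do not mention a BEAM Jira.
# Throwaway fixture standing in for the repository, to keep this runnable.
mkdir -p /tmp/ignore-demo
cat > /tmp/ignore-demo/FooTest.java <<'EOF'
@Ignore("https://issues.apache.org/jira/browse/BEAM-7523")
@Ignore("")
EOF
# Any @Ignore line without a BEAM-NNNN reference is a candidate violation.
grep --recursive -n '@Ignore' /tmp/ignore-demo | grep -v 'BEAM-[0-9]'
```

The same two-stage grep could be wired into a CI check, which would also catch the empty-string cases noted above.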

Re: Discussion on Project Idea for Season of Docs 2020

2020-05-20 Thread Kyle Weaver
Hi Divya,


Thank you for the introduction and your interest in working on Apache Beam
documentation with Season of Docs. To participate in the program, you need
to follow the guides here [1] [2]. If you are new to the program, we
suggest:

   1. Start by studying our proposed project ideas and the expected
   deliverables for each of them [3].

   2. Explore in more depth the existing Beam documentation related to each
   project idea. We provided links to the background material, known issues,
   and current documentation for both project ideas [4] [5]. Choose the
   project you like the most.

   3. Start drafting a proposal with the gaps you have found, your ideas for
   improvement, and how you would present the new/updated/full documentation.
   Here are more tips on how to make your proposal stronger [6]. Please
   follow the guides and make sure you cover all the points.

   4. Submit the project proposal to the Google program administrators during
   the technical writer application phase, which opens on June 9, 2020. If
   you would like feedback on an initial draft, consider using Google Docs
   and sharing the doc on dev@beam.apache.org so that community members can
   leave comments and suggestions (please check the access settings on the
   doc before sending it to the mailing list).

If you have any ideas that you want to brainstorm about, don't hesitate to
start a discussion on the community list, or reach out on the Slack channel
to discuss issues related to GSoD documentation [7]. Once you create an
account, join the #beam-gsod channel.

Project administrators will assess proposals based on these guidelines [8].

Hope it helps. Let us know if you have more questions.

Thanks,

Beam GSoD team

[1] https://developers.google.com/season-of-docs/docs/tech-writer-guide

[2] https://developers.google.com/season-of-docs/terms/tech-writer-terms

[3] https://cwiki.apache.org/confluence/display/BEAM/Google+Season+of+Docs

[4] https://cwiki.apache.org/confluence/display/BEAM/Google+Season+of+Docs#GoogleSeasonofDocs-1.DeploymentofaFlinkandSparkClusterswithPortableBeam

[5] https://cwiki.apache.org/confluence/display/BEAM/Google+Season+of+Docs#GoogleSeasonofDocs-2.Updateoftherunnercomparisonpage/capabilitymatrix

[6] https://developers.google.com/season-of-docs/docs/tech-writer-application-hints

[7] https://join.slack.com/share/zt-eaml657m-KMamnNZfRF2BB7eQpmvveg

[8] https://developers.google.com/season-of-docs/docs/project-selection#assess-proposal


On Wed, May 20, 2020 at 5:01 AM Divya Sanghi 
wrote:

> Following up: can anyone please reply?
>
> On Tue, 19 May 2020, 18:02 Divya Sanghi, 
> wrote:
>
>> Hello Aizhamal,
>>
>> I am working on Big Data technologies and have hands-on experience with
>> Flink, Spark, and Kafka,
>> and I also did a POC where I created a Docker image of a Flink job and ran
>> it on a K8s cluster on my local machine.
>>
>> Attaching my POC project: https://github.com/sanghisha145/flink_on_k8s
>>
>> I really find this project " Deployment of a Flink and Spark Clusters
>> with Portable Beam " interesting and feel that I can contribute
>> whole-heartedly to documentation on it.
>>
>> Let me know where I can start.
>>
>> PS: I have not written any open documentation, but I have written many
>> docs for my organization (many technical articles on our Confluence page).
>>
>> Thanks
>> Divya
>>
>


Transparency to Beam Digital Summit Planning

2020-05-20 Thread Brittany Hermann
Hi folks,

I wanted to provide transparency into the planning of the Beam Digital
Summit in a few different ways.

1) *Beam Summit Status Reports:* Every Wednesday I will send out a weekly
Beam Summit Status Report, which will include the goals, attendees, topics
discussed, and decisions made.

2) *Community Guests on Committee Planning Calls:* We would like to invite
you to join these planning calls as a guest. This would let you observe the
planning process and see whether there are ways to collaborate in the future
on promotions, etc. for the event. If you are
interested in joining the first bi-weekly meeting starting next week,
please reach out to me and I will send the invite with call-in information
directly to you.

In the meantime, I have attached this week's Beam Summit Status report
below.

https://docs.google.com/document/d/1_jLhKvW5MTtkHOZDJyzCTSLUDiD4RjlJmU35rXV-3n0/edit?usp=sharing

Have a great rest of your week!

-- 

Brittany Hermann

Open Source Program Manager (Provided by Adecco Staffing)

1190 Bordeaux Drive, Building 4, Sunnyvale, CA 94089



Re: [VOTE] Release 2.21.0, release candidate #1

2020-05-20 Thread Robert Bradshaw
-1, the wheel files seem to be built against the wrong commit. E.g.

unzip -p
apache_beam-2.21.0-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
apache_beam/runners/worker/bundle_processor.py | head -n 40

notice that "import bisect" (among other things) is missing, compared to
https://github.com/apache/beam/blob/release-2.21.0/sdks/python/apache_beam/runners/worker/bundle_processor.py.
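The same spot-check can be done without unzip, since a wheel is just a zip archive. A minimal sketch of the idea (reading against a throwaway in-memory archive rather than the real .whl, so it runs standalone):

```python
import io
import zipfile

# A wheel is a zip archive, so its files can be read without installing it.
# Throwaway archive standing in for the real apache_beam wheel, to keep
# this snippet self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr(
        "apache_beam/runners/worker/bundle_processor.py",
        "import bisect\nimport collections\n",
    )

# Equivalent of `unzip -p <wheel> <path> | head`:
with zipfile.ZipFile(buf) as z:
    source = z.read("apache_beam/runners/worker/bundle_processor.py").decode()
print(source.splitlines()[0])  # prints "import bisect"
```

Comparing the extracted file against the release tag's checkout (e.g. with difflib or diff) would confirm whether the wheel was built from the right commit.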

(I do agree that BEAM-9887 isn't severe enough to hold up the release at
this point.)


On Tue, May 19, 2020 at 8:48 PM rahul patwari 
wrote:

> Hi Luke,
>
> The release is not severely broken without PR #11609.
> The PR ensures that, while building a Row with a logical type, the input
> value provided is valid. Take the FixedBytes logical type with length 10,
> for example: the proper input value is a byte array of length 10.
> Without this PR, for the FixedBytes logical type, the Row will be built
> even when the input value is shorter than the expected length.
> As long as the input value provided is correct, though, there shouldn't be
> any problems.
> I will change the fix version to 2.22.0 for BEAM-9887
> .
>
> Regards,
> Rahul
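The check Rahul describes can be pictured with a plain-Python sketch. This is illustrative only — it is not Beam's actual FixedBytes/Row API, whose exact signatures and behavior may differ — it just models the kind of length validation PR #11609 adds:

```python
# Illustrative only -- not Beam's actual FixedBytes/Row API.
# Models the validation idea: building a row field with a fixed-length
# bytes logical type should reject input of the wrong length.
def validate_fixed_bytes(value: bytes, length: int) -> bytes:
    if len(value) != length:
        raise ValueError(
            "FixedBytes(%d) expects exactly %d bytes, got %d"
            % (length, length, len(value)))
    return value

validate_fixed_bytes(b"0123456789", 10)  # ok: exactly 10 bytes
try:
    validate_fixed_bytes(b"short", 10)   # too short: rejected by the check
except ValueError as e:
    print(e)
```

Without such a check, the short value would silently be accepted — the behavior described above as the bug.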
>
> On Wed, May 20, 2020 at 8:51 AM Luke Cwik  wrote:
>
>> Rahul, do you believe that the release is severely broken without
>> PR #11609, enough to require another release candidate, or would waiting
>> till 2.22 (which is due to be cut tomorrow) be acceptable?
>>
>> On Tue, May 19, 2020 at 8:13 PM rahul patwari 
>> wrote:
>>
>>> Hi,
>>>
>>> Can the PR: https://github.com/apache/beam/pull/11609 be cherry-picked
>>> for 2.21.0 release?
>>> If not, the fix version has to be changed for BEAM-9887
>>> .
>>>
>>> Regards,
>>> Rahul
>>>
>>> On Wed, May 20, 2020 at 6:05 AM Ahmet Altay  wrote:
>>>
 +1, I validated python 2 and 3 quickstarts.

 On Tue, May 19, 2020 at 4:57 PM Hannah Jiang 
 wrote:

> I confirmed that licenses/notices/source code are added to Java and
> Python docker images as expected.
>
>
> On Tue, May 19, 2020 at 2:36 PM Kyle Weaver 
> wrote:
>
>> Thanks for bringing that up Steve. I'll leave it to others to vote on
>> whether that necessitates an RC #2.
>>
>> On Tue, May 19, 2020 at 5:22 PM Steve Niemitz 
>> wrote:
>>
>>> https://issues.apache.org/jira/browse/BEAM-10015 was marked as 2.21
>>> but isn't in the RC1 tag.  It's marked as P1, and seems like the
>>> implication is that without the fix, pipelines can produce incorrect 
>>> data.
>>> Is this a blocker?
>>>
>>
 +Reuven Lax , would this be a release blocker?


>
>>> On Tue, May 19, 2020 at 4:51 PM Kyle Weaver 
>>> wrote:
>>>
 Hi everyone,
 Please review and vote on the release candidate #1 for the version
 2.21.0, as follows:
 [ ] +1, Approve the release
 [ ] -1, Do not approve the release (please provide specific
 comments)


 The complete staging area is available for your review, which
 includes:
 * JIRA release notes [1],
 * the official Apache source release to be deployed to
 dist.apache.org [2], which is signed with the key with fingerprint
 F11E37D7F006D086232876797B6D6673C79AEA72 [3],
 * all artifacts to be deployed to the Maven Central Repository [4],
 * source code tag "v2.21.0-RC1" [5],
 * website pull request listing the release [6], publishing the API
 reference manual [7], and the blog post [8].
 * Java artifacts were built with Maven 3.6.3 and OpenJDK/Oracle JDK
 1.8.0.
 * Python artifacts are deployed along with the source release to
 the dist.apache.org [2].
 * Validation sheet with a tab for 2.21.0 release to help with
 validation [9].
 * Docker images published to Docker Hub [10].

 The vote will be open for at least 72 hours. It is adopted by
 majority approval, with at least 3 PMC affirmative votes.

 Thanks,
 Kyle

 [1]
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347143
 [2] https://dist.apache.org/repos/dist/dev/beam/2.21.0/
 [3] https://dist.apache.org/repos/dist/release/beam/KEYS
 [4]
 https://repository.apache.org/content/repositories/orgapachebeam-1103/
 [5] https://github.com/apache/beam/releases/tag/v2.21.0-RC1
 [6] https://github.com/apache/beam/pull/11727
 [7] https://github.com/apache/beam-site/pull/603
 [8] https://github.com/apache/beam/pull/11729
 [9]
 https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=275707202
 [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image

>>>


GSoD participation

2020-05-20 Thread Chandan Prakash
Hello community,
I am Chandan Prakash, a sophomore-year CS student from India. I really
want to contribute to the documentation, and I have gone through the project
ideas. I am new to Apache Beam, as I have never used it before.
Right now I am exploring Apache Beam, its runners, SDKs, etc., but I am
quite familiar with open source.
I would also like to join the Slack channel for the GSoD idea discussion,
but I don't know the workspace URL.
Suggestions on where I should start are highly appreciated.


Thanks


2.22.0 Release Update

2020-05-20 Thread Brian Hulette
Hi everyone,

It's time for dueling release branches! The 2.22.0 branch has been cut [1].
- If you notice any release blockers [2] please tag a jira with fix version
2.22.0 and cc me (bhulette).
- Please update the change log [3] with any significant changes if you
haven't yet. Put up a PR with the change and tag me (@TheNeuralBit on
github).

Thanks!
Brian


[1] https://github.com/apache/beam/tree/release-2.22.0
[2] https://beam.apache.org/contribute/release-blocking/
[3] https://github.com/apache/beam/blob/master/CHANGES.md


Re: Try Beam Katas Today

2020-05-20 Thread Henry Suryawirawan
Yeah, there was a recent pull request merged for the md file format change.
I checked your repo and it still contains the task.html files, so I'd need
you to merge in the latest master.

For the answer placeholder, you may refer to this doc
 first
to understand how it works.
It will auto-update the placeholder positions in task-info.yaml.

If you encounter any issue, just let me know.
Thanks Rion.


Regards,
Henry



On Wed, May 20, 2020 at 12:43 PM Rion Williams 
wrote:

> Hi Henry,
>
> Thanks for the quick response, I appreciate it. I believe that I pulled
> the latest from master a day or so ago, so I’ll make sure to pull the most
> recent changes in.
>
> As far as the placeholders, they aren’t currently present (as I don’t
> believe they were present in the Java ones within the learning/katas
> directory), however I can easily add those in to align with the content of
> the existing course. I wasn’t entirely sure based on the existing
> directories if the files should contain the placeholders or the actual
> implementations, either way, it’s a pretty trivial series of changes.
>
> I’ll try to put these together tomorrow and push up a PR. I’ll make sure
> to include you as a reviewer.
>
> Thanks for the initial feedback,
>
> Rion
>
> On May 19, 2020, at 11:15 PM, Henry Suryawirawan 
> wrote:
>
> 
> Thanks Rion for adding the Kotlin version.
> This is great to show other people that Beam can be done in Kotlin too!
>
> I can help to review your work.
> Please help to incorporate the Java Katas latest changes from master.
> There are recent changes to the task description file format from html to
> md.
> Please also help to remove all the *-remote-info.yaml files.
> I assume that you've adjusted the answer placeholders in all tasks as well.
> Afterwards, you can create a pull request and assign me as reviewer.
>
> Please reach out to me if you have any questions.
>
>
> Regards,
> Henry
>
>
>
>
> On Wed, May 20, 2020 at 3:33 AM Rion Williams 
> wrote:
>
>> Sure! I ran through all of the tests locally on my branch (as tests) and
>> then performed a check against all of the known tasks (via Course Creator >
>> Check All Tasks), and 35/36 tasks passed successfully; the only one that
>> didn't was a Built-in IO task that doesn't currently have any
>> implementation. That said, I'd love for someone else to try the same thing
>> since as far as I can tell it "works on my machine".
>>
>> Thanks!
>>
>> Rion
>>
>> On 2020/05/19 19:12:57, Pablo Estrada  wrote:
>> > This is really cool Rion!
>> >
>> > I believe it's possible to start trying out the katas from your branch?
>> If
>> > so, I can give them a try, and use that as a review...
>> > Henry, any other ideas?
>> >
>> > On Tue, May 19, 2020 at 12:04 PM Rion Williams 
>> > wrote:
>> >
>> > > Hi all,
>> > >
>> > > I was recently added as a contributor and created a JIRA ticket
>> related to
>> > > the existing Katas (https://issues.apache.org/jira/browse/BEAM-10027),
>> > > specifically one that targets Kotlin, as there are quite a few
>> > > existing examples out there for Kotlin; I thought a Katas course could
>> > > parallel the existing Java, Go, and Python ones.
>> > >
>> > > I basically ported over the existing Java Katas, added the appropriate
>> > > dependencies, and converted all of the Java files over to Kotlin, and
>> > > ensured that all of the tests pass as expected. I'd love outside of
>> this to
>> > > see if we can shift it to a Stepik course as well if that seems
>> reasonable
>> > > similar to those mentioned in this thread.
>> > >
>> > > My current branch awaiting a PR can be found here (
>> > > https://github.com/rionmonster/beam/tree/BEAM-10027), however I'm
>> unsure
>> > > who would be the best to review such a PR and what other steps might
>> need
>> > > to be taken before trying to get it merged in.
>> > >
>> > > Any feedback would be welcome!
>> > >
>> > > Thanks,
>> > >
>> > > Rion
>> > >
>> > > On 2020/05/14 23:40:45, Rion Williams  wrote:
>> > > > +1 on the contributions front. My team and I have been working with
>> Beam
>> > > primarily with Kotlin and I recently added the appropriate
>> dependencies to
>> > > Gradle and performed a bit of conversions and have it working as
>> expected
>> > > against the existing Java course.
>> > > >
>> > > > I don’t know how many others are actively working with Kotlin and
>> Beam,
>> > > but I’d love to work on transitioning that into a proper course
>> (assuming
>> > > there’s interest in it).
>> > > >
>> > > > > On May 14, 2020, at 10:32 AM, Nathan Fisher <
>> nfis...@junctionbox.ca>
>> > > wrote:
>> > > > >
>> > > > > 
>> > > > > Yes write IO
>> > > > >
>> > > > >> On Thu, May 14, 2020 at 05:41, Henry Suryawirawan <
>> > > hsuryawira...@google.com> wrote:
>> > > > >> Yeah certainly we can expand it further.
>> > > > >> There are more lessons that definitely can be added further.
>> > > > 

Re: Discussion on Project Idea for Season of Docs 2020

2020-05-20 Thread Divya Sanghi
Following up: can anyone please reply?

On Tue, 19 May 2020, 18:02 Divya Sanghi,  wrote:

> Hello Aizhamal,
>
> I am working on Big Data technologies and have hands-on experience with
> Flink, Spark, and Kafka,
> and I also did a POC where I created a Docker image of a Flink job and ran
> it on a K8s cluster on my local machine.
>
> Attaching my POC project: https://github.com/sanghisha145/flink_on_k8s
>
> I really find this project " Deployment of a Flink and Spark Clusters
> with Portable Beam " interesting and feel that I can contribute
> whole-heartedly to documentation on it.
>
> Let me know where I can start.
>
> PS: I have not written any open documentation, but I have written many
> docs for my organization (many technical articles on our Confluence page).
>
> Thanks
> Divya
>