Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-16 Thread rahul patwari
Hi Reza, Rui,

Can we use the [slowly changing lookup cache] approach in BeamSQL if the
source is HDFS or Hive (where the data is changing) and the PCollection
cannot fit into memory?
This PCollection will be JOINed in BeamSQL with a windowed PCollection
created by reading data from Kafka.

Thanks and Regards,
Rahul

On Wed, Jul 17, 2019 at 3:07 AM Reza Rokni  wrote:

> +1
>
> On Tue, 16 Jul 2019 at 20:36, Rui Wang  wrote:
>
>> Another approach is to let BeamSQL support it natively, as the title of
>> this thread says: "as a Table in BeamSQL".
>>
>> We might be able to define a table with properties saying that the table
>> returns a PCollectionView. By doing so we would have a trigger-based
>> PCollectionView available in SQL rel nodes, so SQL would be able to
>> implement [Pattern: Slowly-changing lookup cache]. That way, users
>> would only need to construct a table and set it on SqlTransform.
>>
>> I created a JIRA to track this idea:
>> https://jira.apache.org/jira/browse/BEAM-7758
>>
>>
>> -Rui
>>
>>
>> On Tue, Jul 16, 2019 at 7:12 AM Reza Rokni  wrote:
>>
>>> Hi Rahul,
>>>
>>> FYI, that pattern is also available in the Beam docs (with an updated
>>> code example):
>>> https://beam.apache.org/documentation/patterns/side-input-patterns/.
>>>
>>> Please note that in the DoFn that feeds the View.asSingleton() you will
>>> need to manually call BigQuery using the BigQuery client.
>>>
>>> Regards
>>>
>>> Reza
>>>
>>> On Tue, 16 Jul 2019 at 14:37, rahul patwari 
>>> wrote:
>>>
 Hi,

 We are following [Pattern: Slowly-changing lookup cache] from
 https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-case-patterns-part-1

 We have a use case to read slowly changing bounded data as a
 PCollection along with the main PCollection from Kafka (windowed) and use it
 in a BeamSQL query.

 Is it possible to design such a use case with Beam Java SDK?

 Approaches followed but not successful:

 1) GenerateSequence => GlobalWindow with Data Trigger => Composite
 Transform (which applies Beam I/O on the
 pipeline [PCollection.getPipeline()]) => Convert the resulting PCollection
 to a PCollection of Rows => Apply BeamSQL
 Comments: Beam I/O reads data only once even though a long value is
 generated from GenerateSequence with periodicity. The expectation is that
 whenever a long value is generated, Beam I/O will be used to read the
 latest data. Is this because of optimizations in the DAG? Can the
 optimizations be overridden?

 2) The pipeline is the same as approach 1. But, instead of using a
 composite transform, a DoFn is used where a for loop will emit each Row of
 the PCollection.
 Comments: The output PCollection is unbounded. But, we need a bounded
 PCollection, as this PCollection is used to JOIN with the PCollection of
 each window from Kafka. How can we convert an unbounded PCollection to a
 bounded PCollection inside a DoFn?

 Are there any better approaches?

 Regards,
 Rahul
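
For reference, a minimal sketch of the slowly-changing lookup cache pattern
discussed in this thread, using the Beam Java SDK. fetchLatestRows() is a
hypothetical helper that re-reads the latest snapshot from the external
source; the rest follows the side-input pattern linked above:

import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollectionView;
import org.joda.time.Duration;

public class SlowlyChangingLookup {

  static PCollectionView<Map<String, String>> buildLookupView(Pipeline p) {
    return p
        // Emit a tick once per refresh interval.
        .apply(GenerateSequence.from(0).withRate(1, Duration.standardMinutes(5)))
        // Stay in the global window and re-fire on every tick.
        .apply(Window.<Long>into(new GlobalWindows())
            .triggering(Repeatedly.forever(
                AfterProcessingTime.pastFirstElementInPane()))
            .discardingFiredPanes())
        // On each tick, re-read the latest data inside the DoFn.
        .apply(ParDo.of(new DoFn<Long, Map<String, String>>() {
          @ProcessElement
          public void process(ProcessContext c) {
            c.output(fetchLatestRows());
          }
        }))
        // Expose the latest snapshot as a singleton side input.
        .apply(View.asSingleton());
  }

  // Hypothetical: read the current contents of the slowly-changing source
  // (HDFS, Hive, BigQuery, ...) using its native client.
  static Map<String, String> fetchLatestRows() {
    throw new UnsupportedOperationException("call the external source here");
  }
}

The windowed Kafka PCollection can then consume the view via
.withSideInputs(lookupView) in its main ParDo.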





Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Rakesh Kumar
Congrats Rob!!!

On Tue, Jul 16, 2019 at 10:24 AM Ahmet Altay  wrote:

> Hi,
>
> Please join me and the rest of the Beam PMC in welcoming a new committer: 
> Robert Burke.
>
> Robert has been contributing to Beam and actively involved in the
> community for over a year. He has been actively working on Go SDK, helping
> users, and making it easier for others to contribute [1].
>
> In consideration of Robert's contributions, the Beam PMC trusts him with
> the responsibilities of a Beam committer [2].
>
> Thank you, Robert, for your contributions and looking forward to many more!
>
> Ahmet, on behalf of the Apache Beam PMC
>
> [1]
> https://lists.apache.org/thread.html/8f729da2d3009059d7a8b2d8624446be161700dcfa953939dd3530c6@%3Cdev.beam.apache.org%3E
> [2] https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>


Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Hannah Jiang
Congratulations, Rebo!



Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Connell O'Callaghan
Excellent - congratulations Rebo



Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Tanay Tummalapalli
Congratulations!




Re: [DISCUSS] Thoughts on stateful DoFns in merging windows

2019-07-16 Thread Kenneth Knowles
In actual practice, Combine + GBK are implemented using the same
underlying code as user-facing state & timers. So you are very right :-). I
think the use cases are very different, though, and the requirements favor
being more "safe by default" for users.

The way we could do timer merging if we treated it the same as state is to
have the user provide a TimestampCombiner or CombineFn for merging the
timers. I don't really like this. I agree with Reuven that for stateful
DoFn it seems cleaner to just have @OnMerge.

Under the hood, StateInternals has automatic merging for CombiningState and
BagState. I had always thought that eventually we might expose this to user
state. But again I'm not so sure it is the right design. State is not that
useful without timers, and once you write @OnMerge for the timers you
probably can merge the state too, otherwise it will get confusing.

And as a last point, once we have retractions (which are moving again a
little bit, yay!) it is very likely that a stateful DoFn will have a reason
to emit retractions from @OnMerge. So doing anything automatically has even
less benefit.

Kenn

On Tue, Jul 2, 2019 at 1:27 AM Jan Lukavský  wrote:

> Understood. I think I can put down a design document of API changes that
> would be required to implement this. Unfortunately, I don't have capacity
> to implement this myself (mostly because I actually don't have a use case
> for that). This discussion was meant as a kind of theoretical exercise to
> see whether we can find benefits in viewing Combine and GroupByKey as a
> special case of stateful DoFns. I think there are two direct consequences of that:
>
>  a) it might be a little easier to implement a new runner (as it might
> suffice to implement stateless and stateful ParDos to get "at least
> something up and running")
>
>  b) it opens up some optimizations - e.g. using combineByKey in batch to
> implement a stateful (unordered) ParDo; probably even in streaming it would
> be possible to push some calculations before the shuffle (timers would
> trigger state shuffle and merge downstream)
>
> Maybe we can find even more benefits. Would this be something worth
> exploring? Would anyone be interested in this?
>
> Jan
> On 7/1/19 11:51 PM, Reuven Lax wrote:
>
> The problem is that it's not obvious how to merge timers. For any obvious
> strategy, I think I can imagine a use case where it will be wrong.
>
> I'm leaning to the conclusion that we're much better off providing an
> onMerge callback, and letting the user explicitly handle the merging there.
>
> Reuven
>
> On Mon, Jul 1, 2019 at 1:04 AM Jan Lukavský  wrote:
>
>> What are the issues you see with merging timers? Does it "only" suffer
>> from the same issue as early emitting from merging windows (i.e. needs
>> retractions to work correctly), or are there some other issues?
>> On 6/28/19 3:48 PM, Reuven Lax wrote:
>>
>> We do have merging state. However merging timers is a bit more awkward. I
>> now tend to think that we're better off providing an onMerge function and
>> let the user handle this.
>>
>> On Fri, Jun 28, 2019, 11:06 AM Jan Lukavský  wrote:
>>
>>> Hi,
>>>
>>> during my implementation of @RequiresTimeSortedInput I found out that
>>> current runners do not support stateful DoFns on merging windows [1].
>>> The question is: why is that? One obvious reason seems to be that the
>>> current definition of StateSpec doesn't define a state merge function,
>>> which is necessary for merging windows to work. Actually, in Euphoria
>>> [2] (originated separately, now merged into Beam) we had only two basic
>>> operators (PTransforms) - FlatMap (stateless ParDo) and an operation we
>>> called ReduceStateByKey [3] (not merged into Beam, as there were some
>>> difficulties, one of which might be the missing support for merging
>>> windows). All the other operations could be derived from these two. The
>>> ReduceStateByKey (RSBK) operator was a keyed operator (shuffle) with the
>>> following user-defined functions (sketched just after this list):
>>>
>>>   - state factory (roughly equivalent to declaring a StateSpec)
>>>
>>>   - state merge function (State1 + State2 = State3)
>>>
>>>   - state update function (State + Value = new State)
>>>
>>>   - and state extract function (State -> Output) -- actually called
>>> after each element was added to the state
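
A minimal sketch of how those four functions could be expressed as a Java
interface (the names are hypothetical and for illustration only; this is not
the actual Euphoria or Beam API):

import java.io.Serializable;

// Hypothetical shape of the RSBK user-defined functions listed above.
public interface ReduceStateByKeyFns<StateT, ValueT, OutputT> extends Serializable {
  StateT createState();                            // state factory
  StateT mergeStates(StateT left, StateT right);   // State1 + State2 = State3
  StateT updateState(StateT state, ValueT value);  // State + Value = new State
  OutputT extractOutput(StateT state);             // State -> Output
}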
>>>
>>> Now if you look at this, this is essentially both Beam's Combine
>>> operator and stateful ParDo. Moreover, GroupByKey is just RSBK with a
>>> BagState and an append merge function. So, a question that comes to mind:
>>> what would happen if we added a state merge function to StateSpec? I think
>>> it can have the following benefits:
>>>
>>>   - Both Combine and GroupByKey can become derived operators (this is no
>>> breaking change, as runners are always free to provide their own override
>>> for any PTransform)
>>>
>>>   - in batch, stateful ParDo can now be implemented more efficiently,
>>> using the Combine operation (as long as it doesn't @RequiresTimeSortedInput,
>>> which is my favourite :))
>>>
>>>   - 

Re: Using the BigQuery Storage API

2019-07-16 Thread Rui Wang
Hi,

I have a fresh repo cloned and switched to release-2.13.0. I tried adding
"import org.apache.beam.sdk.Pipeline" to BigQueryTornadoes.java,

and it succeeded (I am using IntelliJ).

I am thinking you might not have set up your IDE right.


If you are using IntelliJ (BTW, IntelliJ has a Community version which is
free and good enough for Beam), there is a doc developed by people in the
Beam community on how to set up IntelliJ [1].


[1]:
https://docs.google.com/document/d/18eXrO9IYll4oOnFb53EBhOtIfx-JLOinTWZSIBFkLk4/edit


-Rui
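
For reference, the Storage API read from that tutorial looks roughly like
the following sketch (a fragment inside an existing pipeline; the table name
is a placeholder):

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.values.PCollection;

// Read rows directly via the BigQuery Storage API (Method.DIRECT_READ).
PCollection<TableRow> rows =
    pipeline.apply("ReadFromStorageApi",
        BigQueryIO.readTableRows()
            .from("my-project:my_dataset.my_table")  // placeholder table
            .withMethod(BigQueryIO.TypedRead.Method.DIRECT_READ));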

On Mon, Jul 15, 2019 at 9:01 PM john desmond 
wrote:

> Hello,
>
> I am trying to follow the tutorial for reading data from a BigQuery table
> using Java, specifically using the BigQuery Storage API. I am using the
> tutorial found here:
> https://beam.apache.org/documentation/io/built-in/google-bigquery/
>
> The tutorial suggests using the 2.13.0 version of beam, but many of the
> suggested import statements from the source code:
> https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java
>
> are not supported in this version of Beam. For example, I am getting
> errors in my IDE when trying to use import statements like:
>
> import org.apache.beam.sdk.Pipeline;
>
> I'm not sure how to resolve these statements because the only
> documentation I can find is from Beam 2.0.0, e.g.
> https://beam.apache.org/releases/javadoc/2.0.0/org/apache/beam/sdk/Pipeline.html
>
>
> Any suggestions on how to resolve these kinds of issues?
>
> Thank you!
>
> John M. Desmond
> (631) 833-2836
> https://www.linkedin.com/in/yohn-dezmon/
> https://github.com/yohn-dezmon
>
>


Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Aizhamal Nurmamat kyzy
Congratulations, Rebo



Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Heejong Lee
Congratulations!



Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-16 Thread Reza Rokni
+1



Re: Discussion/Proposal: support Sort Merge Bucket joins in Beam

2019-07-16 Thread Eugene Kirpichov
I'd like to reiterate the request to not build anything on top of
FileBasedSource/Reader.
If the design requires some interface for representing a function from a
filename to a stream of records, it is better to introduce a new interface
for that.
If it requires interoperability with other IOs that read files, it is better
to change them to use the new interface.
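
One possible shape for such an interface, as a rough sketch (the name and
signature are hypothetical, not an existing Beam API):

import java.io.IOException;
import java.util.Iterator;
import org.apache.beam.sdk.io.fs.ResourceId;

// Hypothetical "filename -> stream of records" contract: open a file and
// return its records sequentially, with no block/offset/split support.
public interface FileRecordReader<T> {
  Iterator<T> read(ResourceId file) throws IOException;
}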

On Tue, Jul 16, 2019 at 9:08 AM Chamikara Jayalath 
wrote:

> Thanks, this clarifies a lot.
>
> For the writer, I think it's great if you can utilize the existing
> FileIO.Sink implementations in your SMB sink transform, even if you have to
> reimplement some of the logic (for example, compression and temp-file
> handling) that is already implemented in Beam's FileIO/WriteFiles transforms.
>
> For the reader, you are right that there's no FileIO.Read. What we have are
> various implementations of FileBasedSource/FileBasedReader classes that are
> currently intentionally hidden since Beam IO transforms are expected to be
> the intended public interface for users. If you can expose and re-use these
> classes with slight modifications (keeping backwards compatibility) I'm OK
> with it. Otherwise you'll have to write your own reader implementations.
>
> In general, it seems like SMB has very strong requirements related to
> sharding/hot-key management that are not easily achievable by implementing
> SMB source/sink as a composite transform that utilizes existing source/sink
> transforms. This forces you to implement this logic in your own DoFns and
> existing Beam primitives are not easily re-usable in this context.
>
> Thanks,
> Cham
>
> On Tue, Jul 16, 2019 at 8:26 AM Neville Li  wrote:
>
>> A little clarification of the IO requirement and my understanding of the
>> current state of IO.
>>
>> tl;dr: not sure if there are reusable bits for the reader. It's possible
>> to reuse some for the writer but with heavy refactoring.
>>
>> *Reader*
>>
>>- For each bucket (containing the same key partition, sorted) across
>>multiple input data sets, we stream records from bucket files and merge
>>sort.
>>- We open the files in a DoFn, and emit KV pairs where the
>>CGBKR encapsulates an Iterable from each input.
>>- Basically we need a simple API like ResourceId -> Iterator, i.e.
>>sequential read, no block/offset/split requirement.
>>- FileBasedSource.FileBasedReader seems the closest fit but they're
>>nested & decoupled.
>>- There's no FileIO.Read, only a ReadMatches[1], which can be used
>>with ReadAllViaFileBasedSource. But that's not the granularity we need,
>>since we lose ordering of the input records, and can't merge 2+ sources.
>>
>> *Writer*
>>
>>- We get a `PCollection<KV<BucketShardId, Iterable<T>>>` after bucketing
>>and sorting, where the Iterable is the records sorted by key and the
>>BucketShardId is used to produce the filename, e.g. bucket-1-shard-2.avro.
>>- We write each Iterable to a temp file and move it to its final
>>destination when done. Both should ideally reuse existing code.
>>- Looks like FileIO.Sink (and impls in AvroIO, TextIO, TFRecordIO)
>>supports record writing into a WritableByteChannel, but some logic like
>>compression is handled in FileIO through ViaFileBasedSink which extends
>>FileBasedSink.
>>- FileIO uses WriteFiles[3] to shard and write a PCollection.
>>Again we lose the ordering of the output records and the custom file naming
>>scheme. However, WriteShardsIntoTempFilesFn[4] and FinalizeTempFileBundles[5]
>>in WriteFiles seem closest to our need but would have to be split out and
>>generalized.
>>
>> *Note on reader block/offset/split requirement*
>>
>>- Because of the merge sort, we can't split or offset-seek a bucket
>>file. Without persisting the offset index of a key group somewhere,
>>we can't efficiently skip to a key group without exhausting the previous
>>ones. Furthermore, we need to merge sort and align keys from multiple
>>sources, which may not have the same key distribution. It might be possible
>>to binary search for matching keys but that's extra complication. IMO the
>>reader work distribution is better solved by a better bucket/shard strategy
>>in the upstream writer.
>>
>> *References*
>>
>>1. ReadMatches extends PTransform<PCollection<MatchResult.Metadata>,
>>PCollection<ReadableFile>>
>>2. ReadAllViaFileBasedSource<T> extends
>>PTransform<PCollection<ReadableFile>, PCollection<T>>
>>3. WriteFiles<UserT, DestinationT, OutputT> extends
>>PTransform<PCollection<UserT>, WriteFilesResult<DestinationT>>
>>4. WriteShardsIntoTempFilesFn extends DoFn<KV<ShardedKey<Integer>,
>>Iterable<UserT>>, FileResult<DestinationT>>
>>5. FinalizeTempFileBundles extends PTransform<
>>PCollection<List<FileResult<DestinationT>>>, WriteFilesResult<DestinationT>>
>>
>>
>> On Tue, Jul 16, 2019 at 5:15 AM Robert Bradshaw 
>> wrote:
>>
>>> On Mon, Jul 15, 2019 at 7:03 PM Eugene Kirpichov 
>>> wrote:
>>> >
>>> > Quick note: I didn't look through the document, but please do not
>>> build on either FileBasedSink or FileBasedReader. They are both remnants of
>>> the old, non-composable IO world; and in fact much of the composable IO
>>> work emerged from 

[RESULT] [VOTE] Vendored Dependencies Release

2019-07-16 Thread Lukasz Cwik
I'm happy to announce that we have unanimously approved this release.

There are 4 approving votes, 3 of which are binding:
* Ismaël Mejía
* Lukasz Cwik
* Pablo Estrada

There are no disapproving votes.

Thanks everyone!

On Tue, Jul 16, 2019 at 4:30 AM Ismaël Mejía  wrote:

> +1
>
> Ran the build and used diffoscope [1] to compare the generated vs. staged
> files. We should probably make the full Gradle build reproducible in the
> future, to make this comparison trivial (a simple diff).
>
> [1] https://diffoscope.org/
>
> On Tue, Jul 16, 2019 at 2:18 AM Lukasz Cwik  wrote:
> >
> > +1
> >
> > On Mon, Jul 15, 2019 at 8:14 PM Pablo Estrada 
> wrote:
> >>
> >> +1
> >> verified hashes and signatures
> >>
> >> On Fri, Jul 12, 2019 at 9:40 AM Kai Jiang  wrote:
> >>>
> >>> +1 (non-binding)
> >>>
> >>> On Thu, Jul 11, 2019 at 8:27 PM Lukasz Cwik  wrote:
> 
>  Please review the release of the following artifacts that we vendor:
>   * beam-vendor-grpc_1_21_0
>   * beam-vendor-guava-26_0-jre
>   * beam-vendor-bytebuddy-1_9_3
> 
>  Hi everyone,
>  Please review and vote on the release candidate #3 for the
> org.apache.beam:beam-vendor-grpc_1_21_0:0.1,
> org.apache.beam:beam-vendor-guava-26_0-jre:0.1, and
> org.apache.beam:beam-vendor-bytebuddy-1_9_3:0.1 as follows:
>  [ ] +1, Approve the release
>  [ ] -1, Do not approve the release (please provide specific comments)
> 
> 
>  The complete staging area is available for your review, which
> includes:
>  * the official Apache source release to be deployed to
> dist.apache.org [1], which is signed with the key with fingerprint
> EAD5DE293F4A03DD2E77565589E68A56E371CCA2 [2],
>  * all artifacts to be deployed to the Maven Central Repository [3],
>  * commit hash "0fce2b88660f52dae638697e1472aa108c982ae6" [4],
> 
>  The vote will be open for at least 72 hours. It is adopted by
> majority approval, with at least 3 PMC affirmative votes.
> 
>  Thanks,
>  Luke
> 
>  [1] https://dist.apache.org/repos/dist/dev/beam/vendor/
>  [2] https://dist.apache.org/repos/dist/release/beam/KEYS
>  [3]
> https://repository.apache.org/content/repositories/orgapachebeam-1078/
>  [4]
> https://github.com/apache/beam/commit/0fce2b88660f52dae638697e1472aa108c982ae6
>


Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Chamikara Jayalath
Congrats!!



Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Robin Qiu
Congrats, Robert!!



Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Alan Myrvold
Congrats, Robert!



Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-16 Thread Rui Wang
Another approach is to let BeamSQL support it natively, as the title of
this thread says: "as a Table in BeamSQL".

We might be able to define a table with properties saying that the table
returns a PCollectionView. By doing so we would have a trigger-based
PCollectionView available in SQL rel nodes, so SQL would be able to
implement [Pattern: Slowly-changing lookup cache]. That way, users
would only need to construct a table and set it on SqlTransform.

I created a JIRA to track this idea:
https://jira.apache.org/jira/browse/BEAM-7758


-Rui


>


Re: Docker Run Options in SDK Container

2019-07-16 Thread Ankur Goenka
Thanks for summarizing the discussion.

A few comments inline below:


On Mon, Jul 15, 2019 at 5:28 PM Sam Bourne  wrote:

> Hello Beam devs,
>
> I’ve opened a PR (https://github.com/apache/beam/pull/8982) to support
> passing options/flags to the docker run command executed as part of the
> portable environment workflow. I’m in need of providing specific volumes
> and possibly other docker run options as I refine our custom container and
> workflow.
>
> There were requests to bring this up in the mailing list to discuss
> possible ways to achieve this. There's an existing PR #8828, but we took
> quite different approaches. #8828 is limited to only mounting /tmp/
> directories, with no support for other docker run options/flags, so it
> wouldn't solve my needs.
>
> I chose to expand upon the existing flag environment_config and provide
> the additional docker run options there. This requires the SDK to parse
> these out when building the DockerPayload protobuf.
> is provided to environment_config changes depending on the
> environment_type. e.g. if environment_type is docker, environment_config
> is currently expected to be the docker container name, but other
> environment types have completely different expectations, and each uses its
> own protobuf message type.
>
> The current method (using python SDK) looks like this:
>
> python -m mymodule --runner PortableRunner --job_endpoint localhost:8099
> --environment_type DOCKER --environment_config MY_CONTAINER_NAME
>
> My PR expects other run options to be provided before the container name -
> similar to how you would start the container locally:
>
> python -m mymodule --runner PortableRunner --job_endpoint localhost:8099
> --environment_type DOCKER --environment_config "-v
> /Volumes/mnt/foo:/Volumes/mnt/foo -v /Volumes/mnt/bar:/Volumes/mnt/bar --user
> sambvfx MY_CONTAINER_NAME"
>
> The PR’s feedback raises some questions that some of you may have opinions
> about. A hopefully faithful summary of them and my commentary below:
>
> Should we require that the environment_config be a JSON-encoded string that
> mirrors the protobuf?
>
> e.g.
>
> --environment_config '{"image_name": "MY_CONTAINER_NAME", "run_options": "-v
> /Volumes/mnt/foo:/Volumes/mnt/foo -v /Volumes/mnt/bar:/Volumes/mnt/bar --user
> sambvfx"}'
>
> I'm not a fan: it isn't backwards compatible and it is difficult to
> provide on the CLI. Users don't want to type JSON into the shell.
>
I agree, typing JSON on the command line is really messy. But I think having
meaningful parts in the config will be easier to maintain and compare.
Can we accept a config file which can be read, parsed, and delivered as
options to the docker environment?
Something like "--environment_config '~/my_docker_config.json'" (or YAML).

I think passing a user-provided command to start docker might have security
issues, as users might mount an otherwise inaccessible drive or access a
prohibited port, etc.

> Should we not assume docker run ... is the only way to start the
> container?
>
> I think any other method would likely require further changes to the
> protobuf or a completely new one.
>
Yes, I think that makes sense. However, if we add more parameters to the
docker startup, then the DockerPayload protobuf can be updated to include
those.

> Should we provide different args for mounting volume(s) and map that to
> the appropriate docker command within the beam code?
>
> This requires a lot of docker-specific code to be included within Beam.
>
> Any input would be appreciated.
>
> Cheers,
> Sam
>


Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Ismaël Mejía
Congrats Robert!




Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Yichi Zhang
Congratulations!



Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-16 Thread Yichi Zhang
Thanks for organizing this, Pablo; it'll be very helpful!



Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-16 Thread Pablo Estrada
Hello all,
I'll be having a session where I live-fix a Beam bug for 1 hour next week.
Everyone is invited.

It will be on July 25, between 3:30pm and 4:30pm PST. Hopefully I will
finish a full change in that time frame, but we'll see.

I have not yet decided if I will do this via hangouts, or via a youtube
livestream. In any case, I will share the link here in the next few days.

I will most likely work on the Java SDK (I have a little feature request in
mind).

Thanks!
-P.


Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Holden Karau
Congratulations! :)



-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Mikhail Gryzykhin
Congratulations!



Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Anton Kedin
Congrats!



Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Ankur Goenka
Congratulations Robert!

Go GO!



Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Ruoyun Huang
Congratulations!


-- 

Ruoyun Huang


Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Yifan Zou
Congratulations!!


Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Rui Wang
Congrats!


-Rui


Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Udi Meiri
Congrats Robert B.!



Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Reza Rokni
Congratulations!


[ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Ahmet Altay
Hi,

Please join me and the rest of the Beam PMC in welcoming a new
committer: Robert Burke.

Robert has been contributing to Beam and has been actively involved in the
community for over a year. He has been working on the Go SDK, helping
users, and making it easier for others to contribute [1].

In consideration of Robert's contributions, the Beam PMC trusts him with
the responsibilities of a Beam committer [2].

Thank you, Robert, for your contributions; we look forward to many more!

Ahmet, on behalf of the Apache Beam PMC

[1]
https://lists.apache.org/thread.html/8f729da2d3009059d7a8b2d8624446be161700dcfa953939dd3530c6@%3Cdev.beam.apache.org%3E
[2] https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer


Re: Write-through-cache in State logic

2019-07-16 Thread Thomas Weise
Thanks for the pointer. For streaming, it will be important to support
caching across bundles. It appears that even the Java SDK doesn't support
that yet?

https://github.com/apache/beam/blob/77b295b1c2b0a206099b8f50c4d3180c248e252c/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java#L221

Regarding clear/append: It would be nice if both could occur within a
single Fn API round trip when the state is persisted.

Thanks,
Thomas





Re: Discussion/Proposal: support Sort Merge Bucket joins in Beam

2019-07-16 Thread Chamikara Jayalath
Thanks, this clarifies a lot.

For the writer, I think it's great if you can utilize existing FileIO.Sink
implementations in your SMB sink transform, even if you have to reimplement
some of the logic (for example, compression and temp file handling) that is
already implemented in Beam's FileIO/WriteFiles transforms.

For the reader, you are right that there's no FileIO.Read. What we have are
various implementations of FileBasedSource/FileBasedReader classes that are
currently intentionally hidden, since Beam IO transforms are intended to be
the public interface for users. If you can expose and re-use these classes
with slight modifications (keeping backwards compatibility), I'm OK with it.
Otherwise you'll have to write your own reader implementations.

In general, it seems like SMB has very strong requirements related to
sharding/hot-key management that are not easily achievable by implementing
the SMB source/sink as a composite transform that utilizes existing
source/sink transforms. This forces you to implement this logic in your own
DoFns, and existing Beam primitives are not easily re-usable in this context.

Thanks,
Cham


Re: Discussion/Proposal: support Sort Merge Bucket joins in Beam

2019-07-16 Thread Neville Li
A little clarification of the IO requirement and my understanding of the
current state of IO.

tl;dr: not sure if there're reusable bits for the reader. It's possible to
reuse some for the writer but with heavy refactoring.

*Reader*

   - For each bucket (containing the same key partition, sorted) across
   multiple input data sets, we stream records from bucket files and merge
   sort.
   - We open the files in a DoFn, and emit KV where the
   CGBKR encapsulates Iterable from each input.
   - Basically we need a simple API like ResourceId -> Iterator<T>, i.e.
   sequential read, with no block/offset/split requirement.
   - FileBasedSource.FileBasedReader seems the closest fit, but they're
   nested and coupled to their parent classes.
   - There's no FileIO.Read, only a ReadMatches[1], which can be used with
   ReadAllViaFileBasedSource. But that's not the granularity we need, since
   we lose ordering of the input records, and can't merge 2+ sources.

*Writer*

   - We get a `PCollection<KV<BucketShardId, Iterable<T>>>` after bucket and
   sort, where the Iterable<T> is the records sorted by key and the
   BucketShardId is used to produce the filename, e.g. bucket-1-shard-2.avro.
   - We write each Iterable to a temp file and move to final destination
   when done. Both should ideally reuse existing code.
   - Looks like FileIO.Sink (and impls in AvroIO, TextIO, TFRecordIO)
   supports record writing into a WritableByteChannel, but some logic like
   compression is handled in FileIO through ViaFileBasedSink which extends
   FileBasedSink.
   - FileIO uses WriteFiles[3] to shard and write a PCollection. Again,
   we lose the ordering of the output records and the custom file naming scheme.
   However, WriteShardsIntoTempFilesFn[4] and FinalizeTempFileBundles[5] in
   WriteFiles seem closest to our need but would have to be split out and
   generalized.

*Note on reader block/offset/split requirement*

   - Because of the merge sort, we can't split or offset-seek a bucket
   file: without persisting the offset index of each key group somewhere,
   we can't efficiently skip to a key group without exhausting the previous
   ones. Furthermore, we need to merge sort and align keys from multiple
   sources, which may not have the same key distribution. It might be possible
   to binary search for matching keys but that's extra complication. IMO the
   reader work distribution is better solved by better bucket/shard strategy
   in upstream writer.

*References*

   1. ReadMatches extends PTransform<PCollection<MatchResult.Metadata>,
   PCollection<ReadableFile>>
   2. ReadAllViaFileBasedSource<T> extends
   PTransform<PCollection<ReadableFile>, PCollection<T>>
   3. WriteFiles<UserT, DestinationT, OutputT> extends
   PTransform<PCollection<UserT>, WriteFilesResult<DestinationT>>
   4. WriteShardsIntoTempFilesFn extends DoFn<KV<ShardedKey<Integer>,
   Iterable<UserT>>, FileResult<DestinationT>>
   5. FinalizeTempFileBundles extends PTransform<
   PCollection<List<FileResult<DestinationT>>>, WriteFilesResult<DestinationT>>
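
As a rough sketch of the merge-sort read described above, assuming a
hypothetical ResourceId -> Iterator adapter and records already sorted by
key within each bucket file (illustrative only, not an existing Beam API):

import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import org.apache.beam.sdk.values.KV;

class MergeSortedBuckets {
  // Merges pre-sorted per-bucket record streams into one key-ordered stream.
  static <K extends Comparable<K>, V> Iterator<KV<K, V>> merge(
      List<Iterator<KV<K, V>>> buckets) {
    PriorityQueue<Head<K, V>> heap =
        new PriorityQueue<>(Comparator.comparing((Head<K, V> h) -> h.record.getKey()));
    for (Iterator<KV<K, V>> bucket : buckets) {
      if (bucket.hasNext()) {
        heap.add(new Head<>(bucket.next(), bucket));
      }
    }
    return new Iterator<KV<K, V>>() {
      @Override
      public boolean hasNext() {
        return !heap.isEmpty();
      }

      @Override
      public KV<K, V> next() {
        Head<K, V> smallest = heap.poll();
        if (smallest.rest.hasNext()) {
          heap.add(new Head<>(smallest.rest.next(), smallest.rest));
        }
        return smallest.record;
      }
    };
  }

  // A bucket's current record plus the rest of its stream.
  private static class Head<K, V> {
    final KV<K, V> record;
    final Iterator<KV<K, V>> rest;

    Head(KV<K, V> record, Iterator<KV<K, V>> rest) {
      this.record = record;
      this.rest = rest;
    }
  }
}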



Re: [Python] Read Hadoop Sequence File?

2019-07-16 Thread Shannon Duncan
I am still having the problem that the local file system (DirectRunner) will
not accept a local glob string as a file source. I have tried
both relative and fully qualified paths.

I can confirm that the same inputFile source glob returns data with a simple
cat command, so I know the glob is good.

Error: "java.io.FileNotFoundException: No files matched spec:
/Users//github//io/sequenceFile/part-*/data

Any assistance would be greatly appreciated. This is on the Java SDK.

I tested this with TextIO.read().from(ValueProvider<String>) as well; still
the same.
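
As a debugging sketch (assuming Beam's FileSystems API; the path below is a
placeholder), you can ask Beam directly what a glob expands to:

import java.io.IOException;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.MatchResult;

public class GlobDebug {
  public static void main(String[] args) throws IOException {
    // Placeholder pattern: substitute the real part-*/data glob.
    MatchResult result = FileSystems.match("/tmp/sequenceFile/part-*/data");
    for (MatchResult.Metadata m : result.metadata()) {
      System.out.println(m.resourceId());
    }
  }
}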

Thanks,
Shannon

On Fri, Jul 12, 2019 at 2:14 PM Igor Bernstein 
wrote:

> I'm not sure to be honest. The pattern expansion happens in
> FileBasedSource via FileSystems.match(), so it should follow the same
> expansion rules as other file-based sinks like TextIO. Maybe someone with more
> beam experience can help?
>
> On Fri, Jul 12, 2019 at 2:55 PM Shannon Duncan 
> wrote:
>
>> Clarification on my previous message: this only happens on the local file
>> system, where it is unable to match a pattern string. Via a `gs://` path it
>> is able to match multiple files.
>>
>> On Fri, Jul 12, 2019 at 1:36 PM Shannon Duncan <
>> joseph.dun...@liveramp.com> wrote:
>>
>>> Awesome. I got it working for a single file, but for a structure of:
>>>
>>> /part-0001/index
>>> /part-0001/data
>>> /part-0002/index
>>> /part-0002/data
>>>
>>> I tried to do /part-*  and /part-*/data
>>>
>>> It does not find the multipart files. However if I just do
>>> /part-0001/data it will find it and read it.
>>>
>>> Any ideas why?
>>>
>>> I am using this to generate the source:
>>>
>>> static SequenceFileSource<Text, Text> createSource(
>>>     ValueProvider<String> sourcePattern) {
>>>   return new SequenceFileSource<>(
>>>       sourcePattern,
>>>       Text.class,
>>>       WritableSerialization.class,
>>>       Text.class,
>>>       WritableSerialization.class,
>>>       SequenceFile.SYNC_INTERVAL);
>>> }
>>>
>>> On Wed, Jul 10, 2019 at 10:52 AM Igor Bernstein <
>>> igorbernst...@google.com> wrote:
>>>
 It should be fairly straightforward:
 1. Copy SequenceFileSource.java to your project.
 2. Add the source to your pipeline, configuring it with appropriate
 serializers. See here for an example for HBase Results.

 On Wed, Jul 10, 2019 at 10:58 AM Shannon Duncan <
 joseph.dun...@liveramp.com> wrote:

> If I wanted to go ahead and include this within a new Java pipeline,
> what level of work would I be looking at to integrate it?
>
> On Wed, Jul 3, 2019 at 3:54 AM Ismaël Mejía  wrote:
>
>> That's great. I can help whenever you need. We just need to choose its
>> destination. Both the `hadoop-format` and `hadoop-file-system` modules
>> are good candidates; I would even feel inclined to put it in its own
>> module, `sdks/java/extensions/sequencefile`, to make it easier for end
>> users to discover.
>>
>> A thing to consider is the SeekableByteChannel adapters; we can move
>> those into hadoop-common if needed and refactor the modules to share
>> code. It's worth taking a look at
>>
>> org.apache.beam.sdk.io.hdfs.HadoopFileSystem.HadoopSeekableByteChannel#HadoopSeekableByteChannel
>> to see if some of it could be useful.
>>
>> On Tue, Jul 2, 2019 at 11:46 PM Igor Bernstein <
>> igorbernst...@google.com> wrote:
>> >
>> > Hi all,
>> >
>> > I wrote those classes with the intention of upstreaming them to
>> Beam. I can try to make some time this quarter to clean them up. I would
>> need a bit of guidance from a Beam expert on how to make them coexist
>> with HadoopFormatIO, though.
>> >
>> >
>> > On Tue, Jul 2, 2019 at 10:55 AM Solomon Duskis 
>> wrote:
>> >>
>> >> +Igor Bernstein who wrote the Cloud Bigtable Sequence File classes.
>> >>
>> >> Solomon Duskis | Google Cloud clients | sdus...@google.com |
>> 914-462-0531
>> >>
>> >>
>> >> On Tue, Jul 2, 2019 at 4:57 AM Ismaël Mejía 
>> wrote:
>> >>>
>> >>> (Adding dev@ and Solomon Duskis to the discussion)
>> >>>
>> >>> I was not aware of these, thanks for sharing, David. It would
>> >>> definitely be a great addition if we could have those donated as an
>> >>> extension on the Beam side. We can even evolve them in the future to be
>> >>> more FileIO-like. Any chance this can happen? Maybe Solomon and his team?
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Jul 2, 2019 at 9:39 AM David Morávek 
>> wrote:
>> >>> >
>> >>> > Hi, you can use SequenceFileSink 

Re: Slowly changing lookup cache as a Table in BeamSql

2019-07-16 Thread Reza Rokni
Hi Rahul,

FYI, that pattern is also available in the Beam docs (with an updated code
example):
https://beam.apache.org/documentation/patterns/side-input-patterns/.

Please note that in the DoFn that feeds the View.asSingleton() you will need
to call BigQuery manually using the BigQuery client.
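
As a rough sketch of that pattern (assuming a Pipeline p and the usual Beam
imports; loadLookupTable() is a hypothetical helper that wraps the manual
BigQuery client call):

PCollectionView<Map<String, String>> lookup =
    p.apply(GenerateSequence.from(0).withRate(1, Duration.standardMinutes(5)))
        .apply(
            Window.<Long>into(new GlobalWindows())
                .triggering(
                    Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane()))
                .discardingFiredPanes())
        .apply(
            ParDo.of(
                new DoFn<Long, Map<String, String>>() {
                  @ProcessElement
                  public void process(
                      @Element Long tick, OutputReceiver<Map<String, String>> out) {
                    // Hypothetical helper: re-read the slowly changing data,
                    // e.g. via the BigQuery client.
                    out.output(loadLookupTable());
                  }
                }))
        .apply(View.asSingleton());

The main windowed stream can then consume it via
ParDo.of(...).withSideInputs(lookup) and c.sideInput(lookup) inside the DoFn.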

Regards

Reza




Re: Write-through-cache in State logic

2019-07-16 Thread Lukasz Cwik
User state is built on top of read, append, and clear, rather than a
read-and-write paradigm, to allow for blind appends.

The optimization you speak of can be done completely inside the SDK, without
any additional protocol being required, as long as you clear the state first
and then append all your new data. The Beam Java SDK does this for all
runners when executed portably [1]. You could port the same logic to the
Beam Python SDK as well.

1:
https://github.com/apache/beam/blob/41478d00d34598e56471d99d0845ac16efa5b8ef/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/BagUserState.java#L84
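
A minimal sketch of that pattern (simplified and illustrative, not the
actual BagUserState internals):

import java.util.ArrayList;
import java.util.List;

// SDK-side cache over clear/append state primitives. After a user calls
// clear() and then append(...), persisting never needs to read back the
// runner-side state.
class CachedBagState<T> {
  private final List<T> pendingAppends = new ArrayList<>();
  private boolean clearRequested = false;

  void clear() {
    pendingAppends.clear();
    clearRequested = true;
  }

  void append(T value) {
    pendingAppends.add(value);
  }

  // Flush as one clear followed by blind appends.
  void persist(StateBackend<T> backend) {
    if (clearRequested) {
      backend.clear();
      clearRequested = false;
    }
    for (T value : pendingAppends) {
      backend.append(value);
    }
    pendingAppends.clear();
  }

  // Stand-in for the Fn API state channel.
  interface StateBackend<T> {
    void clear();

    void append(T value);
  }
}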



Slowly changing lookup cache as a Table in BeamSql

2019-07-16 Thread rahul patwari
Hi,

We are following [*Pattern: Slowly-changing lookup cache*] from
https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-case-patterns-part-1

We have a use case to read slowly changing bounded data as a PCollection
along with the main PCollection from Kafka(windowed) and use it in the
query of BeamSql.

Is it possible to design such a use case with Beam Java SDK?

Approaches followed but not successful:

1) GenerateSequence => GlobalWindow with Data Trigger => Composite
Transform (which applies Beam I/O on the
pipeline [PCollection.getPipeline()]) => Convert the resulting PCollection
to PCollection<Row> => Apply BeamSQL
Comments: Beam I/O reads data only once, even though a long value is
generated periodically from GenerateSequence. The expectation is that
whenever a long value is generated, Beam I/O will be used to read the
latest data. Is this because of optimizations in the DAG? Can the
optimizations be overridden?

2) The pipeline is the same as approach 1. But, instead of using a
composite transform, a DoFn is used where a for loop will emit each Row of
the PCollection.
Comments: The output PCollection is unbounded, but we need a bounded
PCollection, as this PCollection is used to JOIN with the PCollection of each
window from Kafka. How can we convert an unbounded PCollection to a bounded
PCollection inside a DoFn?

Are there any better approaches?

Regards,
Rahul


Re: Write-through-cache in State logic

2019-07-16 Thread Robert Bradshaw
Python workers also have a per-bundle SDK-side cache. A protocol has
been proposed, but hasn't yet been implemented in any SDKs or runners.

On Tue, Jul 16, 2019 at 6:02 AM Reuven Lax  wrote:
>
> It's runner dependent. Some runners (e.g. the Dataflow runner) do have such a 
> cache, though I think it currently has a cap for large bags.
>
> Reuven
>
> On Mon, Jul 15, 2019 at 8:48 PM Rakesh Kumar  wrote:
>>
>> Hi,
>>
>> I have been using the Python SDK for the application and am also using BagState in
>> production. I was wondering whether the state logic has any write-through-cache
>> implemented or not. If we are sending every read and write request over the
>> network, it comes with a performance cost. We can avoid the network call for
>> a read operation if we have a write-through-cache.
>> I have superficially looked into the implementation and I didn't see any 
>> cache implementation.
>>
>> Is it possible to have this cache? Would it cause any issue if we have the
>> caching layer?
>>


Re: Discussion/Proposal: support Sort Merge Bucket joins in Beam

2019-07-16 Thread Robert Bradshaw
On Mon, Jul 15, 2019 at 7:03 PM Eugene Kirpichov  wrote:
>
> Quick note: I didn't look through the document, but please do not build on 
> either FileBasedSink or FileBasedReader. They are both remnants of the old, 
> non-composable IO world; and in fact much of the composable IO work emerged 
> from frustration with their limitations and recognizing that many other IOs 
> were suffering from the same limitations.
> Instead of FileBasedSink, build on FileIO.write; instead of FileBasedReader, 
> build on FileIO.read.

+1

I think the sink could be written atop FileIO.write, possibly using
dynamic destinations. At the very least the FileSink interface, which
handles the details of writing a single shard, would be an ideal way
to parameterize an SMB sink. It seems that none of our existing IOs
(publicly?) expose FileSink implementations.

FileIO.read is not flexible enough to do the merging. Eugene, is there
a composable analogue to FileSink, for sources, i.e. something that
can turn a file handle (possibly with offsets) into a set of records
other than FileBasedReader?
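
For illustration, a composable read-side counterpart to FileIO.Sink might
have a shape like this (purely a sketch of what is being asked for, not an
existing Beam interface):

import java.io.IOException;
import java.io.Serializable;
import java.nio.channels.ReadableByteChannel;

// Hypothetical "FileSource": turns an open channel into a stream of
// records, mirroring how FileIO.Sink turns records into channel writes.
interface FileSource<T> extends Serializable {
  void open(ReadableByteChannel channel) throws IOException;

  // Returns the next record, or null when the channel is exhausted.
  T readNext() throws IOException;

  void close() throws IOException;
}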

> On Mon, Jul 15, 2019 at 9:01 AM Gleb Kanterov  wrote:
>>
>> I share the same concern with Robert regarding re-implementing parts of IO. 
>> At the same time, in the past, I worked on internal libraries that try to 
>> re-use code from existing IO, and it's hardly possible because it feels like 
>> it wasn't designed for re-use. There are a lot of classes that are nested 
>> (non-static) or non-public. I can understand why they were made non-public, 
>> it's a hard abstraction to design well and keep compatibility. As Neville 
>> mentioned, decoupling readers and writers would benefit not only this
>> proposal but any other use-case that has to deal with a low-level API such
>> as the FileSystem API, which is hardly possible today without copy-pasting.
>>
>>
>>
>>
>>
>> On Mon, Jul 15, 2019 at 5:05 PM Neville Li  wrote:
>>>
>>> Re: avoiding mirroring IO functionality, what about:
>>>
>>> - Decouple the nested FileBasedSink.Writer and 
>>> FileBasedSource.FileBasedReader, make them top level and remove references 
>>> to parent classes.
>>> - Simplify the interfaces, while maintaining support for block/offset read 
>>> & sequential write.
>>> - As a bonus, the refactored IO classes can be used standalone in case when 
>>> the user wants to perform custom IO in a DoFn, i.e. a 
>>> PTransform, PCollection>>. Today 
>>> this requires a lot of copy-pasted Avro boilerplate.
>>> - For compatibility, we can delegate to the new classes from the old ones 
>>> and remove them in the next breaking release.
>>>
>>> Re: WriteFiles logic, I'm not sure about generalizing it, but what about 
>>> splitting the part handling writing temp files into a new 
>>> PTransform>>, 
>>> PCollection>>? That splits the bucket-shard 
>>> logic from actual file IO.
>>>
>>> On Mon, Jul 15, 2019 at 10:27 AM Robert Bradshaw  
>>> wrote:

 I agree that generalizing the existing FileIO may not be the right
 path forward, and I'd only make their innards public with great care.
 (Would this be used like
 SmbSink(MyFileIO.sink(parameters).getWriter[Factory]())?) SMB is a bit
 unique in that the source and sink are much more coupled than other
 sources and sinks (which happen to be completely independent, if
 complementary, implementations), whereas SMB attempts to be a kind of
 pipe where one half is instantiated in each pipeline.

 In short, an SMB source/sink that is parameterized by an arbitrary,
 existing IO would be ideal (but possibly not feasible (per existing
 prioritizations)), or an SMB source/sink that works as a pair. What
 I'd like to avoid is a set of parallel SMB IO classes that (partially,
 and incompletely) mirror the existing IO ones (from an API
 perspective--how much implementation it makes sense to share is an
 orthogonal issue that I'm sure can be worked out.)

 On Mon, Jul 15, 2019 at 4:18 PM Neville Li  wrote:
 >
 > Hi Robert,
 >
 > I agree, it'd be nice to reuse FileIO logic across different file types. But
 > given the current code structure of FileIO and the scope of the change, I feel
 > it's better left for future refactoring PRs.
 >
 > Some thoughts:
 > - SMB file operations are simple single-file sequential reads/writes,
 > which already exist as Writer & FileBasedReader, but those are private inner
 > classes and have references to the parent Sink/Source instance.
 > - The readers also have extra offset/split logic but that can be worked 
 > around.
 > - It'd be nice not to duplicate the temp->destination file logic, but again
 > WriteFiles assumes a single integer shard key, so it'll take some
 > refactoring to reuse it.
 >
 > All of these can be done in backwards compatible way. OTOH generalizing 
 > the existing components too much (esp. WriteFiles, which is already 
 > complex) might lead to two 

[DISCUSS] Reconciling SetState in Java and Python

2019-07-16 Thread Rakesh Kumar
Hi,

I noticed that SetState is implemented in the Java SDK but not in the
Python SDK. I have filed a JIRA ticket and am planning to implement it in
the Python SDK. Let me know if anyone has any concerns.
Also, feel free to share links to any reference documents if this has
already been discussed anywhere.
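
For reference, using SetState from the Java SDK looks roughly like this (a
sketch; assumes the usual Beam imports and a keyed input):

// Stateful DoFn that uses SetState to emit each value only once per key.
class DedupFn extends DoFn<KV<String, String>, String> {
  @StateId("seen")
  private final StateSpec<SetState<String>> seenSpec =
      StateSpecs.set(StringUtf8Coder.of());

  @ProcessElement
  public void process(
      @Element KV<String, String> element,
      @StateId("seen") SetState<String> seen,
      OutputReceiver<String> out) {
    // addIfAbsent returns a ReadableState<Boolean> that reads true only
    // if the value was newly added to the set.
    if (seen.addIfAbsent(element.getValue()).read()) {
      out.output(element.getValue());
    }
  }
}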

Thank you,
Rakesh