Re: Greetings from Tyson

2020-04-30 Thread Ruoyun Huang
Welcome Tyson!

On Thu, Apr 30, 2020 at 6:44 AM Connell O'Callaghan 
wrote:

> Welcome Tyson!!!
>
>
>
> On Thu, Apr 30, 2020 at 6:12 AM Ismaël Mejía  wrote:
>
>> Welcome!
>>
>> On Thu, Apr 30, 2020 at 12:27 AM Alan Myrvold 
>> wrote:
>> >
>> > Welcome, Tyson!
>> >
>> > On Wed, Apr 29, 2020 at 3:15 PM Rui Wang  wrote:
>> >>
>> >> Welcome!
>> >>
>> >> -Rui
>> >>
>> >> On Wed, Apr 29, 2020, 3:13 PM Brian Hulette 
>> wrote:
>> >>>
>> >>> Welcome Tyson!
>> >>>
>> >>> On Wed, Apr 29, 2020 at 2:54 PM Ahmet Altay  wrote:
>> 
>>  Welcome!
>> 
>>  On Tue, Apr 28, 2020 at 3:06 PM Hannah Jiang 
>> wrote:
>> >
>> > Welcome to the community!
>> >
>> >
>> > On Tue, Apr 28, 2020 at 2:45 PM Tyson Hamilton 
>> wrote:
>> >>
>> >> Hello Beam Community,
>> >>
>> >> This is just a simple 'Hello' to introduce myself. I'm a Software
>> Engineer at Google and have worked with data processing languages and
>> runtime systems on and off during my career. I now have the pleasure of
>> dedicating more time towards working with you lovely folks on Beam and I'm
>> really excited!
>> >>
>> >> I hope you're all doing well and staying safe in these difficult
>> times.
>> >>
>> >> -Tyson
>> >>
>> >>
>> >>
>>
>


Re: [DISCUSS] How many Python 3.x minor versions should Beam Python SDK aim to support concurrently?

2020-02-26 Thread Ruoyun Huang
I feel 4+ versions take too long to run anything.

I would vote for lowest + highest: 2 versions.

On Wed, Feb 26, 2020 at 4:52 PM Udi Meiri  wrote:

> I agree with having low-frequency tests for low-priority versions.
> Low-priority versions could be determined according to least usage.
>
>
>
> On Wed, Feb 26, 2020 at 4:06 PM Robert Bradshaw 
> wrote:
>
>> On Wed, Feb 26, 2020 at 3:29 PM Kenneth Knowles  wrote:
>> >
>> > Are these divergent enough that they all need to consume testing
>> resources? For example can lower priority versions be daily runs or some
>> such?
>>
>> For the 3.x series, I think we will get the most signal out of the
>> lowest and highest version, and can get by with smoke tests +
>> infrequent post-commits for the ones between.
>>
>> > Kenn
>> >
>> > On Wed, Feb 26, 2020 at 3:25 PM Robert Bradshaw 
>> wrote:
>> >>
>> >> +1 to consulting users. Currently 3.5 downloads sit at 3.7%, or about
>> >> 20% of all Python 3 downloads.
>> >>
>> >> I would propose getting in warnings about 3.5 EoL well ahead of time,
>> >> at the very least as part of the 2.7 warning.
>> >>
>> >> Fortunately, supporting multiple 3.x versions is significantly easier
>> >> than spanning 2.7 and 3.x. I would rather not impose an ordering on
>> >> dropping 3.5 and adding 3.8 but consider their merits independently.
>> >>
>> >>
>> >> On Wed, Feb 26, 2020 at 3:16 PM Kyle Weaver 
>> wrote:
>> >> >
>> >> > 5 versions is too many IMO. We've had issues with Python precommit
>> resource usage in the past, and adding another version would surely
>> exacerbate those issues. And we have also already had to leave out certain
>> features on 3.5 [1]. Therefore, I am in favor of dropping 3.5 before adding
>> 3.8. After dropping Python 2 and adding 3.8, that will leave us with the
>> latest three minor versions (3.6, 3.7, 3.8), which I think is closer to the
>> "sweet spot." Though I would be interested in hearing if there are any
>> users who would prefer we continue supporting 3.5.
>> >> >
>> >> > [1]
>> https://github.com/apache/beam/blob/8658b95545352e51f35959f38334f3c7df8b48eb/sdks/python/apache_beam/runners/portability/flink_runner.py#L55
>> >> >
>> >> > On Wed, Feb 26, 2020 at 3:00 PM Valentyn Tymofieiev <
>> valen...@google.com> wrote:
>> >> >>
>> >> >> I would like to start a discussion about identifying a guideline
>> for answering questions like:
>> >> >>
>> >> >> 1. When will Beam support a new Python version (say, Python 3.8)?
>> >> >> 2. When will Beam drop support for an old Python version (say,
>> Python 3.5)?
>> >> >> 3. How many Python versions should we aim to support concurrently
>> (investigate issues, have continuous integration tests)?
>> >> >> 4. What comes first: adding support for a new version (3.8) or
>> deprecating older one (3.5)? This may affect the max load our test
>> infrastructure needs to sustain.
>> >> >>
>> >> >> We are already getting requests for supporting Python 3.8 and there
>> were some good reasons[1] to drop support for Python 3.5 (at least, early
>> versions of 3.5). Answering these questions would help set expectations in
>> Beam user community, Beam dev community, and may help us establish
>> resource requirements for test infrastructure and plan efforts.
>> >> >>
>> >> >> PEP-0602 [2] establishes a yearly release cycle for Python versions
>> starting from 3.9. Each release is a long-term support release and is
>> supported for 5 years: first 1.5 years allow for general bug fix support,
>> remaining 3.5 years have security fix support.
>> >> >>
>> >> >> At every point, there may be up to 5 Python minor versions that did
>> not yet reach EOL, see "Release overlap with 12 month diagram" [3]. We can
>> try to support all of them, but that may come at a cost of velocity: we
>> will have more tests to maintain, and we will have to develop Beam against
>> a lower version for a longer period. Supporting fewer versions will have
>> implications for user experience. It also may be difficult to ensure
>> support of the most recent version early, since our dependencies (e.g.
>> picklers) may not support it yet.
>> >> >>
>> >> >> Currently we support 4 Python versions (2.7, 3.5, 3.6, 3.7).
>> >> >>
>> >> >> Is 4 versions a sweet spot? Too much? Too little? What do you think?
>> >> >>
>> >> >> [1]
>> https://github.com/apache/beam/pull/10821#issuecomment-590167711
>> >> >> [2] https://www.python.org/dev/peps/pep-0602/
>> >> >> [3] https://www.python.org/dev/peps/pep-0602/#id17
>>
>
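Robert's suggestion of getting warnings in well ahead of an EoL could be done once at import time. A minimal sketch, assuming a module-level check; the cutoff set and message wording here are illustrative, not Beam's actual code:

```python
import sys
import warnings

# Illustrative only: versions the SDK still runs on but plans to drop.
DEPRECATED_PYTHON_VERSIONS = {(2, 7), (3, 5)}

def warn_on_deprecated_python():
    """Warn when the interpreter is a soon-to-be-unsupported Python."""
    current = sys.version_info[:2]
    if current in DEPRECATED_PYTHON_VERSIONS:
        warnings.warn(
            'Python %d.%d support is deprecated; a future SDK release '
            'will require a newer interpreter.' % current,
            DeprecationWarning)

warn_on_deprecated_python()  # e.g. invoked from the package __init__
```

Running the check once at package import keeps the warning visible without repeating it on every pipeline step.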


Re: [ANNOUNCE] New committer: Daniel Oliveira

2019-11-20 Thread Ruoyun Huang
Congrats Daniel!

On Wed, Nov 20, 2019 at 1:58 PM Robert Burke  wrote:

> Congrats Daniel! Much deserved.
>
> On Wed, Nov 20, 2019, 12:49 PM Udi Meiri  wrote:
>
>> Congrats Daniel!
>>
>> On Wed, Nov 20, 2019 at 12:42 PM Kyle Weaver  wrote:
>>
>>> Congrats Dan! Keep up the good work :)
>>>
>>> On Wed, Nov 20, 2019 at 12:41 PM Cyrus Maden  wrote:
>>>
>>>> Congratulations! This is great news.
>>>>
>>>> On Wed, Nov 20, 2019 at 3:24 PM Rui Wang  wrote:
>>>>
>>>>> Congrats!
>>>>>
>>>>>
>>>>> -Rui
>>>>>
>>>>> On Wed, Nov 20, 2019 at 11:48 AM Valentyn Tymofieiev <
>>>>> valen...@google.com> wrote:
>>>>>
>>>>>> Congrats, Daniel!
>>>>>>
>>>>>> On Wed, Nov 20, 2019 at 11:47 AM Kenneth Knowles 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Please join me and the rest of the Beam PMC in welcoming a new
>>>>>>> committer: Daniel Oliveira
>>>>>>>
>>>>>>> Daniel introduced himself to dev@ over two years ago and has
>>>>>>> contributed in many ways since then. Daniel has contributed to general
>>>>>>> project health, the portability framework, and all three languages: 
>>>>>>> Java,
>>>>>>> Python SDK, and Go. I would like to particularly highlight how he 
>>>>>>> deleted
>>>>>>> 12k lines of dead reference runner code [1].
>>>>>>>
>>>>>>> In consideration of Daniel's contributions, the Beam PMC trusts him
>>>>>>> with the responsibilities of a Beam committer [2].
>>>>>>>
>>>>>>> Thank you, Daniel, for your contributions and looking forward to
>>>>>>> many more!
>>>>>>>
>>>>>>> Kenn, on behalf of the Apache Beam PMC
>>>>>>>
>>>>>>> [1] https://github.com/apache/beam/pull/8380
>>>>>>> [2]
>>>>>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>>>>>
>>>>>>

-- 

Ruoyun  Huang


Re: [VOTE] Beam Mascot animal choice: vote for as many as you want

2019-11-20 Thread Ruoyun Huang
[ ] Beaver
[] Hedgehog
[ ] Lemur
[x] Owl
[ ] Salmon
[ ] Trout
[ ] Robot dinosaur
[x] Firefly
[ ] Cuttlefish
[x] Dumbo Octopus
[ ] Angler fish

On Wed, Nov 20, 2019 at 10:38 AM  wrote:

> [ ] Beaver
> [x] Hedgehog
> [ ] Lemur
> [x] Owl
> [ ] Salmon
> [ ] Trout
> [ ] Robot dinosaur
> [ ] Firefly
> [x] Cuttlefish
> [x] Dumbo Octopus
> [ ] Angler fish
>
>
>
> On 2019/11/20 02:43:42, Kenneth Knowles  wrote:
> > Please cast your votes of approval [1] for animals you would support as
> > Beam mascot. The animal with the most approval will be identified as the
> > favorite.
> >
> > *** Vote for as many as you like, using this checklist as a template
> >
> > [ ] Beaver
> > [ ] Hedgehog
> > [ ] Lemur
> > [ ] Owl
> > [ ] Salmon
> > [ ] Trout
> > [ ] Robot dinosaur
> > [ ] Firefly
> > [ ] Cuttlefish
> > [ ] Dumbo Octopus
> > [ ] Angler fish
> >
> > This vote will remain open for at least 72 hours.
> >
> > Kenn
> >
> > [1] See https://en.wikipedia.org/wiki/Approval_voting#Description and
> > https://www.electionscience.org/library/approval-voting/
>
>
>
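Under approval voting, tallying is just counting marks: each voter approves any number of options, and the most-approved option wins. A quick sketch with made-up ballots (not the actual vote results from this thread):

```python
from collections import Counter

def approval_winner(ballots):
    """Each ballot is the set of options a voter approves of.

    Returns the option(s) with the highest approval count, sorted.
    """
    counts = Counter()
    for ballot in ballots:
        counts.update(ballot)
    top = max(counts.values())
    return sorted(option for option, n in counts.items() if n == top)

ballots = [
    {'Owl', 'Firefly', 'Dumbo Octopus'},
    {'Hedgehog', 'Owl', 'Cuttlefish', 'Dumbo Octopus'},
    {'Firefly', 'Owl'},
]
print(approval_winner(ballots))  # -> ['Owl']
```

Ties are possible; returning every top-scoring option (rather than picking one arbitrarily) matches the "favorite will be identified" wording.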

-- 

Ruoyun  Huang


Re: [ANNOUNCE] New committer: Brian Hulette

2019-11-15 Thread Ruoyun Huang
Congrats Brian!

On Fri, Nov 15, 2019 at 10:41 AM Robin Qiu  wrote:

> Congrats, Brian!
>
> On Fri, Nov 15, 2019 at 10:02 AM Daniel Oliveira 
> wrote:
>
>> Congratulations Brian! It's well deserved.
>>
>> On Fri, Nov 15, 2019, 9:37 AM Alexey Romanenko 
>> wrote:
>>
>>> Congratulations, Brian!
>>>
>>> On 15 Nov 2019, at 18:27, Rui Wang  wrote:
>>>
>>> Congrats!
>>>
>>>
>>> -Rui
>>>
>>> On Fri, Nov 15, 2019 at 8:16 AM Thomas Weise  wrote:
>>>
>>>> Congratulations!
>>>>
>>>>
>>>> On Fri, Nov 15, 2019 at 6:34 AM Connell O'Callaghan <
>>>> conne...@google.com> wrote:
>>>>
>>>>> Well done Brian!!!
>>>>>
>>>>> Kenn thank you for sharing
>>>>>
>>>>> On Fri, Nov 15, 2019 at 6:31 AM Cyrus Maden  wrote:
>>>>>
>>>>>> Congrats Brian!
>>>>>>
>>>>>> On Fri, Nov 15, 2019 at 5:25 AM Ismaël Mejía 
>>>>>> wrote:
>>>>>>
>>>>>>> Congratulations Brian!
>>>>>>> Happy to see this happening and eager to see more of your work!
>>>>>>>
>>>>>>> On Fri, Nov 15, 2019 at 11:02 AM Ankur Goenka 
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > Congrats Brian!
>>>>>>> >
>>>>>>> > On Fri, Nov 15, 2019, 2:42 PM Jan Lukavský 
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >> Congrats Brian!
>>>>>>> >>
>>>>>>> >> On 11/15/19 9:58 AM, Reza Rokni wrote:
>>>>>>> >>
>>>>>>> >> Great news!
>>>>>>> >>
>>>>>>> >> On Fri, 15 Nov 2019 at 15:09, Gleb Kanterov 
>>>>>>> wrote:
>>>>>>> >>>
>>>>>>> >>> Congratulations!
>>>>>>> >>>
>>>>>>> >>> On Fri, Nov 15, 2019 at 5:44 AM Valentyn Tymofieiev <
>>>>>>> valen...@google.com> wrote:
>>>>>>> >>>>
>>>>>>> >>>> Congratulations, Brian!
>>>>>>> >>>>
>>>>>>> >>>> On Thu, Nov 14, 2019 at 6:25 PM jincheng sun <
>>>>>>> sunjincheng...@gmail.com> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> Congratulation Brian!
>>>>>>> >>>>>
>>>>>>> >>>>> Best,
>>>>>>> >>>>> Jincheng
>>>>>>> >>>>>
>>>>>>> >>>>>> Kyle Weaver  wrote on Fri, Nov 15, 2019 at 7:19 AM:
>>>>>>> >>>>>>
>>>>>>> >>>>>> Thanks for your contributions and congrats Brian!
>>>>>>> >>>>>>
>>>>>>> >>>>>> On Thu, Nov 14, 2019 at 3:14 PM Kenneth Knowles <
>>>>>>> k...@apache.org> wrote:
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> Hi all,
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> Please join me and the rest of the Beam PMC in welcoming a
>>>>>>> new committer: Brian Hulette
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> Brian introduced himself to dev@ earlier this year and has
>>>>>>> been contributing since then. His contributions to Beam include
>>>>>>> explorations of integration with Arrow, standardizing coders, 
>>>>>>> portability
>>>>>>> for schemas, and presentations at Beam events.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> In consideration of Brian's contributions, the Beam PMC
>>>>>>> trusts him with the responsibilities of a Beam committer [1].
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> Thank you, Brian, for your contributions and looking forward
>>>>>>> to many more!
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> Kenn, on behalf of the Apache Beam PMC
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> [1]
>>>>>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >>
>>>>>>> >> This email may be confidential and privileged. If you received
>>>>>>> this communication by mistake, please don't forward it to anyone else,
>>>>>>> please erase all copies and attachments, and please let me know that it 
>>>>>>> has
>>>>>>> gone to the wrong person.
>>>>>>> >>
>>>>>>> >> The above terms reflect a potential business arrangement, are
>>>>>>> provided solely as a basis for further discussion, and are not intended 
>>>>>>> to
>>>>>>> be and do not constitute a legally binding obligation. No legally 
>>>>>>> binding
>>>>>>> obligations will be created, implied, or inferred until an agreement in
>>>>>>> final form is executed in writing by all parties involved.
>>>>>>>
>>>>>>
>>>

-- 

Ruoyun  Huang


Re: Behavior of TimestampCombiner?

2019-11-12 Thread Ruoyun Huang
Reported a tracking JIRA:  https://issues.apache.org/jira/browse/BEAM-8645

On Tue, Nov 12, 2019 at 9:48 AM Ruoyun Huang  wrote:

> Thanks for confirming.
>
> Since it is unexpected behavior, I shall look into jira if it is already
> on radar, if not, will create one.
>
> On Mon, Nov 11, 2019 at 6:11 PM Robert Bradshaw 
> wrote:
>
>> The END_OF_WINDOW is indeed 9.99 (or, in Java, 9.999000), but the
>> results for LATEST and EARLIEST should be 9 and 0 respectively.
>>
>> On Mon, Nov 11, 2019 at 5:34 PM Ruoyun Huang  wrote:
>> >
>> > Hi, Folks,
>> >
>> > I am trying to understand the behavior of TimestampCombiner. I have
>> a test like this:
>> >
>> > class TimestampCombinerTest(unittest.TestCase):
>> >
>> >   def test_combiner_latest(self):
>> > """Test TimestampCombiner with LATEST."""
>> > options = PipelineOptions()
>> > options.view_as(StandardOptions).streaming = True
>> > p = TestPipeline(options=options)
>> >
>> > main_stream = (p
>> >| 'main TestStream' >> TestStream()
>> >.add_elements([window.TimestampedValue(('k', 100),
>> 0)])
>> >.add_elements([window.TimestampedValue(('k', 400),
>> 9)])
>> >.advance_watermark_to_infinity()
>> >| 'main windowInto' >> beam.WindowInto(
>> >   window.FixedWindows(10),
>> >
>>  timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)
>> >| 'Combine' >> beam.CombinePerKey(sum))
>> >
>> > class RecordFn(beam.DoFn):
>> >   def process(self,
>> >   elm=beam.DoFn.ElementParam,
>> >   ts=beam.DoFn.TimestampParam):
>> > yield (elm, ts)
>> >
>> > records = (main_stream | beam.ParDo(RecordFn()))
>> >
>> > expected_window_to_elements = {
>> > window.IntervalWindow(0, 10): [
>> > (('k', 500),  Timestamp(9)),
>> > ],
>> > }
>> >
>> > assert_that(
>> > records,
>> > equal_to_per_window(expected_window_to_elements),
>> > use_global_window=False,
>> > label='assert per window')
>> >
>> > p.run()
>> >
>> >
>> > I expect the result to be following (based on various TimestampCombiner
>> strategy):
>> > LATEST:(('k', 500), Timestamp(9)),
>> > EARLIEST:(('k', 500), Timestamp(0)),
>> > END_OF_WINDOW: (('k', 500), Timestamp(10)),
>> >
>> > The above outcome is partially confirmed by Java side test : [1]
>> >
>> >
>> > However, from beam python, the outcome is like this:
>> > LATEST:(('k', 500), Timestamp(10)),
>> > EARLIEST:(('k', 500), Timestamp(10)),
>> > END_OF_WINDOW: (('k', 500), Timestamp(9.)),
>> >
>> > What did I miss? what should be the right expected behavior? or this
>> looks like a bug?
>> >
>> > [1]:
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupByKeyTest.java#L390
>> >
>> > Cheers,
>> >
>>
>
>
> --
> 
> Ruoyun  Huang
>
>

-- 

Ruoyun  Huang


Re: Behavior of TimestampCombiner?

2019-11-12 Thread Ruoyun Huang
Thanks for confirming.

Since it is unexpected behavior, I shall look into jira if it is already on
radar, if not, will create one.

On Mon, Nov 11, 2019 at 6:11 PM Robert Bradshaw  wrote:

> The END_OF_WINDOW is indeed 9.99 (or, in Java, 9.999000), but the
> results for LATEST and EARLIEST should be 9 and 0 respectively.
>
> On Mon, Nov 11, 2019 at 5:34 PM Ruoyun Huang  wrote:
> >
> > Hi, Folks,
> >
> > I am trying to understand the behavior of TimestampCombiner. I have
> a test like this:
> >
> > class TimestampCombinerTest(unittest.TestCase):
> >
> >   def test_combiner_latest(self):
> > """Test TimestampCombiner with LATEST."""
> > options = PipelineOptions()
> > options.view_as(StandardOptions).streaming = True
> > p = TestPipeline(options=options)
> >
> > main_stream = (p
> >| 'main TestStream' >> TestStream()
> >.add_elements([window.TimestampedValue(('k', 100),
> 0)])
> >.add_elements([window.TimestampedValue(('k', 400),
> 9)])
> >.advance_watermark_to_infinity()
> >| 'main windowInto' >> beam.WindowInto(
> >   window.FixedWindows(10),
> >
>  timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)
> >| 'Combine' >> beam.CombinePerKey(sum))
> >
> > class RecordFn(beam.DoFn):
> >   def process(self,
> >   elm=beam.DoFn.ElementParam,
> >   ts=beam.DoFn.TimestampParam):
> > yield (elm, ts)
> >
> > records = (main_stream | beam.ParDo(RecordFn()))
> >
> > expected_window_to_elements = {
> > window.IntervalWindow(0, 10): [
> > (('k', 500),  Timestamp(9)),
> > ],
> > }
> >
> > assert_that(
> > records,
> > equal_to_per_window(expected_window_to_elements),
> > use_global_window=False,
> > label='assert per window')
> >
> > p.run()
> >
> >
> > I expect the result to be following (based on various TimestampCombiner
> strategy):
> > LATEST:(('k', 500), Timestamp(9)),
> > EARLIEST:(('k', 500), Timestamp(0)),
> > END_OF_WINDOW: (('k', 500), Timestamp(10)),
> >
> > The above outcome is partially confirmed by Java side test : [1]
> >
> >
> > However, from beam python, the outcome is like this:
> > LATEST:(('k', 500), Timestamp(10)),
> > EARLIEST:(('k', 500), Timestamp(10)),
> > END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> >
> > What did I miss? what should be the right expected behavior? or this
> looks like a bug?
> >
> > [1]:
> https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupByKeyTest.java#L390
> >
> > Cheers,
> >
>


-- 

Ruoyun  Huang


Behavior of TimestampCombiner?

2019-11-11 Thread Ruoyun Huang
Hi, Folks,

I am trying to understand the behavior of TimestampCombiner. I have a
test like this:


class TimestampCombinerTest(unittest.TestCase):

  def test_combiner_latest(self):
    """Test TimestampCombiner with LATEST."""
    options = PipelineOptions()
    options.view_as(StandardOptions).streaming = True
    p = TestPipeline(options=options)

    main_stream = (p
                   | 'main TestStream' >> TestStream()
                   .add_elements([window.TimestampedValue(('k', 100), 0)])
                   .add_elements([window.TimestampedValue(('k', 400), 9)])
                   .advance_watermark_to_infinity()
                   | 'main windowInto' >> beam.WindowInto(
                       window.FixedWindows(10),
                       timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)
                   | 'Combine' >> beam.CombinePerKey(sum))

    class RecordFn(beam.DoFn):
      def process(self,
                  elm=beam.DoFn.ElementParam,
                  ts=beam.DoFn.TimestampParam):
        yield (elm, ts)

    records = (main_stream | beam.ParDo(RecordFn()))

    expected_window_to_elements = {
        window.IntervalWindow(0, 10): [
            (('k', 500), Timestamp(9)),
        ],
    }

    assert_that(
        records,
        equal_to_per_window(expected_window_to_elements),
        use_global_window=False,
        label='assert per window')

    p.run()


I expect the result to be following (based on various TimestampCombiner
strategy):
LATEST:(('k', 500), Timestamp(9)),
EARLIEST:(('k', 500), Timestamp(0)),
END_OF_WINDOW: (('k', 500), Timestamp(10)),
The above outcome is partially confirmed by Java side test : [1]

However, from beam python, the outcome is like this:
LATEST:(('k', 500), Timestamp(10)),
EARLIEST:(('k', 500), Timestamp(10)),
END_OF_WINDOW: (('k', 500), Timestamp(9.)),

What did I miss? what should be the right expected behavior? or this looks
like a bug?

[1]:
https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupByKeyTest.java#L390

Cheers,
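Stripped of the pipeline machinery, the expected semantics of the three strategies reduce to simple functions of the element timestamps and the window bounds. A sketch of the intended behavior, not Beam's implementation (note END_OF_WINDOW actually reports the instant just before the window's upper bound, hence the 9.999... values discussed above; it is approximated here by the bound itself):

```python
def combine_timestamps(strategy, timestamps, window_start, window_end):
    """Expected output timestamp for elements combined within one window.

    LATEST/EARLIEST pick among the input timestamps; END_OF_WINDOW
    ignores them entirely and depends only on the window.
    """
    if strategy == 'LATEST':
        return max(timestamps)
    if strategy == 'EARLIEST':
        return min(timestamps)
    if strategy == 'END_OF_WINDOW':
        # Beam uses window_end minus the smallest representable duration;
        # approximated here as the upper bound itself.
        return window_end
    raise ValueError('unknown strategy: %r' % strategy)

# Elements at timestamps 0 and 9 in IntervalWindow(0, 10), as in the test:
ts = [0, 9]
print(combine_timestamps('LATEST', ts, 0, 10))         # 9
print(combine_timestamps('EARLIEST', ts, 0, 10))       # 0
print(combine_timestamps('END_OF_WINDOW', ts, 0, 10))  # 10
```

This matches the Java-side expectations cited in [1]: LATEST yields 9 and EARLIEST yields 0, which is why the Python results of 10 for both look like a bug.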


Re: [ANNOUNCE] New committer: Alan Myrvold

2019-09-27 Thread Ruoyun Huang
Congratulations, Alan!


On Fri, Sep 27, 2019 at 9:55 AM Rui Wang  wrote:

> Congrats!
>
> -Rui
>
> On Fri, Sep 27, 2019 at 9:54 AM Pablo Estrada  wrote:
>
>> Yooh! : D
>>
>> On Fri, Sep 27, 2019 at 9:53 AM Yifan Zou  wrote:
>>
>>> Congratulations, Alan!
>>>
>>> On Fri, Sep 27, 2019 at 9:18 AM Ahmet Altay  wrote:
>>>
>>>> Hi,
>>>>
>>>> Please join me and the rest of the Beam PMC in welcoming a new
>>>> committer: Alan Myrvold
>>>>
>>>> Alan has been a long time Beam contributor. His contributions made Beam
>>>> more productive and friendlier [1] for all contributors with significant
>>>> improvements to Beam release process, automation, and infrastructure.
>>>>
>>>> In consideration of Alan's contributions, the Beam PMC trusts him
>>>> with the responsibilities of a Beam committer [2].
>>>>
>>>> Thank you, Alan, for your contributions and looking forward to many
>>>> more!
>>>>
>>>> Ahmet, on behalf of the Apache Beam PMC
>>>>
>>>> [1]
>>>> https://beam-summit-na-2019.firebaseapp.com/schedule/2019-09-11?sessionId=1126
>>>> [2] https://beam.apache.org/contribute/become-a-committer
>>>> /#an-apache-beam-committer
>>>>
>>>

-- 

Ruoyun  Huang


Re: [ANNOUNCE] New committer: Valentyn Tymofieiev

2019-08-27 Thread Ruoyun Huang
Congratulations Valentyn!

On Tue, Aug 27, 2019 at 6:16 PM Daniel Oliveira 
wrote:

> Congratulations Valentyn!
>
> On Tue, Aug 27, 2019, 11:31 AM Boyuan Zhang  wrote:
>
>> Congratulations!
>>
>> On Tue, Aug 27, 2019 at 10:44 AM Udi Meiri  wrote:
>>
>>> Congrats!
>>>
>>> On Tue, Aug 27, 2019 at 9:50 AM Yichi Zhang  wrote:
>>>
>>>> Congrats Valentyn!
>>>>
>>>> On Tue, Aug 27, 2019 at 7:55 AM Valentyn Tymofieiev <
>>>> valen...@google.com> wrote:
>>>>
>>>>> Thank you everyone!
>>>>>
>>>>> On Tue, Aug 27, 2019 at 2:57 AM Alexey Romanenko <
>>>>> aromanenko@gmail.com> wrote:
>>>>>
>>>>>> Congrats, well deserved!
>>>>>>
>>>>>> On 27 Aug 2019, at 11:25, Jan Lukavský  wrote:
>>>>>>
>>>>>> Congrats Valentyn!
>>>>>> On 8/26/19 11:43 PM, Rui Wang wrote:
>>>>>>
>>>>>> Congratulations!
>>>>>>
>>>>>>
>>>>>> -Rui
>>>>>>
>>>>>> On Mon, Aug 26, 2019 at 2:36 PM Hannah Jiang 
>>>>>> wrote:
>>>>>>
>>>>>>> Congratulations Valentyn, well deserved!
>>>>>>>
>>>>>>> On Mon, Aug 26, 2019 at 2:34 PM Chamikara Jayalath <
>>>>>>> chamik...@google.com> wrote:
>>>>>>>
>>>>>>>> Congrats Valentyn!
>>>>>>>>
>>>>>>>> On Mon, Aug 26, 2019 at 2:32 PM Pablo Estrada 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Valentyn!
>>>>>>>>>
>>>>>>>>> On Mon, Aug 26, 2019 at 2:29 PM Robin Qiu 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thank you Valentyn! Congratulations!
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 26, 2019 at 2:28 PM Robert Bradshaw <
>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Please join me and the rest of the Beam PMC in welcoming a new
>>>>>>>>>>> committer: Valentyn Tymofieiev
>>>>>>>>>>>
>>>>>>>>>>> Valentyn has made numerous contributions to Beam over the last
>>>>>>>>>>> several
>>>>>>>>>>> years (including 100+ pull requests), most recently pushing
>>>>>>>>>>> through
>>>>>>>>>>> the effort to make Beam compatible with Python 3. He is also an
>>>>>>>>>>> active
>>>>>>>>>>> participant in design discussions on the list, participates in
>>>>>>>>>>> release
>>>>>>>>>>> candidate validation, and proactively helps keep our tests green.
>>>>>>>>>>>
>>>>>>>>>>> In consideration of Valentyn's contributions, the Beam PMC
>>>>>>>>>>> trusts him
>>>>>>>>>>> with the responsibilities of a Beam committer [1].
>>>>>>>>>>>
>>>>>>>>>>> Thank you, Valentyn, for your contributions and looking forward
>>>>>>>>>>> to many more!
>>>>>>>>>>>
>>>>>>>>>>> Robert, on behalf of the Apache Beam PMC
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>

-- 

Ruoyun  Huang


Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Ruoyun Huang
Congratulations !

On Tue, Jul 16, 2019 at 10:24 AM Ahmet Altay  wrote:

> Hi,
>
> Please join me and the rest of the Beam PMC in welcoming a new committer: 
> Robert
> Burke.
>
> Robert has been contributing to Beam and actively involved in the
> community for over a year. He has been actively working on Go SDK, helping
> users, and making it easier for others to contribute [1].
>
> In consideration of Robert's contributions, the Beam PMC trusts him with
> the responsibilities of a Beam committer [2].
>
> Thank you, Robert, for your contributions and looking forward to many more!
>
> Ahmet, on behalf of the Apache Beam PMC
>
> [1]
> https://lists.apache.org/thread.html/8f729da2d3009059d7a8b2d8624446be161700dcfa953939dd3530c6@%3Cdev.beam.apache.org%3E
> [2] https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-
> committer
>


-- 

Ruoyun  Huang


Re: [ANNOUNCE] New committer: Mikhail Gryzykhin

2019-06-21 Thread Ruoyun Huang
Congratulations! Mikhail!


On Fri, Jun 21, 2019 at 1:00 PM Yichi Zhang  wrote:

> Congrats!
>
> On Fri, Jun 21, 2019 at 11:55 AM Tanay Tummalapalli 
> wrote:
>
>> Congratulations!
>>
>> On Fri, Jun 21, 2019 at 10:35 PM Rui Wang  wrote:
>>
>>> Congrats!
>>>
>>>
>>> -Rui
>>>
>>> On Fri, Jun 21, 2019 at 9:58 AM Robin Qiu  wrote:
>>>
>>>> Congrats, Mikhail!
>>>>
>>>> On Fri, Jun 21, 2019 at 9:12 AM Alexey Romanenko <
>>>> aromanenko@gmail.com> wrote:
>>>>
>>>>> Congrats, Mikhail!
>>>>>
>>>>> On 21 Jun 2019, at 18:01, Anton Kedin  wrote:
>>>>>
>>>>> Congrats!
>>>>>
>>>>> On Fri, Jun 21, 2019 at 3:55 AM Reza Rokni  wrote:
>>>>>
>>>>>> Congratulations!
>>>>>>
>>>>>> On Fri, 21 Jun 2019, 12:37 Robert Burke,  wrote:
>>>>>>
>>>>>>> Congrats
>>>>>>>
>>>>>>> On Fri, Jun 21, 2019, 12:29 PM Thomas Weise  wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Please join me and the rest of the Beam PMC in welcoming a new
>>>>>>>> committer: Mikhail Gryzykhin.
>>>>>>>>
>>>>>>>> Mikhail has been contributing to Beam and actively involved in the
>>>>>>>> community for over a year. He developed the community build dashboard 
>>>>>>>> [1]
>>>>>>>> and added substantial improvements to our build infrastructure. 
>>>>>>>> Mikhail's
>>>>>>>> work also covers metrics, contributor documentation, development 
>>>>>>>> process
>>>>>>>> improvements and other areas.
>>>>>>>>
>>>>>>>> In consideration of Mikhail's contributions, the Beam PMC trusts
>>>>>>>> him with the responsibilities of a Beam committer [2].
>>>>>>>>
>>>>>>>> Thank you, Mikhail, for your contributions and looking forward to
>>>>>>>> many more!
>>>>>>>>
>>>>>>>> Thomas, on behalf of the Apache Beam PMC
>>>>>>>>
>>>>>>>> [1] https://s.apache.org/beam-community-metrics
>>>>>>>> [2]
>>>>>>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>>>>>>
>>>>>>>>
>>>>>

-- 

Ruoyun  Huang


Re: 1 Million Lines of Code (1 MLOC)

2019-06-03 Thread Ruoyun Huang
> >   Language      Files      Lines      Blank    Comment       Code
> >   …                 1        206         21         16        169
> >   C++               2         72          4         36         32
> >   Autoconf          1         21          1         16          4
> >   Total          5002    1000874     132497     173987     694390
> >
> > [1] https://github.com/cgag/loc
>
>


-- 

Ruoyun  Huang


[Discussion] A tweak to existing large iterable protocol?

2019-05-20 Thread Ruoyun Huang
Hi, Folks,

We propose to make a tweak to existing fnapi Large Iterable (result from
GBK) protocol. Would like to see what everyone thinks.

*To clarify a few terms used:*

*[large iterable]* A list of elements that is too expensive to hold
entirely in memory; storing a single element is relatively cheap.

*[Large item iterable]* Same as above, except that even a single element
is too expensive to keep in memory.

[*Page*] A subset of elements from an iterable, up to a size limit.

[*Subsequent pages*] All pages except for the first one.

*Status quo:* Currently the large iterable is implemented using a lazy
decoding mechanism (related community discussions [1]), and is for now only
implemented/available in Python. To summarize how it works: the first page
is transmitted over the data channel, and all subsequent pages are
transmitted over the state channel. For each subsequent page, we store on
the Runner side a token, and a mapping from token to the corresponding
iterator. Meanwhile, the SDK side always holds the first page in memory
[until completely done].


There is no token created/sent for starting position of iterable.

*Proposed Tweak:* Create a token for starting position of EACH iterable,
and send it over to SDK. If SDK needs to re-iterate the large iterable, SDK
may request all the pages (including the first page) via state channel.

*Benefit:* The SDK side no longer holds the first page in memory, giving
better memory efficiency. In addition, it makes it easier to handle
large-item iterables in the future.

*Cons:* [Any other downside?]

More data communication (one extra page) is needed.

*Collateral impact:* None. The only difference this proposal makes is to
add *one* token (i.e. the starting position of the large iterable) to the
existing protocol. This token can be simply redundant, with the existing
behavior remaining unchanged. Once this information is in place, SDKs can
make their own choices about whether to store the first page in memory.

*Why now?* Large iterable is available only in Python, but it is coming to
other SDKs soon. It would probably be easier to add this extra information
early, so that less modification is needed everywhere later on.

[1]:
https://lists.apache.org/thread.html/70cac361b659516933c505b513d43986c25c13da59eabfd28457f1f2@%3Cdev.beam.apache.org%3E
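To make the tweak concrete, here is a toy model of the paged protocol with a token for the starting position of each iterable. Runner and SDK are collapsed into one process, and all names are invented for illustration; this is not the fnapi wire format:

```python
import itertools
import uuid

class Runner:
    """Holds large iterables and hands out start tokens."""

    def __init__(self, page_size=2):
        self.page_size = page_size
        self.iterables = {}  # start token -> underlying data

    def register(self, data):
        """Create a token for the *start* of the iterable (the tweak)."""
        token = uuid.uuid4().hex
        self.iterables[token] = data
        return token

    def get_page(self, token, page_index):
        """Serve any page, including the first, over the 'state channel'."""
        data = self.iterables[token]
        start = page_index * self.page_size
        return data[start:start + self.page_size]

def sdk_iterate(runner, token):
    """SDK-side lazy iteration: no page is retained between requests."""
    for page_index in itertools.count():
        page = runner.get_page(token, page_index)
        if not page:
            return
        yield from page

runner = Runner(page_size=2)
token = runner.register([1, 2, 3, 4, 5])
# Re-iteration works because the start token lets the SDK re-request page 0:
print(list(sdk_iterate(runner, token)))  # [1, 2, 3, 4, 5]
print(list(sdk_iterate(runner, token)))  # [1, 2, 3, 4, 5]
```

The point of the sketch: because page 0 is addressable like every other page, the SDK never has to pin the first page in memory to support re-iteration.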


Re: [VOTE] Remove deprecated Java Reference Runner code from repository.

2019-05-14 Thread Ruoyun Huang
+1

*From: *Daniel Oliveira 
*Date: *Tue, May 14, 2019 at 2:19 PM
*To: *dev

Hello everyone,
>
> I'm calling for a vote on removing the deprecated Java Reference Runner
> code. The PR for the change has already been tested and reviewed:
> https://github.com/apache/beam/pull/8380
>
> [ ] +1, Approve merging the removal PR in it's current state
> [ ] -1, Veto the removal PR (please provide specific comments)
>
> The vote will be open for at least 72 hours. Since this a vote on
> code-modification, it is adopted if there are at least 3 PMC affirmative
> votes and no vetoes.
>
> For those who would like context on why the Java Reference Runner is being
> deprecated, the discussions took place in the following email threads:
>
>1. (8 Feb. 2019) Thoughts on a reference runner to invest in?
>
> <https://lists.apache.org/thread.html/b235f8ee55a737ea399756edd80b1218ed34d3439f7b0ed59bfa8e40@%3Cdev.beam.apache.org%3E>
>  -
>Decision to deprecate the Java Reference Runner and use the Python
>FnApiRunner for those use cases instead.
>2. (14 Mar. 2019) Python PVR Reference post-commit tests failing
>
> <https://lists.apache.org/thread.html/0b68efce9b7f2c5297b32d09e5d903e9b354199fe2ce446fbcd240bc@%3Cdev.beam.apache.org%3E>
>- Removal of Reference Runner Post-Commits from Jenkins, and discussion on
>removal of code.
>3. (25 Apr. 2019) Removing Java Reference Runner code
>
> <https://lists.apache.org/thread.html/6a347d072a7f2c7392550f0315545826320c52e51b9a497c51000ef9@%3Cdev.beam.apache.org%3E>
>- Discussion thread before this formal vote.
>
>

-- 

Ruoyun  Huang


Re: [ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Ruoyun Huang
Congratulations Udi!

On Fri, May 3, 2019 at 2:30 PM Ahmet Altay  wrote:

> Congratulations, Udi!
>
> *From: *Kyle Weaver 
> *Date: *Fri, May 3, 2019 at 2:11 PM
> *To: * 
>
> Congratulations Udi! I look forward to sending you all my reviews for
>> the next month (just kidding :)
>>
>> Kyle Weaver | Software Engineer | github.com/ibzib |
>> kcwea...@google.com | +1650203
>>
>> On Fri, May 3, 2019 at 1:52 PM Charles Chen  wrote:
>> >
>> > Thank you Udi!
>> >
>> > On Fri, May 3, 2019, 1:51 PM Aizhamal Nurmamat kyzy <
>> aizha...@google.com> wrote:
>> >>
>> >> Congratulations, Udi! Thank you for all your contributions!!!
>> >>
>> >> From: Pablo Estrada 
>> >> Date: Fri, May 3, 2019 at 1:45 PM
>> >> To: dev
>> >>
>> >>> Thanks Udi and congrats!
>> >>>
>> >>> On Fri, May 3, 2019 at 1:44 PM Kenneth Knowles 
>> wrote:
>> >>>>
>> >>>> Hi all,
>> >>>>
>> >>>> Please join me and the rest of the Beam PMC in welcoming a new
>> committer: Udi Meiri.
>> >>>>
>> >>>> Udi has been contributing to Beam since late 2017, starting with
>> HDFS support in the Python SDK and continuing with a ton of Python work. I
>> also will highlight his work on community-building infrastructure,
>> including documentation, experiments with ways to find reviewers for pull
>> requests, gradle build work, analyzing and reducing build times.
>> >>>>
>> >>>> In consideration of Udi's contributions, the Beam PMC trusts Udi
>> with the responsibilities of a Beam committer [1].
>> >>>>
>> >>>> Thank you, Udi, for your contributions.
>> >>>>
>> >>>> Kenn
>> >>>>
>> >>>> [1]
>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>
>

-- 

Ruoyun  Huang


Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-01 Thread Ruoyun Huang
Congratulations everyone!  Well deserved!

On Wed, May 1, 2019 at 8:38 PM Kenneth Knowles  wrote:

> Congrats! All well deserved!
>
> Kenn
>
> On Wed, May 1, 2019 at 8:09 PM Reza Rokni  wrote:
>
>> Congratulations!
>>
>> On Thu, 2 May 2019 at 10:53, Connell O'Callaghan 
>> wrote:
>>
>>> Well done - congratulations to you all!!! Rose thank you for sharing
>>> this news!!!
>>>
>>> On Wed, May 1, 2019 at 19:45 Rose Nguyen  wrote:
>>>
>>>> Matthias Baetens, Lukazs Gajowy, Suneel Marthi, Maximilian Michels,
>>>> Alex Van Boxel, and Thomas Weise:
>>>>
>>>> Thank you for your exceptional contributions to Apache Beam. I'm
>>>> looking forward to seeing this project grow and for more folks to
>>>> contribute and be recognized! Everyone can read more about this award on
>>>> the Google Open Source blog:
>>>> https://opensource.googleblog.com/2019/04/google-open-source-peer-bonus-winners.html
>>>>
>>>> Cheers,
>>>> --
>>>> Rose Thị Nguyễn
>>>>
>>>
>>
>> --
>>
>> This email may be confidential and privileged. If you received this
>> communication by mistake, please don't forward it to anyone else, please
>> erase all copies and attachments, and please let me know that it has gone
>> to the wrong person.
>>
>> The above terms reflect a potential business arrangement, are provided
>> solely as a basis for further discussion, and are not intended to be and do
>> not constitute a legally binding obligation. No legally binding obligations
>> will be created, implied, or inferred until an agreement in final form is
>> executed in writing by all parties involved.
>>
>

-- 

Ruoyun  Huang


Re: [ANNOUNCE] New committer announcement: Yifan Zou

2019-04-22 Thread Ruoyun Huang
Congratulations, Yifan!

On Mon, Apr 22, 2019 at 9:48 AM Boyuan Zhang  wrote:

> Congratulations, Yifan~
>
> On Mon, Apr 22, 2019 at 9:29 AM Connell O'Callaghan 
> wrote:
>
>> Well done Yifan!!!
>>
>> Thank you for sharing Kenn!!!
>>
>> On Mon, Apr 22, 2019 at 9:00 AM Ahmet Altay  wrote:
>>
>>> Congratulations, Yifan!
>>>
>>> On Mon, Apr 22, 2019 at 8:46 AM Tim Robertson 
>>> wrote:
>>>
>>>> Congratulations Yifan!
>>>>
>>>> On Mon, Apr 22, 2019 at 5:39 PM Cyrus Maden  wrote:
>>>>
>>>>> Congratulations Yifan!!
>>>>>
>>>>> On Mon, Apr 22, 2019 at 11:26 AM Kenneth Knowles 
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Please join me and the rest of the Beam PMC in welcoming a new
>>>>>> committer: Yifan Zou.
>>>>>>
>>>>>> Yifan has been contributing to Beam since early 2018. He has
>>>>>> proposed 70+ pull requests, adding dependency checking and improving test
>>>>>> infrastructure. But something the numbers cannot show adequately is the
>>>>>> huge effort Yifan has put into working with infra and keeping our Jenkins
>>>>>> executors healthy.
>>>>>>
>>>>>> In consideration of Yifan's contributions, the Beam PMC trusts Yifan
>>>>>> with the responsibilities of a Beam committer [1].
>>>>>>
>>>>>> Thank you, Yifan, for your contributions.
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>> [1] https://beam.apache.org/contribute/become-a-committer/#an-apache-
>>>>>> beam-committer
>>>>>>
>>>>>

-- 

Ruoyun  Huang


Re: [DISCUSS] Side input consistency guarantees for triggers with multiple firings

2019-04-11 Thread Ruoyun Huang
With little to no experience with Triggers, I am trying to understand the
problem statement in this discussion.

If a user is aware of the potential non-deterministic behavior, isn't it
almost trivial to refactor the user code, putting PCollectionViews S
and T into a single PCollectionView S', to get around the issue? I
cannot think of a reason (am I wrong?) why a user *has* to put data into two
separate PCollectionViews in a single ParDo(A).
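To make the suggestion concrete, here is how the merged-view idea could be
simulated in plain Python (a sketch only — `SideInputStore` and its methods
are invented for illustration, not Beam APIs): if X and Y are published under
one tagged view S' in a single step, a consumer sees both or neither.

```python
# Simulation of merging two side-input views into one tagged view, so a
# single atomic publish replaces two independent ones. SideInputStore is
# an invented stand-in for whatever the runner materializes; not Beam API.

class SideInputStore:
    """Holds the latest published snapshot of each named view."""

    def __init__(self):
        self._views = {}

    def publish(self, view_name, snapshot):
        # Each publish replaces one whole view atomically.
        self._views[view_name] = dict(snapshot)

    def read(self, view_name):
        return self._views.get(view_name, {})


store = SideInputStore()

# Non-atomic: two separate publishes. A reader scheduled between these two
# calls would observe X in S without Y in T.
store.publish("S", {"x": 1})
store.publish("T", {"y": 2})

# Merged: one view S' keyed by a tag, so X and Y become visible together.
store.publish("S_prime", {"s": {"x": 1}, "t": {"y": 2}})
```

Under this refactoring the consistency question disappears for this pair of
views, at the cost of every consumer of either S or T now depending on the
combined view.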

On Thu, Apr 11, 2019 at 10:16 AM Lukasz Cwik  wrote:

> Even though what Kenn points out is a major reason for me bringing up this
> topic, I didn't want to limit this discussion to how side inputs could work
> but in general what users want from their side inputs when dealing with
> multiple firings.
>
> On Thu, Apr 11, 2019 at 10:09 AM Kenneth Knowles  wrote:
>
>> Luke & I talked in person a bit. I want to give context for what is at
>> stake here, in terms of side inputs in portability. A decent starting place
>> is https://s.apache.org/beam-side-inputs-1-pager
>>
>> In that general design, the runner offers the SDK just one (or a few)
>> materialization strategies, and the SDK builds idiomatic structures on top
>> of it. Concretely, the Fn API today offers a multimap structure, and the
>> idea was that the SDK could cleverly prepare a PCollection of KVs for the
>> runner to materialize. As a naive example, a simple iterable structure
>> could just map all elements to one dummy key in the multimap. But if you
>> wanted a list plus its length, then you might map all elements to an
>> element key and the length to a special length meta-key.
>>
>> So there is a problem: if the SDK is outputting a new KV<"elements-key",
>> ...> and KV<"length-key", ...> for the runner to materialize then consumers
>> of the side input need to see both updates to the materialization or
>> neither. In general, these outputs might span many keys.
>>
>> It seems like there are a few ways to resolve this tension:
>>
>>  - Establish a consistency model so these updates will be observed
>> together. Seems hard and whatever we come up with will limit runners, limit
>> efficiency, and potentially leak into users having to reason about
>> concurrency
>>
>>  - Instead of building the variety of side input views on one primitive
>> multimap materialization, force runners to provide many primitive
>> materializations with consistency under the hood. Not hard to get started,
>> but adds an unfortunate new dimension for runners to vary in functionality
>> and performance, versus letting them optimize just one or a few
>> materializations
>>
>>  - Have no consistency and just not support side input methods that would
>> require consistent metadata. I'm curious what features this will hurt.
>>
>>  - Have no consistency but require the SDK to build some sort of large
>> value since single-element consistency is built in to the model always.
>> Today many runners do concatenate all elements into one value, though that
>> does not perform well. Making this effective probably requires new model
>> features.
>>
>> Kenn
>>
>> On Thu, Apr 11, 2019 at 9:44 AM Reuven Lax  wrote:
>>
>>> One thing to keep in mind: triggers that fire multiple times per window
>>> already tend to be non deterministic. These are element-count or
>>> processing-time triggers, both of which are fairly non deterministic in
>>> firing.
>>>
>>> Reuven
>>>
>>> On Thu, Apr 11, 2019 at 9:27 AM Lukasz Cwik  wrote:
>>>
>>>> Today, we define that a side input becomes available to be consumed
>>>> once at least one firing occurs or when the runner detects that no such
>>>> output could be produced (e.g. watermark is beyond the end of the window
>>>> when using the default trigger). For triggers that fire at most once,
>>>> consumers are guaranteed to have a consistent view of the contents of the
>>>> side input. But what happens when the trigger fire multiple times?
>>>>
>>>> Lets say we have a pipeline containing:
>>>> ParDo(A) --> PCollectionView S
>>>>  \-> PCollectionView T
>>>>
>>>>   ...
>>>>|
>>>> ParDo(C) <-(side input)- PCollectionView S and PCollectionView T
>>>>|
>>>>   ...
>>>>
>>>> 1) Let's say ParDo(A) outputs (during a single bundle) X and Y to
>>>> PCollectionView S, should ParDo(C) be guaranteed to see X only if it
>>>> can also see Y (and vice versa)?
>>>>
>>>> 2) Let's say ParDo(A) outputs (during a single bundle) X to
>>>> PCollectionView S and Y to PCollectionView T, should ParDo(C) be guaranteed
>>>> to see X only if it can also see Y?
>>>>
>>>

-- 

Ruoyun  Huang


Re: [ANNOUNCE] New committer announcement: Boyuan Zhang

2019-04-10 Thread Ruoyun Huang
Thanks for your contributions and congratulations Boyuan!

On Wed, Apr 10, 2019 at 9:00 AM Kenneth Knowles  wrote:

> Hi all,
>
> Please join me and the rest of the Beam PMC in welcoming a new committer:
> Boyuan Zhang.
>
> Boyuan has been contributing to Beam since early 2018. She has proposed
> 100+ pull requests across a wide range of topics: bug fixes, integration
> tests, build improvements, metrics features, and release automation. Two big
> picture things to highlight are building/releasing Beam Python wheels and
> managing the donation of the Beam Dataflow Java Worker, including help with
> I.P. clearance.
>
> In consideration of Boyuan's contributions, the Beam PMC trusts Boyuan
> with the responsibilities of a Beam committer [1].
>
> Thank you, Boyuan, for your contributions.
>
> Kenn
>
> [1] https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-
> committer
>


-- 

Ruoyun  Huang


Re: New contributor

2019-03-29 Thread Ruoyun Huang
Welcome Niklas!

On Fri, Mar 29, 2019 at 9:23 AM Ismaël Mejía  wrote:

> Welcome Niklas!
>
> On Fri, Mar 29, 2019 at 3:54 PM Guobao Li  wrote:
> >
> > Welcome!
> >
> > On Wed, Mar 27, 2019 at 11:12 PM Kenneth Knowles  wrote:
> >>
> >> Welcome!
> >>
> >> On Wed, Mar 27, 2019 at 2:59 PM Mikhail Gryzykhin 
> wrote:
> >>>
> >>> Welcome Niklas.
> >>>
> >>> This is another location with useful resources for contributors:
> https://cwiki.apache.org/confluence/display/BEAM/Developer+Guides
> (contributor guide has link to this as well though)
> >>>
> >>> On Wed, Mar 27, 2019 at 10:54 AM Connell O'Callaghan <
> conne...@google.com> wrote:
> >>>>
> >>>> Welcome Niklas - given your background it will be very interesting to
> see your contributions.
> >>>>
> >>>> On Wed, Mar 27, 2019 at 10:29 AM Mark Liu  wrote:
> >>>>>
> >>>>> Welcome!
> >>>>>
> >>>>> Mark
> >>>>>
> >>>>> On Wed, Mar 27, 2019 at 10:09 AM Lukasz Cwik 
> wrote:
> >>>>>>
> >>>>>> Welcome. The getting started[1] and contribution guides[2] are most
> useful. I have also added you as a contributor to the JIRA project.
> >>>>>>
> >>>>>> 1: https://beam.apache.org/get-started/beam-overview/
> >>>>>> 2: https://beam.apache.org/contribute/
> >>>>>>
> >>>>>> On Wed, Mar 27, 2019 at 9:38 AM Niklas Hansson <
> niklas.sven.hans...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi!
> >>>>>>>
> >>>>>>> I work as a data scientist within banking but will switch over to
> manufacturing the next month. I would like to contribute to Beam and
> especially the Python SDK. Could you add me as a contributor?
> >>>>>>>
> >>>>>>> I am new to open source contribution so feel free to give me any
> advice or point me in the right direction. Plan to start off with some of
> the starter tasks from the Jira board.
> >>>>>>>
> >>>>>>> Best regards
> >>>>>>> Niklas
>


-- 

Ruoyun  Huang


Re: [ANNOUNCE] New committer announcement: Mark Liu

2019-03-25 Thread Ruoyun Huang
Congratulations Mark!

On Mon, Mar 25, 2019 at 9:31 AM Udi Meiri  wrote:

> Congrats Mark!
>
> On Mon, Mar 25, 2019 at 9:24 AM Ahmet Altay  wrote:
>
>> Congratulations, Mark! 
>>
>> On Mon, Mar 25, 2019 at 7:24 AM Tim Robertson 
>> wrote:
>>
>>> Congratulations Mark!
>>>
>>>
>>> On Mon, Mar 25, 2019 at 3:18 PM Michael Luckey 
>>> wrote:
>>>
>>>> Nice! Congratulations, Mark.
>>>>
>>>> On Mon, Mar 25, 2019 at 2:42 PM Katarzyna Kucharczyk <
>>>> ka.kucharc...@gmail.com> wrote:
>>>>
>>>>> Congratulations, Mark! 
>>>>>
>>>>> On Mon, Mar 25, 2019 at 11:24 AM Gleb Kanterov 
>>>>> wrote:
>>>>>
>>>>>> Congratulations!
>>>>>>
>>>>>> On Mon, Mar 25, 2019 at 10:23 AM Łukasz Gajowy 
>>>>>> wrote:
>>>>>>
>>>>>>> Congrats! :)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> pon., 25 mar 2019 o 08:11 Aizhamal Nurmamat kyzy <
>>>>>>> aizha...@google.com> napisał(a):
>>>>>>>
>>>>>>>> Congratulations, Mark!
>>>>>>>>
>>>>>>>> On Sun, Mar 24, 2019 at 23:18 Pablo Estrada 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Yeaah  Mark! : ) Congrats : D
>>>>>>>>>
>>>>>>>>> On Sun, Mar 24, 2019 at 10:32 PM Yifan Zou 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Congratulations Mark!
>>>>>>>>>>
>>>>>>>>>> On Sun, Mar 24, 2019 at 10:25 PM Connell O'Callaghan <
>>>>>>>>>> conne...@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Well done congratulations Mark!!!
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Mar 24, 2019 at 10:17 PM Robert Burke <
>>>>>>>>>>> rob...@frantil.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Congratulations Mark! 
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Mar 24, 2019, 10:08 PM Valentyn Tymofieiev <
>>>>>>>>>>>> valen...@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Congratulations, Mark!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your contributions, in particular for your efforts
>>>>>>>>>>>>> to parallelize test execution for Python SDK and increase the 
>>>>>>>>>>>>> speed of
>>>>>>>>>>>>> Python precommit checks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Mar 24, 2019 at 9:40 PM Kenneth Knowles <
>>>>>>>>>>>>> k...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please join me and the rest of the Beam PMC in welcoming a
>>>>>>>>>>>>>> new committer: Mark Liu.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mark has been contributing to Beam since late 2016! He has
>>>>>>>>>>>>>> proposed 100+ pull requests. Mark was instrumental in expanding 
>>>>>>>>>>>>>> test and
>>>>>>>>>>>>>> infrastructure coverage, especially for Python. In
>>>>>>>>>>>>>> consideration of Mark's contributions, the Beam PMC trusts Mark 
>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>> responsibilities of a Beam committer [1].
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you, Mark, for your contributions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Kenn
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://beam.apache.org/contribute/become-a-committer/
>>>>>>>>>>>>>> #an-apache-beam-committer
>>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>
>>>>>>>> *Aizhamal Nurmamat kyzy*
>>>>>>>>
>>>>>>>> Open Source Program Manager
>>>>>>>>
>>>>>>>> 646-355-9740 Mobile
>>>>>>>>
>>>>>>>> 601 North 34th Street, Seattle, WA 98103
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Cheers,
>>>>>> Gleb
>>>>>>
>>>>>

-- 

Ruoyun  Huang


Re: "Contributors" Wiki Page

2019-03-18 Thread Ruoyun Huang
Sounds like something helpful for new starters, though I am trying to
understand what exactly is proposed to be listed on this page.

What extra information can one get from this page, compared to looking at
the "History" or "Blame" pages on GitHub?



On Mon, Mar 18, 2019 at 8:45 AM Maximilian Michels  wrote:

> Hi,
>
> This is a follow-up from a past thread. We often get questions like "Who
> is working on component XY?" or "Whom can I ping for a review/ask a
> question?" Providing insight into the project structure is important for
> new contributors to get started.
>
> What do you think about creating a Wiki page with Beam contributors?
> Contributors would be free to leave their name, contact information, and
> a description of their work in Beam. Note that this is should be for
> everybody, not only committers/PMC members. The page could be organized
> by Beam components.
>
> Let me know what you think.
>
> Cheers,
> Max
>


-- 

Ruoyun  Huang


Re: New Contributor

2019-03-05 Thread Ruoyun Huang
Welcome Boris!

On Tue, Mar 5, 2019 at 1:34 PM Ahmet Altay  wrote:

> Welcome Boris!
>
> On Mon, Mar 4, 2019 at 5:40 PM Ismaël Mejía  wrote:
>
>> Done, welcome!
>>
>> On Tue, Mar 5, 2019 at 1:25 AM Boris Shkolnik  wrote:
>> >
>> >
>> > Hi,
>> >
>> > My name is Boris Shkolnik. I am a committer in Hadoop and Samza Apache
>> projects.
>> > I would like to contribute to beam.
>> > Could you please add me to the beam project.
>> >
>> > My user name is boryas @apache.org
>> >
>> > Thanks,
>> > -Boris.
>>
>

-- 

Ruoyun  Huang


Re: beam9 bad worker

2019-03-04 Thread Ruoyun Huang
Thanks a lot, folks. Looking forward to the fix.

When we use a trigger phrase to start a Jenkins check job, is there a way
to specify a server?

On Mon, Mar 4, 2019 at 2:06 PM Yifan Zou  wrote:

> I am looking into the error and will disconnect the beam9 to stop breaking
> tests.
>
> On Mon, Mar 4, 2019 at 2:00 PM Pablo Estrada  wrote:
>
>> I've talked with Yifan, and I believe he's looking into it. : )
>> Best
>> -P.
>>
>> On Mon, Mar 4, 2019 at 1:55 PM Ankur Goenka  wrote:
>>
>>> Beam9 is failing all the scheduled jobs. Can we reboot the machine?
>>>
>>

-- 

Ruoyun  Huang


Re: Website tests strangely broken

2019-03-01 Thread Ruoyun Huang
The log says it ran on 880 external links, but shows only ~100 failure
messages (and the number varies across runs). Likely it is just flaky
because the connection to JIRA's site is not stable?

Not sure how our tests are organized, but maybe retrying the HTTP requests
would help?
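The retry idea could look roughly like the following (a generic sketch, not
tied to our actual link-checker; the exception type to catch would depend on
the HTTP client being used):

```python
import time


def with_retries(fn, attempts=3, delay=0.0):
    """Call fn(), retrying up to `attempts` times on any exception."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as e:  # the real checker should catch HTTP errors only
            last_error = e
            time.sleep(delay)
    raise last_error


# Example: a flaky call that fails twice before succeeding.
calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = with_retries(flaky_fetch, attempts=3)
```

With something like this wrapped around each link check, a transiently
unreachable host would only fail the build after several attempts.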

On Fri, Mar 1, 2019 at 12:46 PM Pablo Estrada  wrote:

> Hello all,
> the website tests are broken. I've filed BEAM-6760
> <https://issues.apache.org/jira/browse/BEAM-6760> to track fixing them,
> but I wanted to see if anyone has any idea about why it may be failing.
>
> It's been broken for a few days:
> https://builds.apache.org/job/beam_PreCommit_Website_Cron/
>
> And looking at the failures, it seems that they represent broken links:
> https://builds.apache.org/job/beam_PreCommit_Website_Cron/725/console
>
> But looking at each of the links opens their website without problems.
>
> It may be some environmental temporary issue, but why would it fail
> consistently for the last few days then?
> Thoughts?
> Thanks
> -P.
>


-- 

Ruoyun  Huang


Re: What quick command to catch common issues before pushing a python PR?

2019-02-25 Thread Ruoyun Huang
nvm.  Don't take my previous non-scientific comparison (I only ran it once)
too seriously. :-)

I tried repeating each command multiple times and now the difference
diminishes.  Likely there was a transient error in caching.
On Mon, Feb 25, 2019 at 3:38 PM Kenneth Knowles  wrote:

> Ah, that is likely caused by us having ill-defined tasks that cannot be
> cached. Or is it that the configuration time is so significant?
>
> Kenn
>
> On Mon, Feb 25, 2019 at 11:05 AM Ruoyun Huang  wrote:
>
>> Out of curiosity as a light gradle user, I did a side by side comparison,
>> and the readings confirm what Ken and Michael suggests.
>>
>> In the same repository, do gradle clean then followed by either of the
>> two commands. Measure their runtime respectively.  The latter one takes
>> *1/3* running time.
>>
>> time ./gradlew spotlessApply && ./gradlew checkstyleMain && ./gradlew
>> checkstyleTest && ./gradlew javadoc && ./gradlew findbugsMain && ./gradlew
>> compileTestJava && ./gradlew compileJava
>> real 9m29.330s user 0m11.330s sys 0m1.239s
>>
>> time ./gradlew spotlessApply checkstyleMain checkstyleTest javadoc
>> findbugsMain compileJava compileTestJava
>> real3m35.573s
>> user0m2.701s
>> sys 0m0.327s
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Feb 25, 2019 at 10:47 AM Alex Amato  wrote:
>>
>>> @Michael, no particular reason. I think Ken's suggestion makes more
>>> sense.
>>>
>>> On Mon, Feb 25, 2019 at 10:36 AM Udi Meiri  wrote:
>>>
>>>> Talking about Python:
>>>> I only know of "./gradlew lint", which include style and some py3
>>>> compliance checking.
>>>> There is no auto-fix like spotlessApply AFAIK.
>>>>
>>>> As a side-note, I really dislike our python line continuation indent
>>>> rule, since pycharm can't be configured to adhere to it and I find myself
>>>> manually adjusting whitespace all the time.
>>>>
>>>>
>>>> On Mon, Feb 25, 2019 at 10:22 AM Kenneth Knowles 
>>>> wrote:
>>>>
>>>>> FWIW gradle is a depgraph-based build system. You can gain a few
>>>>> seconds by putting all but spotlessApply in one command.
>>>>>
>>>>> ./gradlew spotlessApply && ./gradlew checkstyleMain checkstyleTest
>>>>> javadoc findbugsMain compileTestJava compileJava
>>>>>
>>>>> It might be clever to define a meta-task. Gradle "base plugin" has the
>>>>> notable check (build and run tests), assemble (make artifacts), and build
>>>>> (assemble + check, badly named!)
>>>>>
>>>>> I think something like "everything except running tests and building
>>>>> artifacts" might be helpful.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Mon, Feb 25, 2019 at 10:13 AM Alex Amato 
>>>>> wrote:
>>>>>
>>>>>> I made a thread about this a while back for java, but I don't think
>>>>>> the same commands like sptoless work for python.
>>>>>>
>>>>>> auto fixing lint issues
>>>>>> running and quick checks which would fail the PR (without running the
>>>>>> whole precommit?)
>>>>>> Something like findbugs to detect common issues (i.e. py3 compliance)
>>>>>>
>>>>>> FWIW, this is what I have been using for java. It will catch pretty
>>>>>> much everything except presubmit test failures.
>>>>>>
>>>>>> ./gradlew spotlessApply && ./gradlew checkstyleMain && ./gradlew
>>>>>> checkstyleTest && ./gradlew javadoc && ./gradlew findbugsMain && 
>>>>>> ./gradlew
>>>>>> compileTestJava && ./gradlew compileJava
>>>>>>
>>>>>
>>
>> --
>> 
>> Ruoyun  Huang
>>
>>

-- 

Ruoyun  Huang


Re: What quick command to catch common issues before pushing a python PR?

2019-02-25 Thread Ruoyun Huang
Out of curiosity as a light Gradle user, I did a side-by-side comparison,
and the readings confirm what Kenn and Michael suggest.

In the same repository, run gradle clean followed by either of the two
commands, and measure their runtimes.  The latter one takes *1/3* of the
running time.

time ./gradlew spotlessApply && ./gradlew checkstyleMain && ./gradlew
checkstyleTest && ./gradlew javadoc && ./gradlew findbugsMain && ./gradlew
compileTestJava && ./gradlew compileJava
real 9m29.330s
user 0m11.330s
sys  0m1.239s

time ./gradlew spotlessApply checkstyleMain checkstyleTest javadoc
findbugsMain compileJava compileTestJava
real 3m35.573s
user 0m2.701s
sys  0m0.327s
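Since a single clean run is noisy, a small harness that repeats each
invocation and reports the median would give a fairer comparison. A sketch
(the no-op command below is a placeholder; substitute the two gradle command
lines being compared):

```python
import statistics
import subprocess
import sys
import time


def median_runtime(cmd, runs=3):
    """Run cmd `runs` times and return the median wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Placeholder command; replace with the two gradle invocations to compare.
noop = [sys.executable, "-c", "pass"]
elapsed = median_runtime(noop, runs=3)
```

Comparing medians rather than single runs would smooth over transient
caching effects like the one noted above.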







On Mon, Feb 25, 2019 at 10:47 AM Alex Amato  wrote:

> @Michael, no particular reason. I think Ken's suggestion makes more sense.
>
> On Mon, Feb 25, 2019 at 10:36 AM Udi Meiri  wrote:
>
>> Talking about Python:
>> I only know of "./gradlew lint", which include style and some py3
>> compliance checking.
>> There is no auto-fix like spotlessApply AFAIK.
>>
>> As a side-note, I really dislike our python line continuation indent
>> rule, since pycharm can't be configured to adhere to it and I find myself
>> manually adjusting whitespace all the time.
>>
>>
>> On Mon, Feb 25, 2019 at 10:22 AM Kenneth Knowles  wrote:
>>
>>> FWIW gradle is a depgraph-based build system. You can gain a few seconds
>>> by putting all but spotlessApply in one command.
>>>
>>> ./gradlew spotlessApply && ./gradlew checkstyleMain checkstyleTest
>>> javadoc findbugsMain compileTestJava compileJava
>>>
>>> It might be clever to define a meta-task. Gradle "base plugin" has the
>>> notable check (build and run tests), assemble (make artifacts), and build
>>> (assemble + check, badly named!)
>>>
>>> I think something like "everything except running tests and building
>>> artifacts" might be helpful.
>>>
>>> Kenn
>>>
>>> On Mon, Feb 25, 2019 at 10:13 AM Alex Amato  wrote:
>>>
>>>> I made a thread about this a while back for java, but I don't think the
>>>> same commands like sptoless work for python.
>>>>
>>>> auto fixing lint issues
>>>> running and quick checks which would fail the PR (without running the
>>>> whole precommit?)
>>>> Something like findbugs to detect common issues (i.e. py3 compliance)
>>>>
>>>> FWIW, this is what I have been using for java. It will catch pretty
>>>> much everything except presubmit test failures.
>>>>
>>>> ./gradlew spotlessApply && ./gradlew checkstyleMain && ./gradlew
>>>> checkstyleTest && ./gradlew javadoc && ./gradlew findbugsMain && ./gradlew
>>>> compileTestJava && ./gradlew compileJava
>>>>
>>>

-- 

Ruoyun  Huang


Re: Signing off

2019-02-18 Thread Ruoyun Huang
We will miss you, Scott! Good luck and have fun with your new project!

On Thu, Feb 14, 2019 at 10:37 AM Scott Wegner  wrote:

> I wanted to let you all know that I've decided to pursue a new adventure
> in my career, which will take me away from Apache Beam development.
>
> It's been a fun and fulfilling journey. Apache Beam has been my first
> significant experience working in open source. I'm inspired observing how
> the community has come together to deliver something great.
>
> Thanks for everything. If you're curious what's next: I'll be working on
> Federated Learning at Google:
> https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
>
> Take care,
> Scott
>
>
>
> Got feedback? tinyurl.com/swegner-feedback
>


-- 

Ruoyun  Huang


Re: Enforce javadoc comments in public methods?

2019-01-28 Thread Ruoyun Huang
Fair point. Looking at the JavadocMethod spec [1], unfortunately there are
no properties available for this tweak.

Let me dig a bit more to see whether this can be done via suppression.

[1] http://checkstyle.sourceforge.net/config_javadoc.html#JavadocMethod
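If it does come down to suppressions, the per-file (and per-line) entries
could be generated mechanically from a violation list rather than written by
hand. A rough sketch — the paths and line numbers below are made up, a real
list would come from the checkstyle report, and note that checkstyle treats
the `files` attribute as a regex:

```python
# Generate checkstyle <suppress/> entries for known violations.
# Each violation is (file path, line number or None); values are made up.

def suppression_entry(check, path, line=None):
    """Return one <suppress/> element for a suppressions.xml file."""
    attrs = 'checks="{}" files="{}"'.format(check, path)
    if line is not None:
        attrs += ' lines="{}"'.format(line)
    return "<suppress {} />".format(attrs)


violations = [
    ("sdks/java/core/src/main/java/org/apache/beam/sdk/Foo.java", 42),
    ("sdks/java/core/src/main/java/org/apache/beam/sdk/Bar.java", None),
]
entries = [suppression_entry("JavadocMethod", path, line)
           for path, line in violations]
```

Regenerating the list this way would also make it easy to shrink it over
time as files get fixed.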

On Mon, Jan 28, 2019 at 4:36 PM Reuven Lax  wrote:

> This appears to be forcing us to set javadoc on constructors as well,
> which is usually pointless. Can we exclude constructor methods from this
> check?
>
> On Wed, Jan 23, 2019 at 5:40 PM Ruoyun Huang  wrote:
>
>> Our recent change is on "JavaDocMethod", which not turned on yet. Not
>> relevant to this error here.
>>
>> The one throws error is "javaDocType". it has been there for a while
>> <https://github.com/apache/beam/blame/master/sdks/java/build-tools/src/main/resources/beam/checkstyle.xml#L156>,
>> which is for public class javadoc missing.  Yeah, I am curious as well why
>> preCommit didn't catch this one.
>>
>>
>>
>> On Wed, Jan 23, 2019 at 5:28 PM Alex Amato  wrote:
>>
>>> Did their happen to be a short time window where some missing Javadoc
>>> comments went in? I am now seeing precommit fail due to code I didn't
>>> modify.
>>>
>>>
>>> https://scans.gradle.com/s/nwgb7xegklwqo/console-log?task=:beam-runners-direct-java:checkstyleMain
>>>
>>>
>>>
>>> On Wed, Jan 23, 2019 at 2:34 PM Ruoyun Huang  wrote:
>>>
>>>> Trying to understand your suggestion. By saying "break that
>>>> dependency", do you mean moving checkstyle out of Java PreCommit?
>>>>
>>>> currently we do have checkstyle as part of  ":check".  It seems to me
>>>> "check" does minimal amount of essential works (correct me If I am wrong),
>>>> much less than what PreCommit does.
>>>>
>>>> On Wed, Jan 23, 2019 at 12:20 PM Kenneth Knowles 
>>>> wrote:
>>>>
>>>>> It is always a bummer when the Java PreCommit fails due to style
>>>>> checking. Can we get this to run separately and quicker? I notice it
>>>>> depends on compileJava. I cannot remember why that is, but I recall it is 
>>>>> a
>>>>> legitimate reason. Nonetheless, can we break that dependency somehow?
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Wed, Jan 16, 2019 at 6:42 PM Ruoyun Huang 
>>>>> wrote:
>>>>>
>>>>>> Hi, everyone,
>>>>>>
>>>>>>
>>>>>> To make sure we move forward to a clean state where we catch
>>>>>> violations in any new PR, we created this change:
>>>>>> https://github.com/apache/beam/pull/7532
>>>>>>
>>>>>> This PR makes checkstyle to report error on missing javadocs. For
>>>>>> existing violations, we explicitly added them as suppression rules, down 
>>>>>> to
>>>>>> which line in the code.
>>>>>>
>>>>>> The caveat is, once this PR is merged, whoever make update to any
>>>>>> file in the list, will very likely have to fix the existing violation for
>>>>>> that file.  :-) Hope this sounds like a reasonable idea to move forward.
>>>>>>
>>>>>> In the meanwhile, I will try to address the items in the list (if I
>>>>>> can). And over time, I will get back to this and remove those 
>>>>>> suppressions
>>>>>> no longer needed (created JIRA-6446 for tracking purpose), until all
>>>>>> of them are fixed.
>>>>>>
>>>>>> On Wed, Jan 9, 2019 at 10:57 PM Ruoyun Huang 
>>>>>> wrote:
>>>>>>
>>>>>>> created a PR: https://github.com/apache/beam/pull/7454
>>>>>>>
>>>>>>> Note instead of having separated checkstyle specs for Main versus
>>>>>>> Test, this PR simply uses suppression to turn off JavaDocComment for 
>>>>>>> test
>>>>>>> files.
>>>>>>>
>>>>>>> If this PR draft looks good, then next step I will commit another
>>>>>>> change that:
>>>>>>> 1) throw error on violations (now just warning to keep PR green).
>>>>>>> 2) List all the violations explicitly in a suppression list, and let
>>>>>>> area contributors/owners address and chop things off the list over time.

Re: [ANNOUNCE] New committer announcement: Gleb Kanterov

2019-01-25 Thread Ruoyun Huang
 Congratulations Gleb!

On Fri, Jan 25, 2019 at 9:18 AM Scott Wegner  wrote:

> Congrats, and welcome Gleb!
>
> On Fri, Jan 25, 2019 at 9:15 AM Suneel Marthi  wrote:
>
>> Congratulations
>>
>> On Fri, Jan 25, 2019 at 12:04 PM Anton Kedin  wrote:
>>
>>> Congrats!
>>>
>>> On Fri, Jan 25, 2019 at 8:54 AM Ismaël Mejía  wrote:
>>>
>>>> Well deserved, congratulations Gleb!
>>>>
>>>> On Fri, Jan 25, 2019 at 10:47 AM Etienne Chauchot 
>>>> wrote:
>>>> >
>>>> > Congrats Gleb and welcome onboard !
>>>> >
>>>> > Etienne
>>>> >
>>>> > Le vendredi 25 janvier 2019 à 10:39 +0100, Alexey Romanenko a écrit :
>>>> >
>>>> > Congrats to Gleb and welcome on board!
>>>> >
>>>> > On 25 Jan 2019, at 09:22, Tim Robertson 
>>>> wrote:
>>>> >
>>>> > Welcome Gleb and congratulations!
>>>> >
>>>> > On Fri, Jan 25, 2019 at 8:06 AM Kenneth Knowles 
>>>> wrote:
>>>> >
>>>> > Hi all,
>>>> >
>>>> > Please join me and the rest of the Beam PMC in welcoming a new
>>>> committer: Gleb Kanterov
>>>> >
>>>> > Gleb started contributing to Beam and quickly dove deep, doing some
>>>> sensitive fixes to schemas, also general build issues, Beam SQL, Avro, and
>>>> more. In consideration of Gleb's technical and community contributions, the
>>>> Beam PMC trusts Gleb with the responsibilities of a Beam committer [1].
>>>> >
>>>> > Thank you, Gleb, for your contributions.
>>>> >
>>>> > Kenn
>>>> >
>>>> > [1]
>>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>> >
>>>> >
>>>>
>>>
>
> --
>
>
>
>
> Got feedback? tinyurl.com/swegner-feedback
>


-- 

Ruoyun  Huang


Re: Enforce javadoc comments in public methods?

2019-01-23 Thread Ruoyun Huang
Our recent change is on "JavadocMethod", which is not turned on yet, so it
is not relevant to this error.

The one that throws the error is "JavadocType". It has been there for a while
<https://github.com/apache/beam/blame/master/sdks/java/build-tools/src/main/resources/beam/checkstyle.xml#L156>,
and it fires on missing public class javadoc.  Yeah, I am curious as well why
precommit didn't catch this one.



On Wed, Jan 23, 2019 at 5:28 PM Alex Amato  wrote:

> Did there happen to be a short time window where some missing Javadoc
> comments went in? I am now seeing precommit fail due to code I didn't
> modify.
>
>
> https://scans.gradle.com/s/nwgb7xegklwqo/console-log?task=:beam-runners-direct-java:checkstyleMain
>
>
>
> On Wed, Jan 23, 2019 at 2:34 PM Ruoyun Huang  wrote:
>
>> Trying to understand your suggestion. By saying "break that dependency",
>> do you mean moving checkstyle out of Java PreCommit?
>>
>> Currently we do have checkstyle as part of ":check".  It seems to me
>> "check" does a minimal amount of essential work (correct me if I am wrong),
>> much less than what PreCommit does.
>>
>> On Wed, Jan 23, 2019 at 12:20 PM Kenneth Knowles  wrote:
>>
>>> It is always a bummer when the Java PreCommit fails due to style
>>> checking. Can we get this to run separately and quicker? I notice it
>>> depends on compileJava. I cannot remember why that is, but I recall it is a
>>> legitimate reason. Nonetheless, can we break that dependency somehow?
>>>
>>> Kenn
>>>
>>> On Wed, Jan 16, 2019 at 6:42 PM Ruoyun Huang  wrote:
>>>
>>>> Hi, everyone,
>>>>
>>>>
>>>> To make sure we move forward to a clean state where we catch violations
>>>> in any new PR, we created this change:
>>>> https://github.com/apache/beam/pull/7532
>>>>
>>>> This PR makes checkstyle to report error on missing javadocs. For
>>>> existing violations, we explicitly added them as suppression rules, down to
>>>> which line in the code.
>>>>
>>>> The caveat is, once this PR is merged, whoever make update to any file
>>>> in the list, will very likely have to fix the existing violation for that
>>>> file.  :-) Hope this sounds like a reasonable idea to move forward.
>>>>
>>>> In the meanwhile, I will try to address the items in the list (if I
>>>> can). And over time, I will get back to this and remove those suppressions
>>>> no longer needed (created JIRA-6446 for tracking purpose), until all
>>>> of them are fixed.
>>>>
>>>> On Wed, Jan 9, 2019 at 10:57 PM Ruoyun Huang  wrote:
>>>>
>>>>> created a PR: https://github.com/apache/beam/pull/7454
>>>>>
>>>>> Note instead of having separated checkstyle specs for Main versus
>>>>> Test, this PR simply uses suppression to turn off JavaDocComment for test
>>>>> files.
>>>>>
>>>>> If this PR draft looks good, then next step I will commit another
>>>>> change that:
>>>>> 1) throw error on violations (now just warning to keep PR green).
>>>>> 2) List all the violations explicitly in a suppression list, and let
>>>>> area contributors/owners address and chop things off the list over time.
>>>>> Not ideal and quite some manual work, if there is a better way, please let
>>>>> me know.
>>>>>
>>>>> On Wed, Jan 9, 2019 at 7:29 AM Robert Bradshaw 
>>>>> wrote:
>>>>>
>>>>>> On Tue, Jan 8, 2019 at 11:15 PM Kenneth Knowles 
>>>>>> wrote:
>>>>>> >
>>>>>> > I think @Internal would be a reasonable annotation to exempt from
>>>>>> documentation, as that means it is explicitly *not* part of the actual
>>>>>> public API, as Ismaël alluded to.
>>>>>>
>>>>>> We'll probably want a distinct annotation from that. Forced comments,
>>>>>> especially forced-by-an-impartial-metric ones, are often lower
>>>>>> quality. This is the kind of signal that would be useful to surface to
>>>>>> a reviewer who could then (jointly) make the call rather than it being
>>>>>> a binary failure/success.
>>>>>>
>>>>>> > (I'm still on the docs-on-private-too side of things, but realize
>>>>>> that's an extreme position)
>>>>>>
>>>>

Re: Enforce javadoc comments in public methods?

2019-01-23 Thread Ruoyun Huang
Trying to understand your suggestion. By saying "break that dependency", do
you mean moving checkstyle out of Java PreCommit?

Currently we do have checkstyle as part of  ":check".  It seems to me
"check" does a minimal amount of essential work (correct me if I am wrong),
much less than what PreCommit does.

On Wed, Jan 23, 2019 at 12:20 PM Kenneth Knowles  wrote:

> It is always a bummer when the Java PreCommit fails due to style checking.
> Can we get this to run separately and quicker? I notice it depends on
> compileJava. I cannot remember why that is, but I recall it is a legitimate
> reason. Nonetheless, can we break that dependency somehow?
>
> Kenn
>
> On Wed, Jan 16, 2019 at 6:42 PM Ruoyun Huang  wrote:
>
>> Hi, everyone,
>>
>>
>> To make sure we move forward to a clean state where we catch violations
>> in any new PR, we created this change:
>> https://github.com/apache/beam/pull/7532
>>
>> This PR makes checkstyle to report error on missing javadocs. For
>> existing violations, we explicitly added them as suppression rules, down to
>> which line in the code.
>>
>> The caveat is, once this PR is merged, whoever make update to any file in
>> the list, will very likely have to fix the existing violation for that
>> file.  :-) Hope this sounds like a reasonable idea to move forward.
>>
>> In the meanwhile, I will try to address the items in the list (if I can).
>> And over time, I will get back to this and remove those suppressions no
>> longer needed (created JIRA-6446 for tracking purpose), until all of
>> them are fixed.
>>
>> On Wed, Jan 9, 2019 at 10:57 PM Ruoyun Huang  wrote:
>>
>>> created a PR: https://github.com/apache/beam/pull/7454
>>>
>>> Note instead of having separated checkstyle specs for Main versus Test,
>>> this PR simply uses suppression to turn off JavaDocComment for test files.
>>>
>>> If this PR draft looks good, then next step I will commit another change
>>> that:
>>> 1) throw error on violations (now just warning to keep PR green).
>>> 2) List all the violations explicitly in a suppression list, and let
>>> area contributors/owners address and chop things off the list over time.
>>> Not ideal and quite some manual work, if there is a better way, please let
>>> me know.
>>>
>>> On Wed, Jan 9, 2019 at 7:29 AM Robert Bradshaw 
>>> wrote:
>>>
>>>> On Tue, Jan 8, 2019 at 11:15 PM Kenneth Knowles 
>>>> wrote:
>>>> >
>>>> > I think @Internal would be a reasonable annotation to exempt from
>>>> documentation, as that means it is explicitly *not* part of the actual
>>>> public API, as Ismaël alluded to.
>>>>
>>>> We'll probably want a distinct annotation from that. Forced comments,
>>>> especially forced-by-an-impartial-metric ones, are often lower
>>>> quality. This is the kind of signal that would be useful to surface to
>>>> a reviewer who could then (jointly) make the call rather than it being
>>>> a binary failure/success.
>>>>
>>>> > (I'm still on the docs-on-private-too side of things, but realize
>>>> that's an extreme position)
>>>>
>>>> +1 to docs on private things as well, though maybe with not as high
>>>> priority :).
>>>>
>>>> > It is a shame that we chose blacklist (via @Internal) instead of
>>>> whitelist (via e.g. @Public) for what constitutes an actual supported
>>>> public method.
>>>>
>>>> Probably better than having to re-train others that public doesn't
>>>> really mean public unless it has a @Public on it. It's harder to
>>>> "unknowingly" use an @Internal API.
>>>>
>>>>
>>>> > Kenn
>>>> >
>>>> > On Tue, Jan 8, 2019 at 1:46 PM Ruoyun Huang 
>>>> wrote:
>>>> >>
>>>> >> To Ismael's question:  When applying such a check (i.e. public
>>>> method with >30 Loc), our code base shows in total 115 violations.
>>>> >>
>>>> >> Thanks for the feedback everyone. As some of you mentioned already,
>>>> suppress warning is always available whenever contributor/reviewer feels
>>>> appropriate, instead of been forced to put in low quality comments. This
>>>> check is more about to help us avoid human errors, in those cases we do
>>>> want to add meaningful javadocs.
>>>> >>
>&

Re: compileJava broken on master see: BEAM-6495

2019-01-23 Thread Ruoyun Huang
On Wed, Jan 23, 2019 at 11:13 AM Alex Amato  wrote:

> Okay, make sense perhaps we can somehow make it fail when it fails to
> generate the dep, rather than when compiling the java code later on
>

That would be a good improvement on the error message.  :-)

Or, does it make sense to manually check in this particular error-prone
auto-generated code until this issue is resolved?


>
> On Wed, Jan 23, 2019 at 11:12 AM Anton Kedin  wrote:
>
>> ParserImpl is autogenerated by Calcite at build time. It seems that
>> there's a race condition there and it sometimes fails. Rerunning the build
>> works for me.
>>
>> Regards,
>> Anton
>>
>> On Wed, Jan 23, 2019, 11:06 AM Alex Amato  wrote:
>>
>>> https://jira.apache.org/jira/browse/BEAM-6495?filter=-2
>>>
>>> Any ideas, how this got through the precommit?
>>>
>>> > Task :beam-sdks-java-extensions-sql:compileJava FAILED
>>>
>>> /usr/local/google/home/ajamato/go/src/
>>> github.com/apache/beam/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcFactory.java:29:
>>> error: cannot find symbol
>>>
>>> import
>>> org.apache.beam.sdk.extensions.sql.impl.parser.impl.BeamSqlParserImpl;
>>>
>>>       ^
>>>
>>>   symbol:   class BeamSqlParserImpl
>>>
>>>   location: package org.apache.beam.sdk.extensions.sql.impl.parser.impl
>>>
>>> 1 error
>>>
>>>

-- 

Ruoyun  Huang


Re: compileJava broken on master see: BEAM-6495

2019-01-23 Thread Ruoyun Huang
I ran into the same issue. It is flaky, due to one of the dependent packages.

The short-term solution is to rerun precommit. For me it was gone on the
second try.

On Wed, Jan 23, 2019 at 11:06 AM Alex Amato  wrote:

> https://jira.apache.org/jira/browse/BEAM-6495?filter=-2
>
> Any ideas, how this got through the precommit?
>
> > Task :beam-sdks-java-extensions-sql:compileJava FAILED
>
> /usr/local/google/home/ajamato/go/src/
> github.com/apache/beam/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcFactory.java:29:
> error: cannot find symbol
>
> import
> org.apache.beam.sdk.extensions.sql.impl.parser.impl.BeamSqlParserImpl;
>
>   ^
>
>   symbol:   class BeamSqlParserImpl
>
>   location: package org.apache.beam.sdk.extensions.sql.impl.parser.impl
>
> 1 error
>
>

-- 

Ruoyun  Huang


Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-22 Thread Ruoyun Huang
+1, getting the same issue as well.

I saw there were @Ignore annotations on those tests before. If it is not
critical and just caused by the way we do testing, does it make sense to put
those @Ignores back until it's resolved?


On Tue, Jan 22, 2019 at 3:35 PM Alex Amato  wrote:

> I've seen this fail in a few different PRs for different contributors, and
> its causing some issues during the presubmit process.. This is a
> multithreadred test with a lot of sleeps, so it looks a bit suspicious as
> the source of the problem.
>
> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>
> I filed a JIRA for this issue:
> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>
>
>

-- 

Ruoyun  Huang


Re: Confusing sentence in Windowing section in Beam programming guide

2019-01-18 Thread Ruoyun Huang
Very helpful discussion (and the fixing PR).

To make sure my take-away is correct: the status quo is a) "for a Global
Window, there is *no possible scenario* where data is identified as
late", rather than b) "for a global window we *no longer* compare the
watermark to identify late data, but *there are still other criteria*
that determine data as late".

a) is correct and b) is not.  Is that so?

On Thu, Jan 17, 2019 at 8:57 PM Kenneth Knowles  wrote:

> Actually, Reuven, that's no longer the case.
>
> It used to be that incoming data was compared to the watermark but it is
> not today. Instead, Jeff's first phrasing is perfect.
>
> One way to see it is the think about what are the consequences of late
> data: if there is a grouping/aggregation by key+window, the window
> determines when the grouping is complete. We go ahead and include any data
> that shows up before the window is complete. And if you set up allowed
> lateness it matches exactly: any data that arrives before the ON_TIME
> output gets to be in that output.
>
> Previously, when we compared incoming elements to the watermark directly,
> you could have a window that was still being aggregated but the elements
> that fell in the window were dropped. There was no technical benefit to
> losing this data, so we stopped dropping it. We also had lots of tricky
> bugs and hard-to-manage code related to what we do if an element arrives
> after the watermark. And you could have an ON_TIME firing that included a
> bunch of "late" data which is confusing.
>
> Now it is simple: if the window is still alive, the element goes into it.
>
> I very rarely use the term "late data" when describing Beam's semantics
> anyhow. I always found the term / definition a bit arbitrary.
>
> Kenn
>
> On Thu, Jan 17, 2019 at 8:13 PM Rui Wang  wrote:
>
>> I created this PR: https://github.com/apache/beam/pull/7556
>>
>> Feel free to review/comment it.
>>
>> -Rui
>>
>> On Thu, Jan 17, 2019 at 2:37 PM Rui Wang  wrote:
>>
>>> It might be better to keep something like "watermark usually
>>> consistently moves forward". But "Elements that arrive with a smaller
>>> timestamp than the current watermark are considered late data." has already
>>> given the order of late data ts and watermark.
>>>
>>>
>>> -Rui
>>>
>>> On Thu, Jan 17, 2019 at 1:39 PM Jeff Klukas  wrote:
>>>
>>>> Reuven - I don't think I realized it was possible to have late data
>>>> with the global window, so I'm definitely learning things through this
>>>> discussion.
>>>>
>>>> New suggested wording, then:
>>>>
>>>> Elements that arrive with a smaller timestamp than the current
>>>> watermark are considered late data.
>>>>
>>>> That says basically the same thing as the wording currently in the
>>>> guide, but uses "smaller" (which implies a less-than-watermark comparison)
>>>> rather than "later" (which folks have interpreted as a
>>>> greater-than-watermark comparison).
>>>>
>>>> On Thu, Jan 17, 2019 at 3:40 PM Reuven Lax  wrote:
>>>>
>>>>> Though it's not tied to window. You could be in the global window, so
>>>>> the watermark never advances past the end of the window, yet still get 
>>>>> late
>>>>> data.
>>>>>
>>>>> On Thu, Jan 17, 2019, 11:14 AM Jeff Klukas  wrote:
>>>>>> How about: "Once the watermark progresses past the end of a window,
>>>>>> any further elements that arrive with a timestamp in that window are
>>>>>> considered late data."
>>>>>>
>>>>>> On Thu, Jan 17, 2019 at 1:43 PM Rui Wang  wrote:
>>>>>>
>>>>>>> Hi Community,
>>>>>>>
>>>>>>> In Beam programming guide [1], there is a sentence: "Data that
>>>>>>> arrives with a timestamp after the watermark is considered *late
>>>>>>> data*"
>>>>>>>
>>>>>>> Seems like people get confused by it. For example, see Stackoverflow
>>>>>>> comment [2]. Basically it makes people think that a event timestamp 
>>>>>>> that is
>>>>>>> bigger than watermark is considered late (due to that "after").
>>>>>>>
>>>>>>> Although there is a example right after this sentence to explain
>>>>>>> late data, seems to me that this sentence is incomplete. The complete
>>>>>>> sentence to me can be: "The watermark consistently advances from -inf to
>>>>>>> +inf. Data that arrives with a timestamp after the watermark is 
>>>>>>> considered
>>>>>>> late data."
>>>>>>>
>>>>>>> Am I understand correctly? Is there better description for the order
>>>>>>> of late data and watermark? I would happy to send PR to update Beam
>>>>>>> documentation.
>>>>>>>
>>>>>>> -Rui
>>>>>>>
>>>>>>> [1]:
>>>>>>> https://beam.apache.org/documentation/programming-guide/#windowing
>>>>>>> [2]:
>>>>>>> https://stackoverflow.com/questions/54141352/dataflow-to-process-late-and-out-of-order-data-for-batch-and-stream-messages/54188971?noredirect=1#comment95302476_54188971
>>>>>>>
>>>>>>>
>>>>>>>

-- 

Ruoyun  Huang
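The late-data rule discussed in this thread can be sketched as a toy check.
This is not the Beam API; the class, method names, and timestamps below are
illustrative only, under the assumption that "window expiry" means window end
plus allowed lateness:

```java
import java.time.Instant;

/** Toy sketch of the two late-data rules discussed above; not the Beam API. */
public class LateDataSketch {

    /** Old rule: an element was dropped as late as soon as its timestamp
     *  fell behind the watermark, even if its window was still aggregating. */
    static boolean lateUnderOldRule(Instant elementTs, Instant watermark) {
        return elementTs.isBefore(watermark);
    }

    /** Current rule: an element is only dropped once its window has expired,
     *  i.e. the watermark has passed the window end plus allowed lateness. */
    static boolean droppedUnderCurrentRule(Instant windowExpiry, Instant watermark) {
        return watermark.isAfter(windowExpiry);
    }

    public static void main(String[] args) {
        Instant watermark = Instant.ofEpochSecond(100);
        Instant elementTs = Instant.ofEpochSecond(90);     // behind the watermark
        Instant windowExpiry = Instant.ofEpochSecond(120); // window end + allowed lateness

        // The old rule would have dropped this element; the current rule keeps
        // it, because the window it falls into is still alive.
        System.out.println(lateUnderOldRule(elementTs, watermark));
        System.out.println(droppedUnderCurrentRule(windowExpiry, watermark));
    }
}
```

This mirrors Kenn's point: once the comparison is "is the window still
alive?" rather than "is the element behind the watermark?", no technical
benefit is lost by keeping the element.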


Re: Proposal: Portability SDKHarness Docker Image Release with Beam Version Release.

2019-01-16 Thread Ruoyun Huang
register docker image to docker image registry and not have
>>>> bintray in the name to later host images on a different vendor for future
>>>> proofing.
>>>>
>>>>
>>>>> [1] https://bintray.com/account/pricing?tab=account=pricing
>>>>>
>>>>>
>>>>>>
>>>>>> On Wed, Jan 16, 2019 at 5:11 PM Ahmet Altay  wrote:
>>>>>>
>>>>>>> This sounds like a good idea. Some questions:
>>>>>>>
>>>>>>> - Could we start from snapshots first and then do it for releases?
>>>>>>> - For snapshots, do we need to clean old containers after a while?
>>>>>>> Otherwise I guess we will accumulate lots of containers.
>>>>>>> - Do we also need additional code changes for snapshots and releases
>>>>>>> to default to these specific containers? There could be a version based
>>>>>>> mechanism to resolve the correct container to use.
>>>>>>>
>>>>>>> On Wed, Jan 16, 2019 at 4:42 PM Ankur Goenka 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> As portability/FnApi is taking shape and are compatible with ULR
>>>>>>>> and Flink. I wanted to discuss the release plan release of SDKHarness
>>>>>>>> Docker images. Of-course users can create their own images but it will 
>>>>>>>> be
>>>>>>>> useful to have a default image available out of box.
>>>>>>>> Pre build image are a must for making FnApi available for users and
>>>>>>>> not just the developers.
>>>>>>>> The other purpose of these images is to be server as base image
>>>>>>>> layer for building custom images.
>>>>>>>>
>>>>>>>> Apache already have bintray repositories for beam.
>>>>>>>> https://bintray.com/apache/beam-snapshots-docker
>>>>>>>> https://bintray.com/apache/beam-docker
>>>>>>>>
>>>>>>>> Shall we start pushing Python/Java/Go SDK Harness containers to
>>>>>>>> https://bintray.com/apache/beam-docker for beam release and
>>>>>>>> maintain daily snapshot at
>>>>>>>> https://bintray.com/apache/beam-snapshots-docker ?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Ankur
>>>>>>>>
>>>>>>>

-- 

Ruoyun  Huang


Re: Enforce javadoc comments in public methods?

2019-01-16 Thread Ruoyun Huang
Hi, everyone,


To make sure we move forward to a clean state where we catch violations in
any new PR, we created this change: https://github.com/apache/beam/pull/7532

This PR makes checkstyle report an error on missing javadocs. For existing
violations, we explicitly added them as suppression rules, down to the
exact line in the code.

The caveat is, once this PR is merged, whoever makes an update to any file in
the list will very likely have to fix the existing violations for that
file.  :-) Hope this sounds like a reasonable way to move forward.

In the meantime, I will try to address the items in the list (if I can).
And over time, I will get back to this and remove those suppressions no
longer needed (created JIRA-6446 for tracking purposes), until all of them
are fixed.

On Wed, Jan 9, 2019 at 10:57 PM Ruoyun Huang  wrote:

> created a PR: https://github.com/apache/beam/pull/7454
>
> Note instead of having separated checkstyle specs for Main versus Test,
> this PR simply uses suppression to turn off JavaDocComment for test files.
>
> If this PR draft looks good, then next step I will commit another change
> that:
> 1) throw error on violations (now just warning to keep PR green).
> 2) List all the violations explicitly in a suppression list, and let area
> contributors/owners address and chop things off the list over time.  Not
> ideal and quite some manual work, if there is a better way, please let me
> know.
>
> On Wed, Jan 9, 2019 at 7:29 AM Robert Bradshaw 
> wrote:
>
>> On Tue, Jan 8, 2019 at 11:15 PM Kenneth Knowles  wrote:
>> >
>> > I think @Internal would be a reasonable annotation to exempt from
>> documentation, as that means it is explicitly *not* part of the actual
>> public API, as Ismaël alluded to.
>>
>> We'll probably want a distinct annotation from that. Forced comments,
>> especially forced-by-an-impartial-metric ones, are often lower
>> quality. This is the kind of signal that would be useful to surface to
>> a reviewer who could then (jointly) make the call rather than it being
>> a binary failure/success.
>>
>> > (I'm still on the docs-on-private-too side of things, but realize
>> that's an extreme position)
>>
>> +1 to docs on private things as well, though maybe with not as high
>> priority :).
>>
>> > It is a shame that we chose blacklist (via @Internal) instead of
>> whitelist (via e.g. @Public) for what constitutes an actual supported
>> public method.
>>
>> Probably better than having to re-train others that public doesn't
>> really mean public unless it has a @Public on it. It's harder to
>> "unknowingly" use an @Internal API.
>>
>>
>> > Kenn
>> >
>> > On Tue, Jan 8, 2019 at 1:46 PM Ruoyun Huang  wrote:
>> >>
>> >> To Ismael's question:  When applying such a check (i.e. public method
>> with >30 Loc), our code base shows in total 115 violations.
>> >>
>> >> Thanks for the feedback everyone. As some of you mentioned already,
>> suppress warning is always available whenever contributor/reviewer feels
>> appropriate, instead of been forced to put in low quality comments. This
>> check is more about to help us avoid human errors, in those cases we do
>> want to add meaningful javadocs.
>> >>
>> >> With 5 +1s so far.  I will put together a PR draft.   A bit research
>> is still needed regarding the best practise to apply check to Main/Test in
>> a different way. If anyone has experience on it, please share it with me.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Tue, Jan 8, 2019 at 8:19 AM Ismaël Mejía  wrote:
>> >>>
>> >>> -0
>> >>>
>> >>> Same comments than Robert I am particularly worried on how this affect
>> >>> contributors in particular casual ones. Even if the intended idea is
>> >>> good I am also worried that people just write poor comments to get rid
>> >>> of the annoyance.
>> >>>
>> >>> Have you already estimated how hard is the current codebase impacted?
>> >>> Or how many methods will be needed to document before this gets in
>> >>> place?
>> >>>
>> >>> I wouldn't be surprised if many runners or internal parts of the
>> >>> codebase lack comments on public methods considering that the 'public
>> >>> API' of must runners 'is not' the public methods but the specific
>> >>> PipelineOptions, and for some methods (even longer ones) such comments
>> >>> may be trivial.
>> >>>
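
The suppression scheme described in this thread looks roughly like the
following checkstyle configuration. The check names (JavadocType,
JavadocMethod) and the `files`/`lines` suppression attributes are standard
checkstyle features, but the file names and line numbers here are
illustrative, not taken from the actual Beam config:

```xml
<!-- checkstyle.xml: require javadoc on public types/methods -->
<module name="JavadocType">
  <property name="scope" value="public"/>
</module>

<!-- suppressions.xml: grandfather existing violations, file by file,
     optionally down to specific lines -->
<suppressions>
  <suppress checks="JavadocType" files="SomeLegacyFile\.java"/>
  <suppress checks="JavadocMethod" files="AnotherLegacyFile\.java" lines="42,105"/>
</suppressions>
```

New files get the full check, while existing violations are listed
explicitly and can be chopped off the suppression list over time, as
proposed above.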

Re: Our jenkins beam1 server is down

2019-01-16 Thread Ruoyun Huang
With another try, it succeeded on beam10.

Thanks for the fix.

On Wed, Jan 16, 2019 at 3:53 PM Ruoyun Huang  wrote:

> Just did a rerun, got error saying "*10:12:21* ERROR: beam14 is offline;
> cannot locate JDK 1.8 (latest)".
>
> Beam1 is not the only one broken?
>
> On Wed, Jan 16, 2019 at 3:45 PM Yifan Zou  wrote:
>
>> The beam1 was still accepting jobs and breaking them after reset this
>> morning. We temporarily disconnect it so that jobs could be scheduled on
>> healthy nodes. Infra is making efforts to fix beam1.
>>
>> On Wed, Jan 16, 2019 at 11:15 AM Yifan Zou  wrote:
>>
>>> The VM instance was reset and Infra is trying to repuppetize it.
>>> https://issues.apache.org/jira/browse/INFRA-17672 is created to track
>>> this issue.
>>>
>>> On Wed, Jan 16, 2019 at 10:51 AM Mark Liu  wrote:
>>>
>>>> Thanks you Yifan!
>>>>
>>>> Looks like following precommits are affected according to my PR:
>>>>
>>>> Java_Examples_Dataflow,
>>>> Portable_Python,
>>>> Website_Stage_GCS
>>>>
>>>> On Wed, Jan 16, 2019 at 9:25 AM Yifan Zou  wrote:
>>>>
>>>>> I am looking on it.
>>>>>
>>>>> On Wed, Jan 16, 2019 at 8:18 AM Ismaël Mejía 
>>>>> wrote:
>>>>>
>>>>>> Can somebody PTAL. Sadly the poor jenkins shuffling algorithm is
>>>>>> sending most builds to it so there are issues to validate some PRs.
>>>>>>
>>>>>
>
> --
> 
> Ruoyun  Huang
>
>

-- 

Ruoyun  Huang


Re: Our jenkins beam1 server is down

2019-01-16 Thread Ruoyun Huang
Just did a rerun and got an error saying "*10:12:21* ERROR: beam14 is offline;
cannot locate JDK 1.8 (latest)".

So beam1 is not the only one broken?

On Wed, Jan 16, 2019 at 3:45 PM Yifan Zou  wrote:

> The beam1 was still accepting jobs and breaking them after reset this
> morning. We temporarily disconnect it so that jobs could be scheduled on
> healthy nodes. Infra is making efforts to fix beam1.
>
> On Wed, Jan 16, 2019 at 11:15 AM Yifan Zou  wrote:
>
>> The VM instance was reset and Infra is trying to repuppetize it.
>> https://issues.apache.org/jira/browse/INFRA-17672 is created to track
>> this issue.
>>
>> On Wed, Jan 16, 2019 at 10:51 AM Mark Liu  wrote:
>>
>>> Thanks you Yifan!
>>>
>>> Looks like following precommits are affected according to my PR:
>>>
>>> Java_Examples_Dataflow,
>>> Portable_Python,
>>> Website_Stage_GCS
>>>
>>> On Wed, Jan 16, 2019 at 9:25 AM Yifan Zou  wrote:
>>>
>>>> I am looking on it.
>>>>
>>>> On Wed, Jan 16, 2019 at 8:18 AM Ismaël Mejía  wrote:
>>>>
>>>>> Can somebody PTAL. Sadly the poor jenkins shuffling algorithm is
>>>>> sending most builds to it so there are issues to validate some PRs.
>>>>>
>>>>

-- 

Ruoyun  Huang


Re: Enforce javadoc comments in public methods?

2019-01-09 Thread Ruoyun Huang
created a PR: https://github.com/apache/beam/pull/7454

Note that instead of having separate checkstyle specs for Main versus Test,
this PR simply uses suppression to turn off JavaDocComment for test files.

If this PR draft looks good, then as the next step I will commit another
change that:
1) throws an error on violations (for now just a warning to keep the PR green).
2) lists all the violations explicitly in a suppression list, and lets area
contributors/owners address and chop things off the list over time.  Not
ideal, and quite a bit of manual work; if there is a better way, please let
me know.

On Wed, Jan 9, 2019 at 7:29 AM Robert Bradshaw  wrote:

> On Tue, Jan 8, 2019 at 11:15 PM Kenneth Knowles  wrote:
> >
> > I think @Internal would be a reasonable annotation to exempt from
> documentation, as that means it is explicitly *not* part of the actual
> public API, as Ismaël alluded to.
>
> We'll probably want a distinct annotation from that. Forced comments,
> especially forced-by-an-impartial-metric ones, are often lower
> quality. This is the kind of signal that would be useful to surface to
> a reviewer who could then (jointly) make the call rather than it being
> a binary failure/success.
>
> > (I'm still on the docs-on-private-too side of things, but realize that's
> an extreme position)
>
> +1 to docs on private things as well, though maybe with not as high
> priority :).
>
> > It is a shame that we chose blacklist (via @Internal) instead of
> whitelist (via e.g. @Public) for what constitutes an actual supported
> public method.
>
> Probably better than having to re-train others that public doesn't
> really mean public unless it has a @Public on it. It's harder to
> "unknowingly" use an @Internal API.
>
>
> > Kenn
> >
> > On Tue, Jan 8, 2019 at 1:46 PM Ruoyun Huang  wrote:
> >>
> >> To Ismael's question:  When applying such a check (i.e. public method
> with >30 Loc), our code base shows in total 115 violations.
> >>
> >> Thanks for the feedback everyone. As some of you mentioned already,
> suppress warning is always available whenever contributor/reviewer feels
> appropriate, instead of been forced to put in low quality comments. This
> check is more about to help us avoid human errors, in those cases we do
> want to add meaningful javadocs.
> >>
> >> With 5 +1s so far.  I will put together a PR draft.   A bit research is
> still needed regarding the best practise to apply check to Main/Test in a
> different way. If anyone has experience on it, please share it with me.
> >>
> >>
> >>
> >>
> >>
> >> On Tue, Jan 8, 2019 at 8:19 AM Ismaël Mejía  wrote:
> >>>
> >>> -0
> >>>
> >>> Same comments than Robert I am particularly worried on how this affect
> >>> contributors in particular casual ones. Even if the intended idea is
> >>> good I am also worried that people just write poor comments to get rid
> >>> of the annoyance.
> >>>
> >>> Have you already estimated how hard is the current codebase impacted?
> >>> Or how many methods will be needed to document before this gets in
> >>> place?
> >>>
> >>> I wouldn't be surprised if many runners or internal parts of the
> >>> codebase lack comments on public methods considering that the 'public
> >>> API' of must runners 'is not' the public methods but the specific
> >>> PipelineOptions, and for some methods (even longer ones) such comments
> >>> may be trivial.
> >>>
> >>> On Tue, Jan 8, 2019 at 5:16 PM Kenneth Knowles 
> wrote:
> >>> >
> >>> > +1 I even thought this was already on (at some point).
> >>> >
> >>> > On Tue, Jan 8, 2019 at 8:01 AM Scott Wegner 
> wrote:
> >>> >>
> >>> >> I would even propose applying this to non-public methods, but I
> suspect that would be more controversial.
> >>> >
> >>> >
> >>> > I also would support this. It will improve code quality as well.
> Often missing doc means something is not well thought-out. It often also
> indicates a misguided attempt to "share code" without sharing a clear
> concept.
> >>> >
> >>> >> I share Robert's concern for random victims hitting the policy when
> a method grows from N-1 to N lines. This can easily happen with automatic
> refactoring + spotless code formatting. For example, renaming a variable to
> a longer name can introduce new line-breaks. But, I'm think code
> documentation is valuable enough that it's still worth it.
>

Re: Enforce javadoc comments in public methods?

2019-01-08 Thread Ruoyun Huang
To Ismael's question: when applying such a check (i.e. public methods with
>30 LoC), our code base shows 115 violations in total.

Thanks for the feedback, everyone. As some of you mentioned already,
suppressing the warning is always available whenever the contributor/reviewer
feels it is appropriate, instead of being forced to put in low-quality
comments. This check is more about helping us avoid human errors in those
cases where we do want to add meaningful javadocs.

With 5 +1s so far, I will put together a PR draft.  A bit of research is
still needed regarding the best practice for applying the check to Main/Test
differently. If anyone has experience with it, please share it with me.





On Tue, Jan 8, 2019 at 8:19 AM Ismaël Mejía  wrote:

> -0
>
> Same comments than Robert I am particularly worried on how this affect
> contributors in particular casual ones. Even if the intended idea is
> good I am also worried that people just write poor comments to get rid
> of the annoyance.
>
> Have you already estimated how hard is the current codebase impacted?
> Or how many methods will be needed to document before this gets in
> place?
>
> I wouldn't be surprised if many runners or internal parts of the
> codebase lack comments on public methods considering that the 'public
> API' of must runners 'is not' the public methods but the specific
> PipelineOptions, and for some methods (even longer ones) such comments
> may be trivial.
>
> On Tue, Jan 8, 2019 at 5:16 PM Kenneth Knowles  wrote:
> >
> > +1 I even thought this was already on (at some point).
> >
> > On Tue, Jan 8, 2019 at 8:01 AM Scott Wegner  wrote:
> >>
> >> I would even propose applying this to non-public methods, but I suspect
> that would be more controversial.
> >
> >
> > I also would support this. It will improve code quality as well. Often
> missing doc means something is not well thought-out. It often also
> indicates a misguided attempt to "share code" without sharing a clear
> concept.
> >
> >> I share Robert's concern for random victims hitting the policy when a
> method grows from N-1 to N lines. This can easily happen with automatic
> refactoring + spotless code formatting. For example, renaming a variable to
> a longer name can introduce new line-breaks. But, I'm think code
> documentation is valuable enough that it's still worth it.
> >
> >
> > Another perspective is that someone is getting away with missing
> documentation at N-1. Seems OK. But maybe just allowMissingPropertyJavadoc
> (from http://checkstyle.sourceforge.net/config_javadoc.html#JavadocMethod)?
> We can also configure allowedAnnotations but if you are going to go through
> the trouble of annotating something, a javadoc comment is just as easy.
> >
> > I want to caveat this: I am strongly opposed to any requirements on the
> contents of the javadoc, which is almost all the time redundant bloat if
> the description is at all adequate.
> >
> > Kenn
> >
> >
> >>
> >> On Tue, Jan 8, 2019 at 4:03 AM Robert Bradshaw 
> wrote:
> >>>
> >>> With the clarification that we're looking at the intersection of
> >>> public + "big", I think this is a great idea. We should make it clear
> >>> that this is a lower bar--many private or shorter methods merit
> >>> documentation as well (but that's harder to automatically detect). The
> >>> one difficulty with a threshold is that it's painful for the person
> >>> who does some refactoring or other minor work and turns the (say)
> >>> 29-line method into a 30-line one. This is a case where as suppression
> >>> annotation (appropriately used) could be useful.
> >>>
> >>> On Tue, Jan 8, 2019 at 1:49 AM Daniel Oliveira 
> wrote:
> >>> >
> >>> > +1
> >>> >
> >>> > I like this idea, especially with the line number requirement. The
> exact number of lines is debatable, but you could go as low as 10 lines and
> that would exclude any trivial setters and getters. Even better might be if
> it's possible to configure checkstyle to ignore this for getters and
> setters (I don't know if checkstyle supports this, but I know that other
> tools are able to auto-detect getters and setters).
> >>> >
> >>> > I'm not dead-set against having annotation to suppress the comment,
> but it carries the risk that code will be left un-commented because both
> the dev and reviewer think it's self-explanatory, and then someone new to
> the codebase finds it confusing.
> >>> >
> >>> > On Mon, Jan 7, 2019 at 11:31 AM Ankur Goenka 
> wrote:
> >>> 

Re: Enforce javadoc comments in public methods?

2019-01-07 Thread Ruoyun Huang
Yeah. Agree there is no reason to enforce anything for trivial methods like
setter/getter.

What I meant is to enforce only for methods that are *BOTH* 1) public
*AND* 2) longer than N lines.

Sorry for not making the proposal clear enough in the original message; it
should've been titled "enforce ... on non-trivial public methods".



On Mon, Jan 7, 2019 at 1:31 AM Robert Bradshaw  wrote:

> IMHO, requiring comments on trivial methods like setters and getters
> is often a net negative, but setting some standard could be useful.
>
> On Mon, Jan 7, 2019 at 7:35 AM Jean-Baptiste Onofré 
> wrote:
> >
> > Hi,
> >
> > for the presence of a comment on public method, it's a good idea. Now,
> > about the number of lines, not sure it's a good idea. I'm thinking about
> > the getter/setter which are public. Most of the time, the comment is
> > pretty simple (and useless ;)).
> >
> > Regards
> > JB
> >
> > On 07/01/2019 04:35, Ruoyun Huang wrote:
> > > Hi, everyone,
> > >
> > >
> > > We were wondering whether it is a good idea to make checkstyle
> > > enforce public method comments. Our current behavior of JavaDoc check
> is:
> > >
> > >  1. Missing Class javadoc comment is reported as error.
> > >
> > >  2. Method comment missing is explicitly allowed. see [1].  It is not
> > > even shown as warning.
> > >
> > >  3. The actual javadoc target gives warning when certain tags are
> > > missing in javadoc, but not if the whole comment is missing.
> > >
> > >
> > >How about we enforce method comments for **1) public method and 2)
> > > method that is longer than N lines**. (N=~30 seems a good number,
> > > leading to ~50 violations in current repository). I can find out the
> > > corresponding contributors to fill in the missing comments, before we
> > > turning the check fully on.
> > >
> > >
> > >One caveat though is that we might want skip this check on test
> code,
> > > but I am not sure yet if our current setup can easily handle separated
> > > rules for main code versus test code.
> > >
> > >
> > > Is this a good idea?  Thoughts and suggestions?
> > >
> > >
> > > [1]
> > >
> https://github.com/apache/beam/blame/5ceffb246c0c38ad68dd208e951a1f39c90ef85c/sdks/java/build-tools/src/main/resources/beam/checkstyle.xml#L111
> > >
> > >
> > > Cheers,
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>


-- 

Ruoyun  Huang


Enforce javadoc comments in public methods?

2019-01-06 Thread Ruoyun Huang
Hi, everyone,

We were wondering whether it is a good idea to make checkstyle enforce
public method comments. Our current behavior of JavaDoc check is:

   1. A missing class javadoc comment is reported as an error.

   2. A missing method comment is explicitly allowed; see [1]. It is not
      even shown as a warning.

   3. The actual javadoc target warns when certain tags are missing in a
      javadoc comment, but not when the whole comment is missing.


   How about we enforce method comments for methods that are **both 1)
public and 2) longer than N lines**? (N=~30 seems a good number, leading to
~50 violations in the current repository.) I can find the corresponding
contributors to fill in the missing comments before we turn the check
fully on.

   One caveat, though, is that we might want to skip this check on test
code, but I am not sure yet whether our current setup can easily handle
separate rules for main code versus test code.

Is this a good idea?  Thoughts and suggestions?

[1]
https://github.com/apache/beam/blame/5ceffb246c0c38ad68dd208e951a1f39c90ef85c/sdks/java/build-tools/src/main/resources/beam/checkstyle.xml#L111
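For concreteness, the proposed check might be expressed as a checkstyle
module along the following lines. This is a sketch only: the property names
follow the checkstyle 8.x `JavadocMethod` documentation, and the values
(scope=public, N=30) are illustrative assumptions, not a tested
configuration for our checkstyle.xml:

```xml
<!-- Sketch: require javadoc on non-trivial public methods only.      -->
<!-- Property names per checkstyle 8.x JavadocMethod; values are      -->
<!-- illustrative, not a tested configuration.                        -->
<module name="JavadocMethod">
  <property name="scope" value="public"/>
  <property name="minLineCount" value="30"/>
  <property name="allowMissingPropertyJavadoc" value="true"/>
</module>
```

Whether this module accepts exactly these properties depends on the
checkstyle version pinned in build-tools, and the skip-on-test-code caveat
would likely be handled with a suppressions file rather than in the module
itself.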

Cheers,


Re: [ANNOUNCE] Apache Beam 2.9.0 released!

2018-12-14 Thread Ruoyun Huang
Great work in making this happen, Chamikara!

PS> I updated new releases in the real wiki
<https://en.wikipedia.org/wiki/Apache_Beam#Timeline>page as well. :-P

On Thu, Dec 13, 2018 at 7:42 PM Chamikara Jayalath 
wrote:

> The Apache Beam team is pleased to announce the release of version 2.9.0!
>
> Apache Beam is an open source unified programming model to define and
> execute data processing pipelines, including ETL, batch and stream
> (continuous) processing. See https://beam.apache.org
>
> You can download the release here:
>
> https://beam.apache.org/get-started/downloads/
>
> This release includes the following major new features & improvements.
> Please see the blog post for more details:
> https://beam.apache.org/blog/2018/12/13/beam-2.9.0.html
>
> Thanks to everyone who contributed to this release, and we hope you enjoy
> using Beam 2.9.0.
> -- Chamikara Jayalath, on behalf of The Apache Beam team
>


-- 

Ruoyun  Huang


Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-11 Thread Ruoyun Huang
+1,  Looking forward to the release!

On Tue, Dec 11, 2018 at 11:09 AM Chamikara Jayalath 
wrote:

> Hi All,
>
> I ran Beam RC verification script [1] and updated the validation
> spreadsheet [2]. I think the current release candidate looks good.
>
> So +1 for the release.
>
> Thanks,
> Cham
>
> [1]
> https://github.com/apache/beam/blob/master/release/src/main/scripts/run_rc_validation.sh
> [2]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>
> On Fri, Dec 7, 2018 at 7:19 AM Ismaël Mejía  wrote:
>
>> Looking at the dates on the Spark runner git log there was a PR merged to
>> change Spark translation from classes to URNs. I cannot see how this can
>> impact performance. Looking at the other queries in the dashboards, there
>> seems to be a great variability in the executions of the Spark runner to
>> the point of feeling we don't have guarantees anymore. I wonder if this was
>> because of other loads shared in the server(s), or because our sample is
>> too small for the standard deviation.
>>
>> I would proceed with the release, the real question is if we can somehow
>> constraint the execution of this tests to have a more consistent output.
>>
>>
>> On Fri, Dec 7, 2018 at 4:10 PM Etienne Chauchot 
>> wrote:
>>
>>> Hi all,
>>> Regarding query7 in spark:
>>> - there doesn't seem to be a functional regression: query passes and
>>> output size is still the same
>>>
>>> - Also the performance degradation seems to be only on spark, the other
>>> runners do not seem to suffer from it.
>>>
>>> - performance degradation seems to be constant from 11/12 so we can
>>> eliminate temporary load on the jenkins server that would generate delays
>>> in Max transform.
>>>
>>> => query7 uses Max transform, fanout and side inputs, has one of these
>>> parts recently (11/12/18) changed in spark?
>>>
>>> Etienne
>>>
>>> Le jeudi 06 décembre 2018 à 21:32 -0800, Chamikara Jayalath a écrit :
>>>
>>> Udi or anybody else who is familiar about Nexmark,  please -1 the vote
>>> thread if you think this particular performance regression for Spark/Direct
>>> runners is a blocker. Otherwise I think we can continue the vote.
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Thu, Dec 6, 2018 at 6:19 PM Chamikara Jayalath 
>>> wrote:
>>>
>>> Are either of these regressions due to known issues ? If not should they
>>> be considered release blockers ?
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Thu, Dec 6, 2018 at 6:11 PM Udi Meiri  wrote:
>>>
>>> For DirectRunner there are regressions in query 7 sql direct runner
>>> batch mode
>>> <https://apache-beam-testing.appspot.com/explore?dashboard=5084698770407424=732741424=411089194>
>>>  (2x)
>>> and streaming mode (5x).
>>>
>>>
>>> On Thu, Dec 6, 2018 at 5:59 PM Udi Meiri  wrote:
>>>
>>> I see a regression for query 7 spark runner batch mode
>>> <https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712=1782465104=462502368>
>>>  on
>>> about 2018-11-13.
>>> [image: image.png]
>>>
>>> On Thu, Dec 6, 2018 at 2:46 AM Chamikara Jayalath 
>>> wrote:
>>>
>>> Hi everyone,
>>>
>>> Please review and vote on the release candidate #1 for the version
>>> 2.9.0, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>>
>>> The complete staging area is available for your review, which includes:
>>> * JIRA release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org
>>> [2], which is signed with the key with fingerprint EEAC70DF3D0BC23B [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.9.0-RC1" [5],
>>> * website pull request listing the release [6] and publishing the API
>>> reference manual [7].
>>> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2].
>>> * Validation sheet with a tab for 2.9.0 release to help with validation
>>> [7].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by majority
>>> approval, with at least 3 PMC affirmative votes.
>>>
>>> Thanks,
>>> Cham
>>>
>>> [1]
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344258
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.9.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1054/
>>> [5] https://github.com/apache/beam/tree/v2.9.0-RC1
>>> [6] https://github.com/apache/beam/pull/7215
>>> [7] https://github.com/apache/beam-site/pull/584
>>> [8]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>>>
>>>

-- 

Ruoyun  Huang


Review for [BEAM-2928] SideInput in ULR

2018-12-11 Thread Ruoyun Huang
Hi,


I am looking for reviews and suggestions regarding side input support in the
ULR [1]. The planned steps are described in this doc [2].


A draft PR for the first step has been created. I would like to ask for
suggestions on [1] (and maybe [2] as well). Some early feedback would be a
great help, in case I go too far down a less-preferred path.

What this PR [1] has done (limited to Step One):

   1. Sets up and passes the state API server description, and contexts,
      correctly.

   2. Creates a SideInput handler (borrowing ideas from the Flink
      implementation) that does the KV lookup. This basically implements a
      skeleton of SideInputHandler without wiring it into the ULR's job
      graphs.

   3. When the runner sends back a constant integer, I can see data flowing
      correctly, up to the point where encoding happens. The example
      pipeline I use is a WordCount, with an integer side input added.

What this PR [1] does not do yet (but should, to complete Step One):

   1. When the runner sends back a constant Integer (1), there is an error
      during data encoding:

“””

Caused by: java.lang.IllegalStateException: java.lang.ClassCastException:
java.lang.Integer cannot be cast to [B

at org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)

at
org.apache.beam.repackaged.beam_runners_direct_java.runners.fnexecution.state.StateRequestHandlers$StateRequestHandlerToSideInputHandlerFactoryAdapter.handleGetRequest(StateRequestHandlers.java:300)
at
org.apache.beam.repackaged.beam_runners_direct_java.runners.fnexecution.state.StateRequestHandlers$StateRequestHandlerToSideInputHandlerFactoryAdapter.handle(StateRequestHandlers.java:266)
at
org.apache.beam.repackaged.beam_runners_direct_java.runners.fnexecution.state.StateRequestHandlers$StateKeyTypeDelegatingStateRequestHandler.handle(StateRequestHandlers.java:205)

“””

I am still trying to understand why this coder throws a ClassCastException;
please suggest if I did something wrong at a high level.

[1] https://issues.apache.org/jira/browse/BEAM-2928
[2] http://bit.ly/2EbqCKd

Thanks!

-- 

Ruoyun  Huang


Re: To create a WordCount-SideInput.java example?

2018-11-26 Thread Ruoyun Huang
Thanks Kenneth. I didn't look into the subfolders; let me read a bit more.
I will look into the tests Luke pointed out as well.

To make sure I understand your comment that "Side inputs _are_ different in
streaming as *you* have to ...": are you saying 1) a user needs to use the
SideInput API differently in the streaming case, OR 2) Beam developers had
to do the underlying implementation differently?


On Wed, Nov 21, 2018 at 7:50 PM Kenneth Knowles  wrote:

> I like the idea of a/many very simple example(s) of side inputs. There are
> existing examples that use side inputs:
>
> $ cd examples/java/src/main/java/org/apache/beam/examples
> $ grep -r withSideInput .
> ./complete/TfIdf.java:  .withSideInputs(totalDocuments));
> ./complete/game/GameStats.java:
> .withSideInputs(globalMeanScore));
> ./complete/game/GameStats.java:
> .withSideInputs(spammersView))
> ./cookbook/FilterExamples.java:
> .withSideInputs(globalMeanTemp));
>
> From just this grep It looks like all but one are broadcast scalar values.
> I have not looked at them to see if they are too complex or too trivial.
>
> Side inputs _are_ different in streaming as you have to pause the main
> input or push back elements until a side input is ready for a window.
>
> I would suggest multiple simple examples each showing one way of using
> side inputs. A particular thing to demonstrated might be a triggered
> Combine.perKey() and tutorial that it requires a View.asMultimap() because
> triggers result in duplicate entries for a key.
>

> Kenn
>
> On Wed, Nov 21, 2018 at 4:40 PM Ruoyun Huang  wrote:
>
>> Hi,
>>
>> I am working on sideInput support in java reference runner (ULR)
>> JIRA-2928 [1].
>> Although there are inline code snippet example [2] and unit tests [3], I
>> did not find
>> a good place showing a working example of SideInput(please correct me if
>> I am wrong).
>> I am thinking of creating one more WordCount example under example folder
>> [2].
>> In particular, in this example we show variants of a) sideinputs as a
>> scalar AND multimap, b) from pipeline data or created within code and c)
>> [OPTIONAL?] Streaming versus batch, if there are differences (this I am not
>> sure yet).
>>
>> In the meanwhile, JIRA-2928 can also easily rely on such an example to
>> validate behaviors between portable/non-portable runners.
>>
>> Would like to double check if is this a reasonable idea.
>>
>> Even though SideInput is just one of our many many features, my
>> justification is that, it is commonly used, thus having a one-stop example
>> make it easier for new users.  That being said, is there a reason not to
>> have yet another WordCount example? (Another idea is to extend existing
>> WordCount.java, but that breaks its simplicity.)
>>
>> If it is a good change to have, any suggestion on what else to include?
>>
>> Thanks!
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-2928
>> [2]
>> sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java#L160
>> [3]
>> sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/MultimapSideInputTest.java
>> [4] examples/java/src/main/java/org/apache/beam/examples
>>
>> --
>> 
>> Ruoyun  Huang
>>
>>

-- 

Ruoyun  Huang


To create a WordCount-SideInput.java example?

2018-11-21 Thread Ruoyun Huang
Hi,

I am working on sideInput support in java reference runner (ULR) JIRA-2928
[1].
Although there are inline code snippet examples [2] and unit tests [3], I
did not find a good place showing a working example of SideInput (please
correct me if I am wrong). I am thinking of creating one more WordCount
example under the examples folder [4].
In particular, this example would show variants of a) side inputs as a
scalar AND as a multimap, b) side inputs from pipeline data or created
within code, and c) [optional] streaming versus batch, if there are
differences (I am not sure about this yet).

In the meantime, JIRA-2928 can also easily rely on such an example to
validate behavior between portable and non-portable runners.

I would like to double-check whether this is a reasonable idea.

Even though SideInput is just one of our many features, my justification is
that it is commonly used, so having a one-stop example makes it easier for
new users.  That being said, is there a reason not to have yet another
WordCount example? (Another idea is to extend the existing WordCount.java,
but that breaks its simplicity.)

If it is a good change to have, any suggestions on what else to include?

Thanks!

[1] https://issues.apache.org/jira/browse/BEAM-2928
[2]
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java#L160
[3]
sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/MultimapSideInputTest.java
[4] examples/java/src/main/java/org/apache/beam/examples

-- 

Ruoyun  Huang


Re: [Testing] Splitting pre-commits from post-commit test targets

2018-11-20 Thread Ruoyun Huang
+1 Great improvement!  Thanks Scott!

On Tue, Nov 20, 2018 at 10:33 AM Pablo Estrada  wrote:

> I think this is a great idea, and a good improvement. Thanks Scott!
> -P.
>
> On Tue, Nov 20, 2018 at 10:09 AM Scott Wegner  wrote:
>
>> I wanted to give a heads-up to a small optimization that I hope to make
>> to our Jenkins test targets. Currently our post-commit test jobs also
>> redundantly run pre-commit tests. I'd like to remove redundant execution to
>> get a faster post-commit test signal. See:
>> https://github.com/apache/beam/pull/7073
>>
>> In Jenkins we run pre-commits separately from post-commits, and in all
>> cases when a language post-commit suite runs, the pre-commit is also run in
>> a separate job (in PR, on merge, cron schedule). So, it makes sense to
>> separate the targets. This will free up resources and give a faster signal
>> on post-commit suites as they are doing less work.
>>
>> From a quick test, this shaves 27 mins off of Python post-commits, 10
>> mins from Java, and ~1 minute from Go.
>>
>> The only negative impact I could imagine is if during local development
>> you were running `./gradlew :langPostCommit` as a shortcut to run all
>> tests. Now, in order to also run tests from pre-commit, you'd need to
>> specify it separately: `./gradlew :langPreCommit :langPostCommit`
>>
>> Got feedback? tinyurl.com/swegner-feedback
>>
>

-- 

Ruoyun  Huang


Re: Portable wordcount on Flink runner broken

2018-11-19 Thread Ruoyun Huang
Unfortunately, flink server still doesn't work consistently on my machine
yet.  Funny thing is, it did worked ONCE (
:beam-sdks-python:portableWordCount BUILD successful, finished in 18s).
When I tried gain, things were back to hanging with server printing
messages like:

"""
[flink-akka.actor.default-dispatcher-25] DEBUG
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Received
slot report from instance 1ad9060bcc87cf5fd19c9a233c15a18f.
[flink-akka.actor.default-dispatcher-25] DEBUG
org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat request.
[flink-akka.actor.default-dispatcher-23] DEBUG
org.apache.flink.runtime.taskexecutor.TaskExecutor - Received heartbeat
request from 006b3653dc7a24471c115d70c4c55fa6.
[flink-akka.actor.default-dispatcher-25] DEBUG
org.apache.flink.runtime.jobmaster.JobMaster - Received heartbeat from
e188c32c-cfa5-4b85-bda9-16ce4742c490.
...
repeat above forever after 5 minutes.
"""

I am trying to figure out what I did right for that one successful run.


For the step 3 Thomas mentioned, all I did for cleanup was "gradle clean";
if there is actually more to do, please kindly let me know.




On Mon, Nov 19, 2018 at 6:00 AM Maximilian Michels  wrote:

> Thanks for investing, Thomas!
>
> Ruoyun, does that solve the WordCount problem you were experiencing?
>
> -Max
>
> On 19.11.18 04:53, Thomas Weise wrote:
> > With latest master the problem seems fixed. Unfortunately that was first
> > masked by build and docker issues. But I changed multiple things at once
> > after getting nowhere (the container build "succeeded" when in fact it
> > did not):
> >
> > * Update to latest docker
> > * Increase docker disk space after seeing a spurious, non-reproducible
> > message in one of the build attempts
> > * Full clean and manually remove Go build residuals from the workspace
> >
> > After that I could see Go and container builds execute differently
> > (longer build time) and the result certainly looks better..
> >
> > HTH,
> > Thomas
> >
> >
> >
> > On Sun, Nov 18, 2018 at 2:11 PM Ruoyun Huang  > <mailto:ruo...@google.com>> wrote:
> >
> > I was after the same issue (I was using reference runner job server,
> > but same error message), had some clue but no conclusion yet.
> >
> > By retaining the container instance, error message says "bad MD5"
> > (see the other thread [1] I asked in dev last week). My hypothesis,
> > based on the symptoms, is that the underlying container expects an
> > MD5 to validate staged files, but job request from python SDK does
> > not send file hash code.  Hope someone can confirm if that is the
> > case (I am still trying to understand how come dataflow does not
> > have such issue), and if so, the best way to fix it.
> >
> >
> > [1]
> >
> https://lists.apache.org/thread.html/b26560087ff88f142e26d66c8a5a9283558c8e55b5edd705b5e53c9c@%3Cdev.beam.apache.org%3E
> >
> > On Fri, Nov 16, 2018 at 7:06 PM Thomas Weise  > <mailto:t...@apache.org>> wrote:
> >
> > Since last few days, the steps under
> > https://beam.apache.org/roadmap/portability/#python-on-flink are
> > broken.
> >
> > The gradle task hangs because the job server isn't able to
> > launch the docker container.
> >
> > ./gradlew :beam-sdks-python:portableWordCount
> > -PjobEndpoint=localhost:8099
> >
> > [CHAIN MapPartition (MapPartition at
> >
>  36write/Write/WriteImpl/DoOnce/Impulse.None/beam:env:docker:v1:0) ->
> > FlatMap (FlatMap at
> >
>  36write/Write/WriteImpl/DoOnce/Impulse.None/beam:env:docker:v1:0/out.0)
> > (8/8)] INFO
> >
>  org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory
> > - Still waiting for startup of environment
> >     tweise-docker-apache.bintray.io/beam/python:latest
> > <http://tweise-docker-apache.bintray.io/beam/python:latest> for
> > worker id 1
> >
> > Unfortunately this isn't covered by tests yet. Is anyone aware
> > what change may have caused this or looking into resolving it?
> >
> > Thanks,
> > Thomas
> >
> >
> >
> > --
> > 
> > Ruoyun  Huang
> >
>


-- 

Ruoyun  Huang


Re: Need help regarding memory leak issue

2018-11-16 Thread Ruoyun Huang
Even though the algorithm works on your batch system, did you verify
anything that can rule out the possibility that the underlying ML package
is causing the memory leak?

If not, maybe replace your prediction with a dummy function which does not
load any model at all and always gives the same prediction. Then do the
same plotting and let us see what it looks like. And a plus with version
two: still a dummy prediction, but with the model loaded.    Given we don't
have much of a clue at this stage, this can at least give us more
confidence about whether the issue comes from the underlying ML package or
from the Beam SDK.  Just my 2 cents.
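As a complement to objgraph/guppy, the stdlib tracemalloc module can
quantify growth across repeated calls to the (dummy or real) prediction
function. A minimal sketch — `predict` here is a hypothetical stand-in with
a deliberate leak, not the real model code:

```python
import tracemalloc

def predict(record, _cache=[]):
    # Hypothetical stand-in for the model call. The mutable default
    # argument is a deliberate leak, to show what a hit looks like.
    _cache.append(record * 100)
    return 1

tracemalloc.start()
before = tracemalloc.take_snapshot()
for i in range(1000):
    predict([i])
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# The source lines that grew the most between snapshots point at the leak.
for stat in after.compare_to(before, 'lineno')[:3]:
    print(stat)
```

Running the same comparison around the dummy-without-model,
dummy-with-model, and real-prediction variants would help tell SDK growth
apart from model-package growth.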


On Thu, Nov 15, 2018 at 4:54 PM Rakesh Kumar  wrote:

> Thanks for responding Ruoyun,
>
> We are not sure yet who is causing the leak, but once we run out of the
> memory then sdk worker crashes and pipeline is forced to restart. Check the
> memory usage patterns in the attached image. Each line in that graph is
> representing one task manager host.
>  You are right we are running the models for predictions.
>
> Here are few observations:
>
> 1. All the tasks manager memory usage climb over time but some of the task
> managers' memory climb really fast because they are running the ML models.
> These models are definitely using memory intensive data structure (pandas
> data frame etc) hence their memory usage climb really fast.
> 2. We had almost the same code running in different infrastructure
> (non-streaming) that doesn't cause any memory issue.
> 3. Even when the pipeline has restarted, the memory is not released. It is
> still hogged by something. You can notice in the attached image that
> pipeline restarted around 13:30. At that time it is definitely released
> some portion of the memory but didn't completely released all memory.
> Notice that, when the pipeline was originally started, it started with 30%
> of the memory but when got restarted by the job manager it started with 60%
> of the memory.
>
>
>
> On Thu, Nov 15, 2018 at 3:31 PM Ruoyun Huang  wrote:
>
>> trying to understand the situation you are having.
>>
>> By saying 'kills the appllication', is that a leak in the application
>> itself, or the workers being the root cause?  Also are you running ML
>> models inside Python SDK DoFn's?  Then I suppose it is running some
>> predictions rather than model training?
>>
>> On Thu, Nov 15, 2018 at 1:08 PM Rakesh Kumar 
>> wrote:
>>
>>> I am using *Beam Python SDK *to run my app in production. The app is
>>> running machine learning models. I am noticing some memory leak which
>>> eventually kills the application. I am not sure the source of memory leak.
>>> Currently, I am using object graph
>>> <https://mg.pov.lt/objgraph/#memory-leak-example> to dump the memory
>>> stats. I hope I will get some useful information out of this. I have also
>>> looked into Guppy library <https://pypi.org/project/guppy/> and they
>>> are almost the same.
>>>
>>> Do you guys have any recommendation for debugging this issue? Do we have
>>> any tooling in the SDK that can help to debug it?
>>> Please feel free to share your experience if you have debugged similar
>>> issues in past.
>>>
>>> Thank you,
>>> Rakesh
>>>
>>
>>
>> --
>> 
>> Ruoyun  Huang
>>
>>

-- 

Ruoyun  Huang


Re: Need help regarding memory leak issue

2018-11-15 Thread Ruoyun Huang
I'm trying to understand the situation you are having.

By saying it 'kills the application', do you mean a leak in the application
itself, or are the workers the root cause?  Also, are you running ML models
inside Python SDK DoFns?  Then I suppose it is running predictions rather
than model training?

On Thu, Nov 15, 2018 at 1:08 PM Rakesh Kumar  wrote:

> I am using *Beam Python SDK *to run my app in production. The app is
> running machine learning models. I am noticing some memory leak which
> eventually kills the application. I am not sure the source of memory leak.
> Currently, I am using object graph
> <https://mg.pov.lt/objgraph/#memory-leak-example> to dump the memory
> stats. I hope I will get some useful information out of this. I have also
> looked into Guppy library <https://pypi.org/project/guppy/> and they are
> almost the same.
>
> Do you guys have any recommendation for debugging this issue? Do we have
> any tooling in the SDK that can help to debug it?
> Please feel free to share your experience if you have debugged similar
> issues in past.
>
> Thank you,
> Rakesh
>


-- 

Ruoyun  Huang


Re: How to use "PortableRunner" in Python SDK?

2018-11-14 Thread Ruoyun Huang
To answer Maximilian's question: I am using Linux (a Debian distribution).

'Planned merge' probably sounded like more than it is; what I really mean
entails less change than it sounds. More specifically:

1) The default behavior, where PortableRunner starts a Flink server. It is
confusing to new users.
2) All the related docs and inline comments. Similarly, connecting
PortableRunner to the Flink server could be very confusing.
3) [Probably no longer an issue.]  I couldn't make the Flink server example
work, and I could not make the example work on the Java ULR either. Both
would require debugging to resolve. Thus I figured maybe we should focus on
one single thing, the Java ULR part, without worrying about the Flink
server. Again, this may not be a valid concern, given the Flink part is
most likely due to my setup.


On Wed, Nov 14, 2018 at 3:30 AM Maximilian Michels  wrote:

> Hi Ruoyun,
>
> I just ran the wordcount locally using the instructions on the page.
> I've tried the local file system and GCS. Both times it ran successfully
> and produced valid output.
>
> I'm assuming there is some problem with your setup. Which platform are
> you using? I'm on MacOS.
>
> Could you expand on the planned merge? From my understanding we will
> always need PortableRunner in Python to be able to submit against the
> Beam JobServer.
>
> Thanks,
> Max
>
> On 14.11.18 00:39, Ruoyun Huang wrote:
> > A quick follow-up on using current PortableRunner.
> >
> > I followed the exact three steps as Ankur and Maximilian shared in
> > https://beam.apache.org/roadmap/portability/#python-on-flink  ;   The
> > wordcount example keeps hanging after 10 minutes.  I also tried
> > specifying explicit input/output args, either using gcs folder or local
> > file system, but none of them works.
> >
> > Spent some time looking into it but conclusion yet.  At this point
> > though, I guess it does not matter much any more, given we already have
> > the plan of merging PortableRunner into using java reference runner
> > (i.e. :beam-runners-reference-job-server).
> >
> > Still appreciated if someone can try out the python-on-flink
> > <https://beam.apache.org/roadmap/portability/#python-on-flink>instructions
>
> > in case it is just due to my local machine setup.  Thanks!
> >
> >
> >
> > On Thu, Nov 8, 2018 at 5:04 PM Ruoyun Huang  > <mailto:ruo...@google.com>> wrote:
> >
> > Thanks Maximilian!
> >
> > I am working on migrating existing PortableRunner to using java ULR
> > (Link to Notes
> > <
> https://docs.google.com/document/d/1S86saZqiDaE_M5wxO0zOQ_rwC6QHv7sp1BmGTm0dLNE/edit#
> >).
> > If this issue is non-trivial to solve, I would vote for removing
> > this default behavior as part of the consolidation.
> >
> > On Thu, Nov 8, 2018 at 2:58 AM Maximilian Michels  > <mailto:m...@apache.org>> wrote:
> >
> > In the long run, we should get rid of the Docker-inside-Docker
> > approach,
> > which was only intended for testing anyways. It would be cleaner
> to
> > start the SDK harness container alongside with JobServer
> container.
> >
> > Short term, I think it should be easy to either fix the
> > permissions of
> > the mounted "docker" executable or use a Docker image for the
> > JobServer
> > which comes with Docker pre-installed.
> >
> > JIRA: https://issues.apache.org/jira/browse/BEAM-6020
> >
> > Thanks for reporting this Ruoyun!
> >
> > -Max
> >
> > On 08.11.18 00:10, Ruoyun Huang wrote:
> >  > Thanks Ankur and Maximilian.
> >  >
> >  > Just for reference in case other people encountering the same
> > error
> >  > message, the "permission denied" error in my original email
> > is exactly
> >  > due to dockerinsidedocker issue that Ankur mentioned.
> > Thanks Ankur!
> >  > Didn't make the link when you said it, had to discover that
> > in a hard
> >  > way (I thought it is due to my docker installation messed up).
> >  >
> >  > On Tue, Nov 6, 2018 at 1:53 AM Maximilian Michels
> > mailto:m...@apache.org>
> >  > <mailto:m...@apache.org <mailto:m...@apache.org>>> wrote:
> >  >
> >  > Hi,
> >  >
> >  > Plea

Re: How to use "PortableRunner" in Python SDK?

2018-11-14 Thread Ruoyun Huang
Thanks Thomas!

My desktop runs Linux.  I was using Gradle to run wordcount, and that was
how I got the job hanging. Since both of you got it working, I suspect
something is wrong with my setup.


By using Thomas's Python command line exactly as is, I am able to see the
job run succeed; however, two questions:

1)  Did you check whether the output file "/tmp/py-wordcount-direct" exists or
not?  I expect a text output, but I don't see this file
afterwards.   (I am still building confidence in telling what a
succeeded run looks like.  Maybe I will try DataflowRunner and cross-check
outputs.)

2)  Why does it need a "--streaming" arg?  Isn't this a static batch input,
since it feeds a text file?  In fact, I get a failure message if I remove
'--streaming'; not sure if that is due to my setup again.
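One likely answer to question 1: Beam's text sink writes sharded files by appending a shard-name template to the supplied prefix, so the literal path "/tmp/py-wordcount-direct" would not exist even after a successful run. A minimal sketch of the default naming scheme (the helper function here is illustrative, not part of the Beam SDK):

```python
# Sketch of Beam's default shard naming ("-SSSSS-of-NNNNN" template).
# "sharded_name" is an illustrative helper, not a Beam API.
def sharded_name(prefix, shard_index, num_shards):
    return "%s-%05d-of-%05d" % (prefix, shard_index, num_shards)

# With a single shard, the wordcount output would land here:
print(sharded_name("/tmp/py-wordcount-direct", 0, 1))
# -> /tmp/py-wordcount-direct-00000-of-00001
```

So checking with a wildcard, e.g. `ls /tmp/py-wordcount-direct*`, is the way to look for the output.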


On Wed, Nov 14, 2018 at 7:51 AM Thomas Weise  wrote:

> Works for me on macOS as well.
>
> In case you don't launch the pipeline through Gradle, this would be the
> command:
>
> python -m apache_beam.examples.wordcount \
>   --input=/etc/profile \
>   --output=/tmp/py-wordcount-direct \
>   --runner=PortableRunner \
>   --job_endpoint=localhost:8099 \
>   --parallelism=1 \
>   --flink_master=localhost:8081 \
>   --streaming
>
> We talked about adding the wordcount to pre-commit..
>
> Regarding using ULR vs. Flink runner: There seems to be confusion between
> PortableRunner using the user supplied endpoint vs. trying to launch a job
> server. I commented in the doc.
>
> Thomas
>
>
>
> On Wed, Nov 14, 2018 at 3:30 AM Maximilian Michels  wrote:
>
>> Hi Ruoyun,
>>
>> I just ran the wordcount locally using the instructions on the page.
>> I've tried the local file system and GCS. Both times it ran successfully
>> and produced valid output.
>>
>> I'm assuming there is some problem with your setup. Which platform are
>> you using? I'm on MacOS.
>>
>> Could you expand on the planned merge? From my understanding we will
>> always need PortableRunner in Python to be able to submit against the
>> Beam JobServer.
>>
>> Thanks,
>> Max
>>
>> On 14.11.18 00:39, Ruoyun Huang wrote:
>> > A quick follow-up on using current PortableRunner.
>> >
>> > I followed the exact three steps as Ankur and Maximilian shared in
>> > https://beam.apache.org/roadmap/portability/#python-on-flink  ;   The
>> > wordcount example keeps hanging after 10 minutes.  I also tried
>> > specifying explicit input/output args, either using gcs folder or local
>> > file system, but none of them works.
>> >
>> > Spent some time looking into it but conclusion yet.  At this point
>> > though, I guess it does not matter much any more, given we already have
>> > the plan of merging PortableRunner into using java reference runner
>> > (i.e. :beam-runners-reference-job-server).
>> >
>> > Still appreciated if someone can try out the python-on-flink
>> > <https://beam.apache.org/roadmap/portability/#python-on-flink>instructions
>>
>> > in case it is just due to my local machine setup.  Thanks!
>> >
>> >
>> >
>> > On Thu, Nov 8, 2018 at 5:04 PM Ruoyun Huang > > <mailto:ruo...@google.com>> wrote:
>> >
>> > Thanks Maximilian!
>> >
>> > I am working on migrating existing PortableRunner to using java ULR
>> > (Link to Notes
>> > <
>> https://docs.google.com/document/d/1S86saZqiDaE_M5wxO0zOQ_rwC6QHv7sp1BmGTm0dLNE/edit#
>> >).
>> > If this issue is non-trivial to solve, I would vote for removing
>> > this default behavior as part of the consolidation.
>> >
>> > On Thu, Nov 8, 2018 at 2:58 AM Maximilian Michels > > <mailto:m...@apache.org>> wrote:
>> >
>> > In the long run, we should get rid of the Docker-inside-Docker
>> > approach,
>> > which was only intended for testing anyways. It would be
>> cleaner to
>> > start the SDK harness container alongside with JobServer
>> container.
>> >
>> > Short term, I think it should be easy to either fix the
>> > permissions of
>> > the mounted "docker" executable or use a Docker image for the
>> > JobServer
>> > which comes with Docker pre-installed.
>> >
>> > JIRA: https://issues.apache.org/jira/browse/BEAM-6020
>> >
>> > Thanks for reporting this Ruoyun!
>> >
>> > -Max
>&

Re: How to use "PortableRunner" in Python SDK?

2018-11-13 Thread Ruoyun Huang
A quick follow-up on using current PortableRunner.

I followed the exact three steps Ankur and Maximilian shared in
https://beam.apache.org/roadmap/portability/#python-on-flink.  The
wordcount example keeps hanging after 10 minutes.  I also tried specifying
explicit input/output args, using either a gcs folder or the local file
system, but none of them works.

Spent some time looking into it, but no conclusion yet.  At this point,
though, I guess it does not matter much anymore, given we already have the
plan of merging PortableRunner into using the Java reference runner (i.e.
:beam-runners-reference-job-server).

Still, it would be appreciated if someone could try out the python-on-flink
<https://beam.apache.org/roadmap/portability/#python-on-flink> instructions
in case the failure is just due to my local machine setup.  Thanks!



On Thu, Nov 8, 2018 at 5:04 PM Ruoyun Huang  wrote:

> Thanks Maximilian!
>
> I am working on migrating existing PortableRunner to using java ULR (Link
> to Notes
> <https://docs.google.com/document/d/1S86saZqiDaE_M5wxO0zOQ_rwC6QHv7sp1BmGTm0dLNE/edit#>).
> If this issue is non-trivial to solve, I would vote for removing this
> default behavior as part of the consolidation.
>
> On Thu, Nov 8, 2018 at 2:58 AM Maximilian Michels  wrote:
>
>> In the long run, we should get rid of the Docker-inside-Docker approach,
>> which was only intended for testing anyways. It would be cleaner to
>> start the SDK harness container alongside with JobServer container.
>>
>> Short term, I think it should be easy to either fix the permissions of
>> the mounted "docker" executable or use a Docker image for the JobServer
>> which comes with Docker pre-installed.
>>
>> JIRA: https://issues.apache.org/jira/browse/BEAM-6020
>>
>> Thanks for reporting this Ruoyun!
>>
>> -Max
>>
>> On 08.11.18 00:10, Ruoyun Huang wrote:
>> > Thanks Ankur and Maximilian.
>> >
>> > Just for reference in case other people encountering the same error
>> > message, the "permission denied" error in my original email is exactly
>> > due to dockerinsidedocker issue that Ankur mentioned.  Thanks
>> Ankur!
>> > Didn't make the link when you said it, had to discover that in a hard
>> > way (I thought it is due to my docker installation messed up).
>> >
>> > On Tue, Nov 6, 2018 at 1:53 AM Maximilian Michels > > <mailto:m...@apache.org>> wrote:
>> >
>> > Hi,
>> >
>> > Please follow
>> > https://beam.apache.org/roadmap/portability/#python-on-flink
>> >
>> > Cheers,
>> > Max
>> >
>> > On 06.11.18 01:14, Ankur Goenka wrote:
>> >  > Hi,
>> >  >
>> >  > The Portable Runner requires a job server uri to work with. The
>> > current
>> >  > default job server docker image is broken because of docker
>> inside
>> >  > docker issue.
>> >  >
>> >  > Please refer to
>> >  > https://beam.apache.org/roadmap/portability/#python-on-flink for
>> > how to
>> >  > run a wordcount using Portable Flink Runner.
>> >  >
>> >  > Thanks,
>> >  > Ankur
>> >  >
>> >  > On Mon, Nov 5, 2018 at 3:41 PM Ruoyun Huang > > <mailto:ruo...@google.com>
>> >  > <mailto:ruo...@google.com <mailto:ruo...@google.com>>> wrote:
>> >  >
>> >  > Hi, Folks,
>> >  >
>> >  >   I want to try out Python PortableRunner, by using
>> following
>> >  > command:
>> >  >
>> >  > *sdk/python: python -m apache_beam.examples.wordcount
>> >  >   --output=/tmp/test_output   --runner PortableRunner*
>> >  >
>> >  >   It complains with following error message:
>> >  >
>> >  > Caused by: java.lang.Exception: The user defined 'open()'
>> method
>> >  > caused an exception: java.io.IOException: Cannot run program
>> >  > "docker": error=13, Permission denied
>> >  > at
>> > org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
>> >  > at
>> >  >
>> >
>>  org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
>> >  > at
>> org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
>> >  > ... 1 more
>> >      >

Re: Spotless and lint precommit

2018-11-13 Thread Ruoyun Huang
+1

On Tue, Nov 13, 2018 at 8:29 AM Maximilian Michels  wrote:

> +1
>
> On 13.11.18 14:22, Robert Bradshaw wrote:
> > I really like how spottless runs separately and quickly for Java code.
> > Should we do the same for Python lint?
> >
>


-- 

Ruoyun  Huang


Re: How to use "PortableRunner" in Python SDK?

2018-11-08 Thread Ruoyun Huang
Thanks Maximilian!

I am working on migrating the existing PortableRunner to use the Java ULR
(Link to Notes
<https://docs.google.com/document/d/1S86saZqiDaE_M5wxO0zOQ_rwC6QHv7sp1BmGTm0dLNE/edit#>).
If this issue is non-trivial to solve, I would vote for removing this
default behavior as part of the consolidation.
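As a side note on the "Permission denied" symptom Max describes: when a host "docker" binary is mounted into a container without the execute bit (or with ownership the container user cannot use), any attempt to exec it fails with EACCES. A small, self-contained sketch of that failure mode — the temp-file stand-in for the binary is purely illustrative:

```python
import os
import tempfile

# Create a stand-in "docker" script and strip its execute bits,
# mimicking a mis-mounted binary inside the JobServer container.
path = os.path.join(tempfile.mkdtemp(), "docker")
with open(path, "w") as f:
    f.write("#!/bin/sh\necho hello\n")

os.chmod(path, 0o644)            # readable, but not executable
print(os.access(path, os.X_OK))  # exec would fail with "Permission denied"

os.chmod(path, 0o755)            # restoring the execute bit fixes it
print(os.access(path, os.X_OK))
```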

On Thu, Nov 8, 2018 at 2:58 AM Maximilian Michels  wrote:

> In the long run, we should get rid of the Docker-inside-Docker approach,
> which was only intended for testing anyways. It would be cleaner to
> start the SDK harness container alongside with JobServer container.
>
> Short term, I think it should be easy to either fix the permissions of
> the mounted "docker" executable or use a Docker image for the JobServer
> which comes with Docker pre-installed.
>
> JIRA: https://issues.apache.org/jira/browse/BEAM-6020
>
> Thanks for reporting this Ruoyun!
>
> -Max
>
> On 08.11.18 00:10, Ruoyun Huang wrote:
> > Thanks Ankur and Maximilian.
> >
> > Just for reference in case other people encountering the same error
> > message, the "permission denied" error in my original email is exactly
> > due to dockerinsidedocker issue that Ankur mentioned.  Thanks Ankur!
> > Didn't make the link when you said it, had to discover that in a hard
> > way (I thought it is due to my docker installation messed up).
> >
> > On Tue, Nov 6, 2018 at 1:53 AM Maximilian Michels  > <mailto:m...@apache.org>> wrote:
> >
> > Hi,
> >
> > Please follow
> > https://beam.apache.org/roadmap/portability/#python-on-flink
> >
> > Cheers,
> > Max
> >
> > On 06.11.18 01:14, Ankur Goenka wrote:
> >  > Hi,
> >  >
> >  > The Portable Runner requires a job server uri to work with. The
> > current
> >  > default job server docker image is broken because of docker inside
> >  > docker issue.
> >  >
> >  > Please refer to
> >  > https://beam.apache.org/roadmap/portability/#python-on-flink for
> > how to
> >  > run a wordcount using Portable Flink Runner.
> >  >
> >  > Thanks,
> >  > Ankur
> >  >
> >  > On Mon, Nov 5, 2018 at 3:41 PM Ruoyun Huang  > <mailto:ruo...@google.com>
> >  > <mailto:ruo...@google.com <mailto:ruo...@google.com>>> wrote:
> >  >
> >  > Hi, Folks,
> >  >
> >  >   I want to try out Python PortableRunner, by using
> following
> >  > command:
> >  >
> >  > *sdk/python: python -m apache_beam.examples.wordcount
> >  >   --output=/tmp/test_output   --runner PortableRunner*
> >  >
> >  >   It complains with following error message:
> >  >
> >  > Caused by: java.lang.Exception: The user defined 'open()'
> method
> >  > caused an exception: java.io.IOException: Cannot run program
> >  > "docker": error=13, Permission denied
> >  > at
> > org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
> >  > at
> >  >
> >
>  org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
> >  > at
> org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
> >  > ... 1 more
> >  > Caused by:
> >  >
> >
>  
> org.apache.beam.repackaged.beam_runners_java_fn_execution.com.google.common.util.concurrent.UncheckedExecutionException:
> >  > java.io.IOException: Cannot run program "docker": error=13,
> >  > Permission denied
> >      > at
> >  >
> >
>  
> org.apache.beam.repackaged.beam_runners_java_fn_execution.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4994)
> >  >
> >  > ... 7 more
> >  >
> >  >
> >  >
> >  > My py2 environment is properly configured, because
> DirectRunner
> >  > works.  Also I tested my docker installation by 'docker run
> >  > hello-world ', no issue.
> >  >
> >  >
> >  > Thanks.
> >  > --
> >  > 
> >  > Ruoyun  Huang
> >  >
> >
> >
> >
> > --
> > 
> > Ruoyun  Huang
> >
>


-- 

Ruoyun  Huang


Re: How to use "PortableRunner" in Python SDK?

2018-11-07 Thread Ruoyun Huang
Thanks Ankur and Maximilian.

Just for reference, in case other people encounter the same error
message: the "permission denied" error in my original email is exactly due
to the docker-inside-docker issue that Ankur mentioned.  Thanks Ankur!
I didn't make the connection when you said it and had to discover it the
hard way (I thought my docker installation was messed up).

On Tue, Nov 6, 2018 at 1:53 AM Maximilian Michels  wrote:

> Hi,
>
> Please follow https://beam.apache.org/roadmap/portability/#python-on-flink
>
> Cheers,
> Max
>
> On 06.11.18 01:14, Ankur Goenka wrote:
> > Hi,
> >
> > The Portable Runner requires a job server uri to work with. The current
> > default job server docker image is broken because of docker inside
> > docker issue.
> >
> > Please refer to
> > https://beam.apache.org/roadmap/portability/#python-on-flink for how to
> > run a wordcount using Portable Flink Runner.
> >
> > Thanks,
> > Ankur
> >
> > On Mon, Nov 5, 2018 at 3:41 PM Ruoyun Huang  > <mailto:ruo...@google.com>> wrote:
> >
> > Hi, Folks,
> >
> >   I want to try out Python PortableRunner, by using following
> > command:
> >
> > *sdk/python: python -m apache_beam.examples.wordcount
> >   --output=/tmp/test_output   --runner PortableRunner*
> >
> >   It complains with following error message:
> >
> > Caused by: java.lang.Exception: The user defined 'open()' method
> > caused an exception: java.io.IOException: Cannot run program
> > "docker": error=13, Permission denied
> > at
> org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
> > at
> >
>  org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
> > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
> > ... 1 more
> > Caused by:
> >
>  
> org.apache.beam.repackaged.beam_runners_java_fn_execution.com.google.common.util.concurrent.UncheckedExecutionException:
> > java.io.IOException: Cannot run program "docker": error=13,
> > Permission denied
> > at
> >
>  
> org.apache.beam.repackaged.beam_runners_java_fn_execution.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4994)
> >
> > ... 7 more
> >
> >
> >
> > My py2 environment is properly configured, because DirectRunner
> > works.  Also I tested my docker installation by 'docker run
> > hello-world ', no issue.
> >
> >
> > Thanks.
> > --
> > 
> > Ruoyun  Huang
> >
>


-- 

Ruoyun  Huang


How to use "PortableRunner" in Python SDK?

2018-11-05 Thread Ruoyun Huang
Hi, Folks,

 I want to try out the Python PortableRunner, using the following command:

*sdk/python: python -m apache_beam.examples.wordcount
 --output=/tmp/test_output   --runner PortableRunner*

 It complains with the following error message:

Caused by: java.lang.Exception: The user defined 'open()' method caused an
exception: java.io.IOException: Cannot run program "docker": error=13,
Permission denied
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
... 1 more
Caused by:
org.apache.beam.repackaged.beam_runners_java_fn_execution.com.google.common.util.concurrent.UncheckedExecutionException:
java.io.IOException: Cannot run program "docker": error=13, Permission
denied
at
org.apache.beam.repackaged.beam_runners_java_fn_execution.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4994)

... 7 more



My py2 environment is properly configured, because DirectRunner works.
I also tested my docker installation with 'docker run hello-world'; no
issue.


Thanks.
-- 
========
Ruoyun  Huang


Suggestions on BEAM-5931 (to update PerformanceTest_TextIO)

2018-11-02 Thread Ruoyun Huang
Hi, Folks,

I am working on fixes for BEAM-5931
<https://issues.apache.org/jira/projects/BEAM/issues/BEAM-5931>. Two
Jenkins tests were affected. One is Nexmark, which I’ve fixed in PR#6916
<https://github.com/apache/beam/pull/6916>.  The other one is
PerformanceTest_TextIO
<https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT.groovy>,
where we’d like to apply similar changes, and this is where I am having
trouble finding the best fix.

I would like suggestions before spending too much time on unnecessarily
creative approaches when easier alternatives may be available.
I’ve also created a draft PR#6921
<https://github.com/apache/beam/pull/6921/files> showing what I am about to
do.

The main issue is that the PerformanceTest_TextIO
<https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT.groovy>
Jenkins job uses a shell step to call Python, which runs another GitHub
project, which in turn runs the performance-test-related classes.

Question#1: We will have to do a Gradle shadowJar build, but how do we pass
the path string (i.e. project(“”).shadowJar.ArchivePath) between Jenkins
and Gradle?  Either direction (Jenkins -> Gradle or Gradle -> Jenkins)
should work in theory, but which one would be easier? One idea is to use
environment variables, but so far I have had trouble making an environment
variable work across a Gradle task and a shell task.
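On the environment-variable idea: a variable exported by one process is only visible to its child processes, which is why a value set inside one Gradle task does not automatically reach a sibling shell step; both steps must share a common parent that sets it. A hedged sketch of the parent-to-child handoff (the variable name and jar path are made up for illustration):

```python
import os
import subprocess
import sys

# Pretend this path came from a Gradle step (e.g. shadowJar's archive
# path); the value here is purely illustrative.
os.environ["BEAM_SHADOW_JAR"] = "build/libs/perf-test-all.jar"

# Child processes inherit the parent's environment, so a later
# shell/python step launched from the same parent can read it:
out = subprocess.check_output(
    [sys.executable, "-c",
     "import os; print(os.environ['BEAM_SHADOW_JAR'])"])
print(out.decode().strip())  # -> build/libs/perf-test-all.jar
```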

Question#2: Is CommonJobProperties.buildPerformanceTest() the right place
to update at all? It would be much easier if we could just update other
places (instead of the Jenkins job).

Any suggestions are appreciated; feel free to just comment on the draft
PR#6921 <https://github.com/apache/beam/pull/6921/files>.

Thanks!

-- 

Ruoyun  Huang


Re: :beam-sdks-java-io-hadoop-input-format:test task issues

2018-10-31 Thread Ruoyun Huang
+1 to getting input regarding this failure Alex raised.

javaPreCommit has not worked on my local machine in the past few weeks.
This hadoop build target has been the main issue.

On Tue, Oct 30, 2018 at 5:31 PM Alex Amato  wrote:

> Hello,
>
> I keep encountering issues with the precommit process, and this particular
> test seems to keep failing.
>
> :beam-sdks-java-io-hadoop-input-format:test
>
> Sometimes it fails with SIGSEGV errors in the java compiler (attached log).
>
> I disabled the gradle daemon, which got me past some OOMing issues. I was
> wondering if there is some other steps that I need to do in order to get a
> more consistent testing experience. Any other advice?
>
> I also had a very different failure another run. build_scan
> <https://scans.gradle.com/s/iw6twrkb7llmu/failure?openFailures=WzBd=WzEse31d#top=0>
> link. This was more vague, mentioning just a possible "test process
> configuration" issue.
>
> *./gradlew --version*
> 
> Gradle 4.10.2
> 
>
> Build time:   2018-09-19 18:10:15 UTC
> Revision: b4d8d5d170bb4ba516e88d7fe5647e2323d791dd
>
> Kotlin DSL:   1.0-rc-6
> Kotlin:   1.2.61
> Groovy:   2.4.15
> Ant:  Apache Ant(TM) version 1.9.11 compiled on March 23 2018
> JVM:  1.8.0_161-google-v7 (Oracle Corporation 25.161-b01)
> OS:   Linux 4.17.0-3rodete2-amd64 amd64
>
>
> Thanks for taking a look,
> Alex
>
>

-- 

Ruoyun  Huang


Re: Java Precommit duration

2018-10-25 Thread Ruoyun Huang
runners-google-cloud-dataflow-java-examples:preCommit
>>>>>> * 4m
>>>>>> :beam-runners-google-cloud-dataflow-java-examples-streaming:preCommit
>>>>>> These are integration tests that should have their own job & status
>>>>>> anyhow. We lumped them in because Maven can't do separate tests. Gradle
>>>>>> makes this cheap and easy.
>>>>>>
>>>>>> Then there are these which are the only other tasks over 1m:
>>>>>>
>>>>>> * 2m :beam-runners-google-cloud-dataflow-java-legacy-worker:test
>>>>>> * 2m :beam-runners-google-cloud-dataflow-java-fn-api-worker:test
>>>>>> * 2m :beam-sdks-java-nexmark:test
>>>>>> * 1m :beam-sdks-java-io-google-cloud-platform:test
>>>>>> * 1m :beam-sdks-java-io-hbase:test
>>>>>> * 1m :beam-sdks-java-extensions-sql:test
>>>>>>
>>>>>> Maybe not worth messing with these.  Also if we remove all the
>>>>>> shadowJar and shadowTestJar tasks it actually looks like it would only 
>>>>>> save
>>>>>> 5 minutes, so I was hasty in thinking that would solve things. It will 
>>>>>> make
>>>>>> interactive work better (going from 30s to maybe <10s for rebuilds) but
>>>>>> won't help that much for Jenkins.
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>
>>>
>>> --
>>>
>>>
>>>
>>>
>>> Got feedback? tinyurl.com/swegner-feedback
>>>
>>

-- 

Ruoyun  Huang


Re: New Edit button on beam.apache.org pages

2018-10-24 Thread Ruoyun Huang
Looks awesome!

On Wed, Oct 24, 2018 at 2:24 PM Alan Myrvold  wrote:

> To make small documentation changes easier, there is now an Edit button at
> the top right of the pages on https://beam.apache.org. This button opens
> the source .md file on the master branch of the beam repository in the
> github web editor. After making changes you can create a pull request to
> ask to have it merged.
>
> Thanks to Scott for the suggestion to add this in [BEAM-4431]
> <https://issues.apache.org/jira/browse/BEAM-4431>
>
> Let me know if you run into any issues.
>
> Alan
>
>
>

-- 

Ruoyun  Huang


Re: Python docs build error

2018-10-22 Thread Ruoyun Huang
To Colm's question.

We observed this issue as well and had discussions in a separate thread
<https://issues.apache.org/jira/browse/BEAM-5793>, with Scott and Micah.

This issue was only reproduced in certain Linux environments.  MacOS does
not have this error.  We also ran the test on Jenkins specifically, but
could not reproduce it there either.


On Mon, Oct 22, 2018 at 7:49 AM Colm O hEigeartaigh 
wrote:

> Great, thanks! Out of curiosity, did the jenkins job for the initial PR
> not detect the build failure?
>
> Colm.
>
> On Mon, Oct 22, 2018 at 2:29 PM Maximilian Michels  wrote:
>
>> Correction for the footnote:
>>
>> [1] https://github.com/apache/beam/pull/6637
>>
>> On 22.10.18 15:24, Maximilian Michels wrote:
>> > Hi Colm,
>> >
>> > This [1] got merged recently and broke the "docs" target which
>> > apparently is not part of our Python PreCommit tests.
>> >
>> > See the following PR for a fix:
>> https://github.com/apache/beam/pull/6774
>> >
>> > Best,
>> > Max
>> >
>> > [1] https://github.com/apache/beam/pull/6737
>> >
>> > On 22.10.18 12:55, Colm O hEigeartaigh wrote:
>> >> Hi all,
>> >>
>> >> The following command: ./gradlew :beam-sdks-python:docs gives me the
>> >> following error:
>> >>
>> >>
>> /home/coheig/src/apache/beam/sdks/python/apache_beam/io/flink/flink_streaming_impulse_source.py:docstring
>>
>> >> of
>> >>
>> apache_beam.io.flink.flink_streaming_impulse_source.FlinkStreamingImpulseSource.from_runner_api_parameter:11:
>>
>> >> WARNING: Unexpected indentation.
>> >> Command exited with non-zero status 1
>> >> 42.81user 4.02system 0:16.27elapsed 287%CPU (0avgtext+0avgdata
>> >> 141036maxresident)k
>> >> 0inputs+47792outputs (0major+727274minor)pagefaults 0swaps
>> >> ERROR: InvocationError for command '/usr/bin/time
>> >> /home/coheig/src/apache/beam/sdks/python/scripts/generate_pydoc.sh'
>> >> (exited with code 1)
>> >> ___ summary
>> >> 
>> >> ERROR:   docs: commands failed
>> >>
>> >>  > Task :beam-sdks-python:docs FAILED
>> >>
>> >> FAILURE: Build failed with an exception.
>> >>
>> >> Am I missing something or is there an issue here?
>> >>
>> >> Thanks,
>> >>
>> >> Colm.
>> >>
>> >>
>> >> --
>> >> Colm O hEigeartaigh
>> >>
>> >> Talend Community Coder
>> >> http://coders.talend.com
>>
>
>
> --
> Colm O hEigeartaigh
>
> Talend Community Coder
> http://coders.talend.com
>


-- 

Ruoyun  Huang


Re: Does anyone have a strong intelliJ setup?

2018-10-01 Thread Ruoyun Huang
Some fresh memory here.  I had the same issue with my first IntelliJ
project.  On my second try I made sure to "Create an empty IntelliJ project
outside of the Beam source tree."  Now I can just click a test target in
the Gradle window and it runs.
On Mon, Oct 1, 2018 at 11:05 AM Alex Amato  wrote:

> Hello,
>
> I'm looking to get a good intellij setup working and then update the
> documentation how to build and test the java SDK with intelliJ.
>
> Does anyone have a good setup working, with some tips? I followed our
> instructions here, but I found that after following these steps I could not
> build or test the project. It seemed like the build button did nothing and
> the test buttons did not appear.
> https://beam.apache.org/contribute/intellij/
>
> I'm also curious about the gradle support for generating intelliJ
> projects. Has anyone tried this as well?
>
> Any tips would be appreciated.
> Thank you,
> Alex
>


-- 

Ruoyun  Huang


Re: Add ruoyun into contributor list?

2018-09-19 Thread Ruoyun Huang
Thanks Ismaël!  :-)

On Wed, Sep 19, 2018 at 1:29 PM Ismaël Mejía  wrote:

> Done
> On Wed, Sep 19, 2018 at 10:17 PM ruo...@google.com 
> wrote:
> >
> > Hi, Folks,
> >
> >  Can some one add me as contributor in the Beam issue tracker?
> >
> >  account name in JIRA:  ruoyun
> >
> > Thanks!
>


-- 

Ruoyun  Huang