Re: Deprecate some or all of TestPipelineOptions?

2019-11-08 Thread Brian Hulette
Thanks everyone for the responses. I put up a WIP PR [1] that removes
OnCreateMatcher and OnSuccessMatcher.

[1] https://github.com/apache/beam/pull/10056

On Fri, Nov 8, 2019 at 9:48 AM Luke Cwik  wrote:

> It can be marked as deprecated and we can remove its usage everywhere but
> leave this interface and mark it for removal at some future time.
>
> On Thu, Nov 7, 2019 at 2:23 PM Ismaël Mejía  wrote:
>
>> Thanks for bringing this to the ML Brian
>>
>> +1 For full TestPipelineOptions deprecation. Even worth to remove it,
>> bad part is that this class resides in 'sdks/core/main/java' and not
>> in testing as I imagined so this could count as a 'breaking' change.
>>
>> On Thu, Nov 7, 2019 at 8:27 PM Luke Cwik  wrote:
>> >
>> > There was issue with asynchrony of p.run(), some runners blocked till
>> the pipeline was complete with p.run() which was never meant to be the
>> intent.
>> >
>> > The test timeout one makes sense to be able to configure it per runner
>> (since Dataflow takes a lot longer than other runners) but we may be able
>> to configure a Junit test timeout attribute instead.
>> >
>> > I would be for getting rid of them.
>> >
>> >
>> > On Wed, Nov 6, 2019 at 3:36 PM Robert Bradshaw 
>> wrote:
>> >>
>> >> +1 to all of these are probably obsolete at this point and would be
>> >> nice to remove.
>> >>
>> >>
>> >> On Wed, Nov 6, 2019 at 3:00 PM Kenneth Knowles 
>> wrote:
>> >> >
>> >> > Good find. I think TestPipelineOptions is from very early days. It
>> makes sense to me that these are all obsolete. Some guesses, though I
>> haven't dug through commit history to confirm:
>> >> >
>> >> >  - TempRoot: a while ago TempLocation was optional, so I think this
>> would provide a default for things like gcpTempLocation and stagingLocation
>> >> >  - OnSuccessMatcher: for runners where pipeline used to not
>> terminate in streaming mode. Now I think every runner can successfully
>> waitUntilFinish. Also the current API for waitUntilFinish went through some
>> evolutions around asynchrony so it wasn't always a good choice.
>> >> >  - OnCreateMatcher: just for symmetry? I don't know
>> >> >  - TestTimeoutSeconds: probably also for the
>> asychrony/waitUntilfinish issue
>> >> >
>> >> > Kenn
>> >> >
>> >> > On Wed, Nov 6, 2019 at 12:19 PM Brian Hulette 
>> wrote:
>> >> >>
>> >> >> I recently came across TestPipelineOptions, and now I'm wondering
>> if maybe it should be deprecated. It only seems to actually be supported
>> for Spark and Dataflow (via TestSparkRunner and TestDataflowRunner), and I
>> think it may make more sense to move the functionality it provides into the
>> tests that need it.
>> >> >>
>> >> >> TestPipelineOptions currently has four attributes:
>> >> >>
>> >> >> # TempRoot
>> >> >> It's purpose isn't documented, but many tests read TempRoot and use
>> it to set a TempLocation (example). I think this attribute makes sense
>> (e.g. we can set TempRoot once and each test has its own subdirectory), but
>> I'm not sure. Can anyone confirm the motivation for it? I'd like to at
>> least add a docstring for it.
>> >> >>
>> >> >> # OnCreateMatcher
>> >> >> A way to register a matcher that will be checked right after a
>> pipeline has started. It's never set except for in TestDataflowRunnerTest,
>> so I think this is absolutely safe to remove.
>> >> >>
>> >> >> # OnSuccessMatcher
>> >> >> A way to register a matcher that will be checked right after a
>> pipeline has successfully completed. This is used in several tests
>> (RequiresStableInputIT, WordCountIT, ... 8 total occurrences), but I don't
>> see why they couldn't all be replaced with a `p.run().waitUntilFinish()`,
>> followed by an assert.
>> >> >>
>> >> >> I think the current approach is actually dangerous, because running
>> these tests with runners other than TestDataflowRunner or TestSparkRunner
>> means the matchers are never actually checked. This is actually how I came
>> across TestPipelineOptions - I tried running a test with the DirectRunner
>> and couldn't make it fail.
>> >> >>
>> >> >> # TestTimeoutSeconds
>> >> >> Seems to just be a wrapper for `waitUntilFinish(duration)`, and
>> only used in one place. I think it would be cleaner for the test to be
>> responsible for calling waitUntilFinish (which we do elsewhere), the only
>> drawback is it requires a small refactor so the test has access to the
>> PipelineResult object.
>> >> >>
>> >> >>
>> >> >> So I have a couple of questions for the community
>> >> >> 1) Are there thoughts on TempRoot? Can we get rid of it?
>> >> >> 2) Are there any objections to removing the other three attributes?
>> Am I missing something? Unless there are any objections I think I'll write
>> a patch to remove them.
>> >> >>
>> >> >> Thanks,
>> >> >> Brian
>>
>


Re: Deprecate some or all of TestPipelineOptions?

2019-11-08 Thread Luke Cwik
It can be marked as deprecated and we can remove its usage everywhere but
leave this interface and mark it for removal at some future time.

On Thu, Nov 7, 2019 at 2:23 PM Ismaël Mejía  wrote:

> Thanks for bringing this to the ML Brian
>
> +1 For full TestPipelineOptions deprecation. Even worth to remove it,
> bad part is that this class resides in 'sdks/core/main/java' and not
> in testing as I imagined so this could count as a 'breaking' change.
>
> On Thu, Nov 7, 2019 at 8:27 PM Luke Cwik  wrote:
> >
> > There was issue with asynchrony of p.run(), some runners blocked till
> the pipeline was complete with p.run() which was never meant to be the
> intent.
> >
> > The test timeout one makes sense to be able to configure it per runner
> (since Dataflow takes a lot longer than other runners) but we may be able
> to configure a Junit test timeout attribute instead.
> >
> > I would be for getting rid of them.
> >
> >
> > On Wed, Nov 6, 2019 at 3:36 PM Robert Bradshaw 
> wrote:
> >>
> >> +1 to all of these are probably obsolete at this point and would be
> >> nice to remove.
> >>
> >>
> >> On Wed, Nov 6, 2019 at 3:00 PM Kenneth Knowles  wrote:
> >> >
> >> > Good find. I think TestPipelineOptions is from very early days. It
> makes sense to me that these are all obsolete. Some guesses, though I
> haven't dug through commit history to confirm:
> >> >
> >> >  - TempRoot: a while ago TempLocation was optional, so I think this
> would provide a default for things like gcpTempLocation and stagingLocation
> >> >  - OnSuccessMatcher: for runners where pipeline used to not terminate
> in streaming mode. Now I think every runner can successfully
> waitUntilFinish. Also the current API for waitUntilFinish went through some
> evolutions around asynchrony so it wasn't always a good choice.
> >> >  - OnCreateMatcher: just for symmetry? I don't know
> >> >  - TestTimeoutSeconds: probably also for the
> asychrony/waitUntilfinish issue
> >> >
> >> > Kenn
> >> >
> >> > On Wed, Nov 6, 2019 at 12:19 PM Brian Hulette 
> wrote:
> >> >>
> >> >> I recently came across TestPipelineOptions, and now I'm wondering if
> maybe it should be deprecated. It only seems to actually be supported for
> Spark and Dataflow (via TestSparkRunner and TestDataflowRunner), and I
> think it may make more sense to move the functionality it provides into the
> tests that need it.
> >> >>
> >> >> TestPipelineOptions currently has four attributes:
> >> >>
> >> >> # TempRoot
> >> >> It's purpose isn't documented, but many tests read TempRoot and use
> it to set a TempLocation (example). I think this attribute makes sense
> (e.g. we can set TempRoot once and each test has its own subdirectory), but
> I'm not sure. Can anyone confirm the motivation for it? I'd like to at
> least add a docstring for it.
> >> >>
> >> >> # OnCreateMatcher
> >> >> A way to register a matcher that will be checked right after a
> pipeline has started. It's never set except for in TestDataflowRunnerTest,
> so I think this is absolutely safe to remove.
> >> >>
> >> >> # OnSuccessMatcher
> >> >> A way to register a matcher that will be checked right after a
> pipeline has successfully completed. This is used in several tests
> (RequiresStableInputIT, WordCountIT, ... 8 total occurrences), but I don't
> see why they couldn't all be replaced with a `p.run().waitUntilFinish()`,
> followed by an assert.
> >> >>
> >> >> I think the current approach is actually dangerous, because running
> these tests with runners other than TestDataflowRunner or TestSparkRunner
> means the matchers are never actually checked. This is actually how I came
> across TestPipelineOptions - I tried running a test with the DirectRunner
> and couldn't make it fail.
> >> >>
> >> >> # TestTimeoutSeconds
> >> >> Seems to just be a wrapper for `waitUntilFinish(duration)`, and only
> used in one place. I think it would be cleaner for the test to be
> responsible for calling waitUntilFinish (which we do elsewhere), the only
> drawback is it requires a small refactor so the test has access to the
> PipelineResult object.
> >> >>
> >> >>
> >> >> So I have a couple of questions for the community
> >> >> 1) Are there thoughts on TempRoot? Can we get rid of it?
> >> >> 2) Are there any objections to removing the other three attributes?
> Am I missing something? Unless there are any objections I think I'll write
> a patch to remove them.
> >> >>
> >> >> Thanks,
> >> >> Brian
>


Re: Deprecate some or all of TestPipelineOptions?

2019-11-07 Thread Ismaël Mejía
Thanks for bringing this to the ML Brian

+1 For full TestPipelineOptions deprecation. Even worth to remove it,
bad part is that this class resides in 'sdks/core/main/java' and not
in testing as I imagined so this could count as a 'breaking' change.

On Thu, Nov 7, 2019 at 8:27 PM Luke Cwik  wrote:
>
> There was issue with asynchrony of p.run(), some runners blocked till the 
> pipeline was complete with p.run() which was never meant to be the intent.
>
> The test timeout one makes sense to be able to configure it per runner (since 
> Dataflow takes a lot longer than other runners) but we may be able to 
> configure a Junit test timeout attribute instead.
>
> I would be for getting rid of them.
>
>
> On Wed, Nov 6, 2019 at 3:36 PM Robert Bradshaw  wrote:
>>
>> +1 to all of these are probably obsolete at this point and would be
>> nice to remove.
>>
>>
>> On Wed, Nov 6, 2019 at 3:00 PM Kenneth Knowles  wrote:
>> >
>> > Good find. I think TestPipelineOptions is from very early days. It makes 
>> > sense to me that these are all obsolete. Some guesses, though I haven't 
>> > dug through commit history to confirm:
>> >
>> >  - TempRoot: a while ago TempLocation was optional, so I think this would 
>> > provide a default for things like gcpTempLocation and stagingLocation
>> >  - OnSuccessMatcher: for runners where pipeline used to not terminate in 
>> > streaming mode. Now I think every runner can successfully waitUntilFinish. 
>> > Also the current API for waitUntilFinish went through some evolutions 
>> > around asynchrony so it wasn't always a good choice.
>> >  - OnCreateMatcher: just for symmetry? I don't know
>> >  - TestTimeoutSeconds: probably also for the asychrony/waitUntilfinish 
>> > issue
>> >
>> > Kenn
>> >
>> > On Wed, Nov 6, 2019 at 12:19 PM Brian Hulette  wrote:
>> >>
>> >> I recently came across TestPipelineOptions, and now I'm wondering if 
>> >> maybe it should be deprecated. It only seems to actually be supported for 
>> >> Spark and Dataflow (via TestSparkRunner and TestDataflowRunner), and I 
>> >> think it may make more sense to move the functionality it provides into 
>> >> the tests that need it.
>> >>
>> >> TestPipelineOptions currently has four attributes:
>> >>
>> >> # TempRoot
>> >> It's purpose isn't documented, but many tests read TempRoot and use it to 
>> >> set a TempLocation (example). I think this attribute makes sense (e.g. we 
>> >> can set TempRoot once and each test has its own subdirectory), but I'm 
>> >> not sure. Can anyone confirm the motivation for it? I'd like to at least 
>> >> add a docstring for it.
>> >>
>> >> # OnCreateMatcher
>> >> A way to register a matcher that will be checked right after a pipeline 
>> >> has started. It's never set except for in TestDataflowRunnerTest, so I 
>> >> think this is absolutely safe to remove.
>> >>
>> >> # OnSuccessMatcher
>> >> A way to register a matcher that will be checked right after a pipeline 
>> >> has successfully completed. This is used in several tests 
>> >> (RequiresStableInputIT, WordCountIT, ... 8 total occurrences), but I 
>> >> don't see why they couldn't all be replaced with a 
>> >> `p.run().waitUntilFinish()`, followed by an assert.
>> >>
>> >> I think the current approach is actually dangerous, because running these 
>> >> tests with runners other than TestDataflowRunner or TestSparkRunner means 
>> >> the matchers are never actually checked. This is actually how I came 
>> >> across TestPipelineOptions - I tried running a test with the DirectRunner 
>> >> and couldn't make it fail.
>> >>
>> >> # TestTimeoutSeconds
>> >> Seems to just be a wrapper for `waitUntilFinish(duration)`, and only used 
>> >> in one place. I think it would be cleaner for the test to be responsible 
>> >> for calling waitUntilFinish (which we do elsewhere), the only drawback is 
>> >> it requires a small refactor so the test has access to the PipelineResult 
>> >> object.
>> >>
>> >>
>> >> So I have a couple of questions for the community
>> >> 1) Are there thoughts on TempRoot? Can we get rid of it?
>> >> 2) Are there any objections to removing the other three attributes? Am I 
>> >> missing something? Unless there are any objections I think I'll write a 
>> >> patch to remove them.
>> >>
>> >> Thanks,
>> >> Brian


Re: Deprecate some or all of TestPipelineOptions?

2019-11-07 Thread Luke Cwik
There was issue with asynchrony of p.run(), some runners blocked till the
pipeline was complete with p.run() which was never meant to be the intent.

The test timeout one makes sense to be able to configure it per runner
(since Dataflow takes a lot longer than other runners) but we may be able
to configure a Junit test timeout attribute instead.

I would be for getting rid of them.


On Wed, Nov 6, 2019 at 3:36 PM Robert Bradshaw  wrote:

> +1 to all of these are probably obsolete at this point and would be
> nice to remove.
>
>
> On Wed, Nov 6, 2019 at 3:00 PM Kenneth Knowles  wrote:
> >
> > Good find. I think TestPipelineOptions is from very early days. It makes
> sense to me that these are all obsolete. Some guesses, though I haven't dug
> through commit history to confirm:
> >
> >  - TempRoot: a while ago TempLocation was optional, so I think this
> would provide a default for things like gcpTempLocation and stagingLocation
> >  - OnSuccessMatcher: for runners where pipeline used to not terminate in
> streaming mode. Now I think every runner can successfully waitUntilFinish.
> Also the current API for waitUntilFinish went through some evolutions
> around asynchrony so it wasn't always a good choice.
> >  - OnCreateMatcher: just for symmetry? I don't know
> >  - TestTimeoutSeconds: probably also for the asychrony/waitUntilfinish
> issue
> >
> > Kenn
> >
> > On Wed, Nov 6, 2019 at 12:19 PM Brian Hulette 
> wrote:
> >>
> >> I recently came across TestPipelineOptions, and now I'm wondering if
> maybe it should be deprecated. It only seems to actually be supported for
> Spark and Dataflow (via TestSparkRunner and TestDataflowRunner), and I
> think it may make more sense to move the functionality it provides into the
> tests that need it.
> >>
> >> TestPipelineOptions currently has four attributes:
> >>
> >> # TempRoot
> >> It's purpose isn't documented, but many tests read TempRoot and use it
> to set a TempLocation (example). I think this attribute makes sense (e.g.
> we can set TempRoot once and each test has its own subdirectory), but I'm
> not sure. Can anyone confirm the motivation for it? I'd like to at least
> add a docstring for it.
> >>
> >> # OnCreateMatcher
> >> A way to register a matcher that will be checked right after a pipeline
> has started. It's never set except for in TestDataflowRunnerTest, so I
> think this is absolutely safe to remove.
> >>
> >> # OnSuccessMatcher
> >> A way to register a matcher that will be checked right after a pipeline
> has successfully completed. This is used in several tests
> (RequiresStableInputIT, WordCountIT, ... 8 total occurrences), but I don't
> see why they couldn't all be replaced with a `p.run().waitUntilFinish()`,
> followed by an assert.
> >>
> >> I think the current approach is actually dangerous, because running
> these tests with runners other than TestDataflowRunner or TestSparkRunner
> means the matchers are never actually checked. This is actually how I came
> across TestPipelineOptions - I tried running a test with the DirectRunner
> and couldn't make it fail.
> >>
> >> # TestTimeoutSeconds
> >> Seems to just be a wrapper for `waitUntilFinish(duration)`, and only
> used in one place. I think it would be cleaner for the test to be
> responsible for calling waitUntilFinish (which we do elsewhere), the only
> drawback is it requires a small refactor so the test has access to the
> PipelineResult object.
> >>
> >>
> >> So I have a couple of questions for the community
> >> 1) Are there thoughts on TempRoot? Can we get rid of it?
> >> 2) Are there any objections to removing the other three attributes? Am
> I missing something? Unless there are any objections I think I'll write a
> patch to remove them.
> >>
> >> Thanks,
> >> Brian
>


Re: Deprecate some or all of TestPipelineOptions?

2019-11-06 Thread Robert Bradshaw
+1 to all of these are probably obsolete at this point and would be
nice to remove.


On Wed, Nov 6, 2019 at 3:00 PM Kenneth Knowles  wrote:
>
> Good find. I think TestPipelineOptions is from very early days. It makes 
> sense to me that these are all obsolete. Some guesses, though I haven't dug 
> through commit history to confirm:
>
>  - TempRoot: a while ago TempLocation was optional, so I think this would 
> provide a default for things like gcpTempLocation and stagingLocation
>  - OnSuccessMatcher: for runners where pipeline used to not terminate in 
> streaming mode. Now I think every runner can successfully waitUntilFinish. 
> Also the current API for waitUntilFinish went through some evolutions around 
> asynchrony so it wasn't always a good choice.
>  - OnCreateMatcher: just for symmetry? I don't know
>  - TestTimeoutSeconds: probably also for the asychrony/waitUntilfinish issue
>
> Kenn
>
> On Wed, Nov 6, 2019 at 12:19 PM Brian Hulette  wrote:
>>
>> I recently came across TestPipelineOptions, and now I'm wondering if maybe 
>> it should be deprecated. It only seems to actually be supported for Spark 
>> and Dataflow (via TestSparkRunner and TestDataflowRunner), and I think it 
>> may make more sense to move the functionality it provides into the tests 
>> that need it.
>>
>> TestPipelineOptions currently has four attributes:
>>
>> # TempRoot
>> It's purpose isn't documented, but many tests read TempRoot and use it to 
>> set a TempLocation (example). I think this attribute makes sense (e.g. we 
>> can set TempRoot once and each test has its own subdirectory), but I'm not 
>> sure. Can anyone confirm the motivation for it? I'd like to at least add a 
>> docstring for it.
>>
>> # OnCreateMatcher
>> A way to register a matcher that will be checked right after a pipeline has 
>> started. It's never set except for in TestDataflowRunnerTest, so I think 
>> this is absolutely safe to remove.
>>
>> # OnSuccessMatcher
>> A way to register a matcher that will be checked right after a pipeline has 
>> successfully completed. This is used in several tests 
>> (RequiresStableInputIT, WordCountIT, ... 8 total occurrences), but I don't 
>> see why they couldn't all be replaced with a `p.run().waitUntilFinish()`, 
>> followed by an assert.
>>
>> I think the current approach is actually dangerous, because running these 
>> tests with runners other than TestDataflowRunner or TestSparkRunner means 
>> the matchers are never actually checked. This is actually how I came across 
>> TestPipelineOptions - I tried running a test with the DirectRunner and 
>> couldn't make it fail.
>>
>> # TestTimeoutSeconds
>> Seems to just be a wrapper for `waitUntilFinish(duration)`, and only used in 
>> one place. I think it would be cleaner for the test to be responsible for 
>> calling waitUntilFinish (which we do elsewhere), the only drawback is it 
>> requires a small refactor so the test has access to the PipelineResult 
>> object.
>>
>>
>> So I have a couple of questions for the community
>> 1) Are there thoughts on TempRoot? Can we get rid of it?
>> 2) Are there any objections to removing the other three attributes? Am I 
>> missing something? Unless there are any objections I think I'll write a 
>> patch to remove them.
>>
>> Thanks,
>> Brian


Re: Deprecate some or all of TestPipelineOptions?

2019-11-06 Thread Kenneth Knowles
Good find. I think TestPipelineOptions is from very early days. It makes
sense to me that these are all obsolete. Some guesses, though I haven't dug
through commit history to confirm:

 - TempRoot: a while ago TempLocation was optional, so I think this would
provide a default for things like gcpTempLocation and stagingLocation
 - OnSuccessMatcher: for runners where pipeline used to not terminate in
streaming mode. Now I think every runner can successfully waitUntilFinish.
Also the current API for waitUntilFinish went through some evolutions
around asynchrony so it wasn't always a good choice.
 - OnCreateMatcher: just for symmetry? I don't know
 - TestTimeoutSeconds: probably also for the asychrony/waitUntilfinish issue

Kenn

On Wed, Nov 6, 2019 at 12:19 PM Brian Hulette  wrote:

> I recently came across TestPipelineOptions, and now I'm wondering if maybe
> it should be deprecated. It only seems to actually be supported for Spark
> and Dataflow (via TestSparkRunner and TestDataflowRunner), and I think it
> may make more sense to move the functionality it provides into the tests
> that need it.
>
> TestPipelineOptions
> 
> currently has four attributes:
>
> # TempRoot
> It's purpose isn't documented, but many tests read TempRoot and use it to
> set a TempLocation (example
> ).
> I think this attribute makes sense (e.g. we can set TempRoot once and each
> test has its own subdirectory), but I'm not sure. Can anyone confirm the
> motivation for it? I'd like to at least add a docstring for it.
>
> # OnCreateMatcher
> A way to register a matcher that will be checked right after a pipeline
> has started. It's never set except for in TestDataflowRunnerTest, so I
> think this is absolutely safe to remove.
>
> # OnSuccessMatcher
> A way to register a matcher that will be checked right after a pipeline
> has successfully completed. This is used in several tests (
> RequiresStableInputIT
> ,
> WordCountIT
> ,
> ... 8 total occurrences), but I don't see why they couldn't all be replaced
> with a `p.run().waitUntilFinish()`, followed by an assert.
>
> I think the current approach is actually dangerous, because running these
> tests with runners other than TestDataflowRunner or TestSparkRunner means
> the matchers are never actually checked. This is actually how I came across
> TestPipelineOptions - I tried running a test with the DirectRunner and
> couldn't make it fail.
>
> # TestTimeoutSeconds
> Seems to just be a wrapper for `waitUntilFinish(duration)`, and only used
> in one place
> .
> I think it would be cleaner for the test to be responsible for calling
> waitUntilFinish (which we do elsewhere), the only drawback is it requires a
> small refactor so the test has access to the PipelineResult object.
>
>
> So I have a couple of questions for the community
> 1) Are there thoughts on TempRoot? Can we get rid of it?
> 2) Are there any objections to removing the other three attributes? Am I
> missing something? Unless there are any objections I think I'll write a
> patch to remove them.
>
> Thanks,
> Brian
>