Re: [VOTE] JupyterLab Sidepanel extension release v1.0.0 for BEAM-10545 RC #1

2020-10-06 Thread Ahmet Altay
+1 - I reviewed the code and document (mostly earlier through the PR
reviews). I also built and installed the extension using Ning's
instructions.

On Tue, Oct 6, 2020 at 2:57 PM Ning Kang  wrote:

> Please review the release of the following jupyter labextension
> (TypeScript node package) for running Beam notebooks in JupyterLab:
> * apache-beam-jupyterlab-sidepanel
>
> Hi everyone,
> Please review and vote on the release candidate #1 for the version 1.0.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1. Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * the assets (only the
> `sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel`
> sub directory) to be published to npmjs.com [1]
> * commit hash "b7ae7bb1dc28a7c8f26e9f48682e781a74e2d3c4" [2]
> * the package will be signed by npm once published; see the new PGP machinery post [3]
>
> Additional details:
> * to install the package before it is published, build and link it locally
> after cloning the Beam repo or downloading the assets:
>
> git checkout jupyterlab-sidepanel-v1.0.0 -b some-branch  # only if cloning the repo
> pushd sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel
> jlpm
> jlpm build
> jupyter labextension link .
> * screenshots of the extension [4]
> * a publish dry run:
>
> npm notice === Tarball Details ===
> npm notice name:          apache-beam-jupyterlab-sidepanel
> npm notice version:       1.0.0
> npm notice package size:  19.8 kB
> npm notice unpacked size: 101.9 kB
> npm notice shasum:        7f896de0d6e587aab2bef348a6e94f95f75f280f
> npm notice integrity:     sha512-hdkn2Ni2S0roY[...]ShMK2/MAbQvyQ==
> npm notice total files:   51
> npm notice
>
> + apache-beam-jupyterlab-sidepanel@1.0.0
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks!
>
> [1]
> https://github.com/apache/beam/releases/tag/jupyterlab-sidepanel-v1.0.0
> [2]
> https://github.com/apache/beam/commit/b7ae7bb1dc28a7c8f26e9f48682e781a74e2d3c4
> [3] https://blog.npmjs.org/post/172999548390/new-pgp-machinery
> [4]
> https://docs.google.com/document/d/1aKK8TzSrl8WiG0K4v9xZEfLMCinuGqRlMOyb7xOhgy4/edit#heading=h.he7se5yxfo7
>


Re: Self-checkpoint Support on Portable Flink

2020-10-06 Thread Reuven Lax
This is what I was thinking of

"Flink currently only provides processing guarantees for jobs without
iterations. Enabling checkpointing on an iterative job causes an exception.
In order to force checkpointing on an iterative program the user needs to
set a special flag when enabling checkpointing:
env.enableCheckpointing(interval,
CheckpointingMode.EXACTLY_ONCE, force = true).

Please note that records in flight in the loop edges (and the state changes
associated with them) will be lost during failure."

On Tue, Oct 6, 2020 at 5:44 PM Boyuan Zhang  wrote:

> Hi Reuven,
>
> As Luke mentioned, there are at least some limitations around tracking
> watermarks with Flink cycles. I'm going to use State + Timer without Flink
> cycles to support self-checkpointing. For dynamic splitting, we can either
> explore the Flink-cycle approach or the limited-depth approach.
>
> On Tue, Oct 6, 2020 at 5:33 PM Reuven Lax  wrote:
>
>> Aren't there some limitations associated with flink cycles? I seem to
>> remember various features that could not be used. I'm assuming that
>> watermarks are not supported across cycles, but is there anything else?
>>
>> On Tue, Oct 6, 2020 at 7:12 AM Maximilian Michels  wrote:
>>
>>> Thanks for starting the conversation. The two approaches both look good
>>> to me. Probably we want to start with approach #1 for all Runners to be
>>> able to support delaying bundles. Flink supports cycles and thus
>>> approach #2 would also be applicable and could be used to implement
>>> dynamic splitting.
>>>
>>> -Max
>>>
>>> On 05.10.20 23:13, Luke Cwik wrote:
>>> > Thanks Boyuan, I left a few comments.
>>> >
>>> > On Mon, Oct 5, 2020 at 11:12 AM Boyuan Zhang wrote:
>>> >
>>> > Hi team,
>>> >
>>> > I'm looking at adding self-checkpoint support to portable Flink
>>> > runner (BEAM-10940) for both batch
>>> > and streaming. I summarized the problem that we want to solve and
>>> > proposed 2 potential approaches in this doc
>>> > <
>>> https://docs.google.com/document/d/1372B7HYxtcUYjZOnOM7OBTfSJ4CyFg_gaPD_NUxWClo/edit?usp=sharing
>>> >.
>>> >
>>> > I want to collect feedback on which approach is preferred and
>>> > anything that I have not taken into consideration yet but I should.
>>> > Many thanks to all your help!
>>> >
>>> > Boyuan
>>> >
>>>
>>


Re: Self-checkpoint Support on Portable Flink

2020-10-06 Thread Boyuan Zhang
Hi Reuven,

As Luke mentioned, there are at least some limitations around tracking
watermarks with Flink cycles. I'm going to use State + Timer without Flink
cycles to support self-checkpointing. For dynamic splitting, we can either
explore the Flink-cycle approach or the limited-depth approach.

On Tue, Oct 6, 2020 at 5:33 PM Reuven Lax  wrote:

> Aren't there some limitations associated with flink cycles? I seem to
> remember various features that could not be used. I'm assuming that
> watermarks are not supported across cycles, but is there anything else?
>
> On Tue, Oct 6, 2020 at 7:12 AM Maximilian Michels  wrote:
>
>> Thanks for starting the conversation. The two approaches both look good
>> to me. Probably we want to start with approach #1 for all Runners to be
>> able to support delaying bundles. Flink supports cycles and thus
>> approach #2 would also be applicable and could be used to implement
>> dynamic splitting.
>>
>> -Max
>>
>> On 05.10.20 23:13, Luke Cwik wrote:
>> > Thanks Boyuan, I left a few comments.
>> >
>> > On Mon, Oct 5, 2020 at 11:12 AM Boyuan Zhang wrote:
>> >
>> > Hi team,
>> >
>> > I'm looking at adding self-checkpoint support to portable Flink
>> > runner (BEAM-10940) for both batch
>> > and streaming. I summarized the problem that we want to solve and
>> > proposed 2 potential approaches in this doc
>> > <
>> https://docs.google.com/document/d/1372B7HYxtcUYjZOnOM7OBTfSJ4CyFg_gaPD_NUxWClo/edit?usp=sharing
>> >.
>> >
>> > I want to collect feedback on which approach is preferred and
>> > anything that I have not taken into consideration yet but I should.
>> > Many thanks to all your help!
>> >
>> > Boyuan
>> >
>>
>


Re: Throttling stream outputs per trigger?

2020-10-06 Thread Vincent Marquez
Thanks for the response.  Is my understanding correct that SplittableDoFns
are only applicable to batch pipelines?  I'm wondering if there are any
proposals to address back-pressure needs?
*~Vincent*


On Tue, Oct 6, 2020 at 1:37 PM Luke Cwik  wrote:

> There is no general back-pressure mechanism within Apache Beam (runners
> should be intelligent about this, but there is currently no way to signal
> "I'm being throttled", so runners don't know that throwing more CPUs at a
> problem won't make it go faster).
>
> You can control how quickly you ingest data for runners that support
> splittable DoFns with SDK initiated checkpoints with resume delays. A
> splittable DoFn is able to return resume().withDelay(Duration.seconds(10))
> from the @ProcessElement method. See Watch[1] for an example.
>
> The 2.25.0 release enables more splittable DoFn features on more runners.
> I'm working on a blog (initial draft[2], still mostly empty) to update the
> old blog from 2017.
>
> 1:
> https://github.com/apache/beam/blob/9c239ac93b40e911f03bec5da3c58a07fdceb245/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L908
> 2:
> https://docs.google.com/document/d/1kpn0RxqZaoacUPVSMYhhnfmlo8fGT-p50fEblaFr2HE/edit#
>
>
> On Tue, Oct 6, 2020 at 10:39 AM Vincent Marquez 
> wrote:
>
>> Hmm, I'm not sure how that will help. I understand how to batch up the
>> data, but it is the triggering part that I don't see how to do.  For
>> example, in Spark Structured Streaming you can set a time trigger that
>> fires at a fixed interval all the way up to the source, so even the
>> source can throttle how much data it reads.
>>
>> Here is my use case more thoroughly explained:
>>
>> I have a Kafka topic (with multiple partitions) that I'm reading from,
>> and I need to aggregate batches of up to 500 before sending a single batch
>> off in an RPC call.  However, the vendor specified a rate limit, so if
>> there are more than 500 unread messages in the topic, I must wait 1 second
>> before issuing another RPC call. When searching on Stack Overflow I found
>> this answer: https://stackoverflow.com/a/57275557/25658 that makes it
>> seem challenging, but I wasn't sure if things had changed since then or you
>> had better ideas.
>>
>> *~Vincent*
>>
>>
>> On Thu, Oct 1, 2020 at 2:57 PM Luke Cwik  wrote:
>>
>>> Look at the GroupIntoBatches[1] transform. It will buffer "batches" of
>>> size X for you.
>>>
>>> 1:
>>> https://beam.apache.org/documentation/transforms/java/aggregation/groupintobatches/
>>>
>>> On Thu, Oct 1, 2020 at 2:51 PM Vincent Marquez <
>>> vincent.marq...@gmail.com> wrote:
>>>
 the downstream consumer has these requirements.

 *~Vincent*


 On Thu, Oct 1, 2020 at 2:29 PM Luke Cwik  wrote:

> Why do you want to only emit X? (e.g. running out of memory in the
> runner)
>
> On Thu, Oct 1, 2020 at 2:08 PM Vincent Marquez <
> vincent.marq...@gmail.com> wrote:
>
>> Hello all.  If I want to 'throttle' the number of messages I pull off
>> say, Kafka or some other queue, in order to make sure I only emit X
>> amount per trigger, is there a way to do that and ensure that I get
>> 'at least once' delivery guarantees?  If this isn't supported, would the
>> better way be to pull the limited amount as opposed to doing it on the
>> output side?
>>
>>
>> *~Vincent*
>>
>


[VOTE] JupyterLab Sidepanel extension release v1.0.0 for BEAM-10545 RC #1

2020-10-06 Thread Ning Kang
Please review the release of the following jupyter labextension (TypeScript
node package) for running Beam notebooks in JupyterLab:
* apache-beam-jupyterlab-sidepanel

Hi everyone,
Please review and vote on the release candidate #1 for the version 1.0.0,
as follows:
[ ] +1, Approve the release
[ ] -1. Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* the assets (only the
`sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel`
sub directory) to be published to npmjs.com [1]
* commit hash "b7ae7bb1dc28a7c8f26e9f48682e781a74e2d3c4" [2]
* the package will be signed by npm once published; see the new PGP machinery post [3]

Additional details:
* to install the package before it is published, build and link it locally
after cloning the Beam repo or downloading the assets:

git checkout jupyterlab-sidepanel-v1.0.0 -b some-branch  # only if cloning the repo
pushd sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel
jlpm
jlpm build
jupyter labextension link .
* screenshots of the extension [4]
* a publish dry run:

npm notice === Tarball Details ===
npm notice name:          apache-beam-jupyterlab-sidepanel
npm notice version:       1.0.0
npm notice package size:  19.8 kB
npm notice unpacked size: 101.9 kB
npm notice shasum:        7f896de0d6e587aab2bef348a6e94f95f75f280f
npm notice integrity:     sha512-hdkn2Ni2S0roY[...]ShMK2/MAbQvyQ==
npm notice total files:   51
npm notice

+ apache-beam-jupyterlab-sidepanel@1.0.0

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks!

[1] https://github.com/apache/beam/releases/tag/jupyterlab-sidepanel-v1.0.0
[2]
https://github.com/apache/beam/commit/b7ae7bb1dc28a7c8f26e9f48682e781a74e2d3c4
[3] https://blog.npmjs.org/post/172999548390/new-pgp-machinery
[4]
https://docs.google.com/document/d/1aKK8TzSrl8WiG0K4v9xZEfLMCinuGqRlMOyb7xOhgy4/edit#heading=h.he7se5yxfo7


Re: Throttling stream outputs per trigger?

2020-10-06 Thread Luke Cwik
There is no general back-pressure mechanism within Apache Beam (runners
should be intelligent about this, but there is currently no way to signal
"I'm being throttled", so runners don't know that throwing more CPUs at a
problem won't make it go faster).

You can control how quickly you ingest data for runners that support
splittable DoFns with SDK initiated checkpoints with resume delays. A
splittable DoFn is able to return resume().withDelay(Duration.seconds(10))
from the @ProcessElement method. See Watch[1] for an example.

The 2.25.0 release enables more splittable DoFn features on more runners.
I'm working on a blog (initial draft[2], still mostly empty) to update the
old blog from 2017.

1:
https://github.com/apache/beam/blob/9c239ac93b40e911f03bec5da3c58a07fdceb245/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L908
2:
https://docs.google.com/document/d/1kpn0RxqZaoacUPVSMYhhnfmlo8fGT-p50fEblaFr2HE/edit#
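The resume-with-delay idea can be sketched outside Beam in plain Python. Everything below is an illustrative stand-in, not the actual splittable DoFn API (which, in the Java SDK, is `resume().withDelay(...)` returned from `@ProcessElement`): a "DoFn" processes part of its restriction, then hands control back with a resume request instead of looping until the data is exhausted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Resume:
    delay_seconds: float  # ask the runner to re-invoke after this delay


def process(elements: List[int], start: int,
            max_per_bundle: int = 3) -> Tuple[List[int], int, Optional[Resume]]:
    """Emit up to max_per_bundle elements, then request a delayed resume."""
    end = min(start + max_per_bundle, len(elements))
    out = elements[start:end]
    # SDK-initiated checkpoint: more work remains, so ask to resume later.
    cont = Resume(10.0) if end < len(elements) else None
    return out, end, cont


# A minimal "runner" loop honouring the resume requests (a real runner
# would wait cont.delay_seconds before the next call):
data = list(range(7))
pos, seen, calls = 0, [], 0
while True:
    out, pos, cont = process(data, pos)
    seen.extend(out)
    calls += 1
    if cont is None:
        break
```

Because the delay lives in the returned continuation rather than in a blocking sleep inside `process`, the runner stays in control of scheduling, which is what makes this usable for throttled ingestion.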


On Tue, Oct 6, 2020 at 10:39 AM Vincent Marquez 
wrote:

> Hmm, I'm not sure how that will help. I understand how to batch up the
> data, but it is the triggering part that I don't see how to do.  For
> example, in Spark Structured Streaming you can set a time trigger that
> fires at a fixed interval all the way up to the source, so even the
> source can throttle how much data it reads.
>
> Here is my use case more thoroughly explained:
>
> I have a Kafka topic (with multiple partitions) that I'm reading from, and
> I need to aggregate batches of up to 500 before sending a single batch off
> in an RPC call.  However, the vendor specified a rate limit, so if there
> are more than 500 unread messages in the topic, I must wait 1 second before
> issuing another RPC call. When searching on Stack Overflow I found this
> answer: https://stackoverflow.com/a/57275557/25658 that makes it seem
> challenging, but I wasn't sure if things had changed since then or you had
> better ideas.
>
> *~Vincent*
>
>
> On Thu, Oct 1, 2020 at 2:57 PM Luke Cwik  wrote:
>
>> Look at the GroupIntoBatches[1] transform. It will buffer "batches" of
>> size X for you.
>>
>> 1:
>> https://beam.apache.org/documentation/transforms/java/aggregation/groupintobatches/
>>
>> On Thu, Oct 1, 2020 at 2:51 PM Vincent Marquez 
>> wrote:
>>
>>> the downstream consumer has these requirements.
>>>
>>> *~Vincent*
>>>
>>>
>>> On Thu, Oct 1, 2020 at 2:29 PM Luke Cwik  wrote:
>>>
 Why do you want to only emit X? (e.g. running out of memory in the
 runner)

 On Thu, Oct 1, 2020 at 2:08 PM Vincent Marquez <
 vincent.marq...@gmail.com> wrote:

> Hello all.  If I want to 'throttle' the number of messages I pull off
> say, Kafka or some other queue, in order to make sure I only emit X amount
> per trigger, is there a way to do that and ensure that I get 'at least
> once' delivery guarantees?   If this isn't supported, would the better way
> be to pull the limited amount as opposed to doing it on the output side?
>
>
> *~Vincent*
>



Re: [UPDATE] Beam 2.25.0 release progress update

2020-10-06 Thread Kyle Weaver
> +1 to the idea. We discussed it in the dev list [1]. I do not believe we
discussed it with INFRA.

We would have to collect everyone's Docker hub usernames first.

> Could someone help Robin on this ticket? Would reaching out to infra on
slack help?

I messaged #asfinfra on Slack.


Re: PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in precommit

2020-10-06 Thread Ning Kang
Thanks Alex!

Based on the standard output, it looks like None is returned by one of
the failure paths, where the environment is not considered to be a notebook.

There could be a race condition when tests are running in parallel and
modifying the global instance ib.current_env().
In that case, a patch in the test to use a mocked `is_in_notebook` check
should be added.

Filed https://issues.apache.org/jira/browse/BEAM-11025 and sent out
https://github.com/apache/beam/pull/13020.
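The patching pattern described above can be sketched with only the standard library. The `is_in_notebook` and `visualize` functions below are simplified stand-ins for the real interactive-runner internals, not Beam's actual signatures:

```python
from unittest import mock


def is_in_notebook():
    # Stand-in for the real environment check; outside a notebook the
    # real visualize() takes a failure path and returns None.
    return False


def visualize(stream):
    if not is_in_notebook():
        return None  # the failure path behind the flaky assertion
    return "timeloop-handle"  # stand-in for the Timeloop instance


# Unpatched, the not-in-notebook path is taken:
unpatched = visualize("stream")

# Patching the check makes the test independent of the shared global
# environment state, avoiding the race between parallel tests:
with mock.patch(__name__ + ".is_in_notebook", return_value=True):
    patched = visualize("stream")
```

In the real test, the same `mock.patch` call would target the module that defines the notebook check, so the assertion on the returned handle no longer depends on `ib.current_env()`.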

On Tue, Oct 6, 2020 at 9:40 AM Alex Amato  wrote:

> I am seeing this failure in the precommit of a PR
>  where I am trying to update
> the Dataflow container reference.
>
> I would have filed a JIRA issue as well, but I can't seem to load the
> website right now. Is this test known to be flaky or something? Has it
> regressed? I don't suspect this interactive runner test is using the new
> container referenced in the PR, so I didn't think this PR would affect
> this test. Though I could be wrong.
>
> I will rerun for now.
> Please let me know if you have any suggestions
>
> Details
> ---
>
> https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/2241/testReport/junit/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_return_handle/
>
> Error Message
>
> AssertionError: None is not an instance of <class 'timeloop.app.Timeloop'>
>
> Stacktrace
>
> self = <PCollectionVisualizationTest testMethod=test_dynamic_plotting_return_handle>
>
> def test_dynamic_plotting_return_handle(self):
>   h = pv.visualize(
>   self._stream, dynamic_plotting_interval=1, display_facets=True)
> > self.assertIsInstance(h, timeloop.Timeloop)
> E AssertionError: None is not an instance of <class 'timeloop.app.Timeloop'>
>
> apache_beam/runners/interactive/display/pcoll_visualization_test.py:93: 
> AssertionError
>
> Standard Output
>
> 
> 
>0
> 0  0
> 1  1
> 2  2
> 3  3
> 4  4
>
>


Re: Throttling stream outputs per trigger?

2020-10-06 Thread Vincent Marquez
Hmm, I'm not sure how that will help. I understand how to batch up the
data, but it is the triggering part that I don't see how to do.  For
example, in Spark Structured Streaming you can set a time trigger that
fires at a fixed interval all the way up to the source, so even the
source can throttle how much data it reads.

Here is my use case more thoroughly explained:

I have a Kafka topic (with multiple partitions) that I'm reading from, and
I need to aggregate batches of up to 500 before sending a single batch off
in an RPC call.  However, the vendor specified a rate limit, so if there
are more than 500 unread messages in the topic, I must wait 1 second before
issuing another RPC call. When searching on Stack Overflow I found this
answer: https://stackoverflow.com/a/57275557/25658 that makes it seem
challenging, but I wasn't sure if things had changed since then or you had
better ideas.

*~Vincent*
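The vendor constraint described above (batches of at most 500 messages, at least one second between RPC calls) can be sketched independently of Beam. Everything below is illustrative — in a pipeline this pacing would live in a stateful DoFn — and the injectable clock exists only so the sketch runs instantly:

```python
import time
from collections import deque

MAX_BATCH = 500      # vendor's batch-size limit
MIN_INTERVAL = 1.0   # vendor's rate limit: one RPC per second


def drain_in_batches(queue, send_rpc, now=time.monotonic, sleep=time.sleep):
    """Send queued messages in batches of up to MAX_BATCH, pacing RPCs at
    least MIN_INTERVAL apart. A message leaves the queue only when its
    batch is handed to send_rpc, so a retry loop around this function can
    preserve at-least-once behaviour."""
    last_call = None
    batches = 0
    while queue:
        if last_call is not None:
            wait = MIN_INTERVAL - (now() - last_call)
            if wait > 0:
                sleep(wait)  # honour the one-RPC-per-second limit
        batch = [queue.popleft() for _ in range(min(MAX_BATCH, len(queue)))]
        send_rpc(batch)
        last_call = now()
        batches += 1
    return batches


# Drive it with a fake clock so no real sleeping happens:
clock = [0.0]
sent = []
n_batches = drain_in_batches(
    deque(range(1200)), sent.append,
    now=lambda: clock[0],
    sleep=lambda s: clock.__setitem__(0, clock[0] + s))
```

With 1200 queued messages this produces three RPC calls (500, 500, 200), spaced one simulated second apart.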


On Thu, Oct 1, 2020 at 2:57 PM Luke Cwik  wrote:

> Look at the GroupIntoBatches[1] transform. It will buffer "batches" of
> size X for you.
>
> 1:
> https://beam.apache.org/documentation/transforms/java/aggregation/groupintobatches/
>
> On Thu, Oct 1, 2020 at 2:51 PM Vincent Marquez 
> wrote:
>
>> the downstream consumer has these requirements.
>>
>> *~Vincent*
>>
>>
>> On Thu, Oct 1, 2020 at 2:29 PM Luke Cwik  wrote:
>>
>>> Why do you want to only emit X? (e.g. running out of memory in the
>>> runner)
>>>
>>> On Thu, Oct 1, 2020 at 2:08 PM Vincent Marquez <
>>> vincent.marq...@gmail.com> wrote:
>>>
 Hello all.  If I want to 'throttle' the number of messages I pull off
 say, Kafka or some other queue, in order to make sure I only emit X amount
 per trigger, is there a way to do that and ensure that I get 'at least
 once' delivery guarantees?   If this isn't supported, would the better way
 be to pull the limited amount as opposed to doing it on the output side?


 *~Vincent*

>>>


PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in precommit

2020-10-06 Thread Alex Amato
I am seeing this failure in the precommit of a PR
 where I am trying to update the
Dataflow container reference.

I would have filed a JIRA issue as well, but I can't seem to load the
website right now. Is this test known to be flaky or something? Has it
regressed? I don't suspect this interactive runner test is using the new
container referenced in the PR, so I didn't think this PR would affect
this test. Though I could be wrong.

I will rerun for now.
Please let me know if you have any suggestions

Details
---
https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/2241/testReport/junit/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_return_handle/

Error Message

AssertionError: None is not an instance of <class 'timeloop.app.Timeloop'>

Stacktrace

self = <PCollectionVisualizationTest testMethod=test_dynamic_plotting_return_handle>


def test_dynamic_plotting_return_handle(self):
  h = pv.visualize(
  self._stream, dynamic_plotting_interval=1, display_facets=True)
> self.assertIsInstance(h, timeloop.Timeloop)
E AssertionError: None is not an instance of <class 'timeloop.app.Timeloop'>

apache_beam/runners/interactive/display/pcoll_visualization_test.py:93:
AssertionError

Standard Output



   0
0  0
1  1
2  2
3  3
4  4


Re: Self-checkpoint Support on Portable Flink

2020-10-06 Thread Maximilian Michels
Thanks for starting the conversation. The two approaches both look good 
to me. Probably we want to start with approach #1 for all Runners to be 
able to support delaying bundles. Flink supports cycles and thus 
approach #2 would also be applicable and could be used to implement 
dynamic splitting.


-Max

On 05.10.20 23:13, Luke Cwik wrote:

Thanks Boyuan, I left a few comments.

On Mon, Oct 5, 2020 at 11:12 AM Boyuan Zhang wrote:


Hi team,

I'm looking at adding self-checkpoint support to portable Flink
runner (BEAM-10940) for both batch
and streaming. I summarized the problem that we want to solve and
proposed 2 potential approaches in this doc.

I want to collect feedback on which approach is preferred and
anything that I have not taken into consideration yet but I should.
Many thanks to all your help!

Boyuan



Re: [DISCUSS] How many Python 3.x minor versions should Beam Python SDK aim to support concurrently?

2020-10-06 Thread Yoshiki Obata
I've written a mini doc[1] about how to update the Python tests to reduce
test resource consumption.
Please take a look and comment if you see better solutions.

[1] 
https://docs.google.com/document/d/1tfCWtMxfqjgsokjRkOGh2I4UAvX8B98ZOys0crzCMiw/edit?usp=sharing

Fri, Jul 31, 2020, 9:44 Valentyn Tymofieiev :
>
> We have added Python 3.8 support in Apache Beam 2.23.0 release[1] and 
> established the plan to remove Python 2.7 support in 2.25.0 release[2].
>
> I think it is in the interest of the community to reduce the overhead 
> associated with adding and removing support of Python minor versions in Beam 
> in the future. To do so, I opened a ticket [3] to document the process of 
> adding/removing a Python version on the Beam website, and would like to recap 
> the discussion on this thread.
>
> It seems that the consensus is to align support of Python versions in Beam 
> with Python annual release cycle[4]. This means:
>
> 1. We will aim to add support for a new Python 3.x version in Beam as soon as 
> it is released.
> 2. After a Python 3.x version reaches the end of support[5], we will remove 
> support for this version in Beam, starting from the first Beam release that 
> is cut after the end-of-support date.
> 3. The rules above are our default course of action, but can be adjusted on a 
> case-by-case basis via a discussion on dev@.
>
> Please let me know if you think this needs further discussion.
>
> A corollary of 1-3 is that:
> - we should plan to remove support for Python 3.5 starting from 2.25.0 
> release, since Python 3.5 reaches[5] end-of-support on 2020-09-13, and we 
> plan to cut 2.25.0 on 2020-09-23 according to our release calendar [6],
> - we can start working on adding Python 3.9 support shortly after.
>
> Thanks,
> Valentyn
>
> [1] https://beam.apache.org/blog/beam-2.23.0/
> [2] 
> https://lists.apache.org/thread.html/r4be18d50ccfc5543a34e083f3e6711f9f370896f109f21f4677c%40%3Cdev.beam.apache.org%3E
> [3] https://issues.apache.org/jira/browse/BEAM-10605
> [4] https://www.python.org/dev/peps/pep-0602/
> [5] https://www.python.org/downloads/
> [6] 
> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
>
> On Thu, May 14, 2020 at 9:56 AM Yoshiki Obata  wrote:
>>
>> Thank you, Kyle and Valentyn.
>>
>> I'll update test codes to treat Python 3.5 and 3.7 as high-priority
>> versions at this point.
>>
>> > Tue, May 12, 2020, 2:10 Valentyn Tymofieiev :
>> >
>> > I agree with the point echoed earlier that the lowest and the highest of 
>> > supported versions will probably give the most useful test signal for 
>> > possible breakages. So 3.5. and 3.7 as high-priority versions SGTM.
>> >
>> > This can change later once Beam drops 3.5 support.
>> >
>> > On Mon, May 11, 2020 at 10:05 AM Yoshiki Obata  
>> > wrote:
>> >>
>> >> Hello again,
>> >>
>> >> Test infrastructure update is ongoing and then we should determine
>> >> which Python versions are high-priority.
>> >>
>> >> According to Pypi downloads stats[1], download proportion of Python
>> >> 3.5 is almost always greater than that of 3.6 and 3.7.
>> >> This situation has not changed since Robert told us Python 3.x
>> >> occupies nearly 40% of downloads[2]
>> >>
>> >> On the other hand, according to Docker Hub[3], the most downloaded
>> >> apachebeam/python3.x_sdk image is the Python 3.7 one, as Kyle
>> >> pointed out[4].
>> >>
>> >> Considering these stats, I think high-priority versions are 3.5 and 3.7.
>> >>
>> >> Is this assumption appropriate?
>> >> I would like to hear your thoughts about this.
>> >>
>> >> [1] https://pypistats.org/packages/apache-beam
>> >> [2] 
>> >> https://lists.apache.org/thread.html/r208c0d11639e790453a17249e511dbfe00a09f91bef8fcd361b4b74a%40%3Cdev.beam.apache.org%3E
>> >> [3] https://hub.docker.com/search?q=apachebeam%2Fpython=image
>> >> [4] 
>> >> https://lists.apache.org/thread.html/r9ca9ad316dae3d60a3bf298eedbe4aeecab2b2664454cc352648abc9%40%3Cdev.beam.apache.org%3E
>> >>
>> >> Wed, May 6, 2020, 12:48 Yoshiki Obata :
>> >> >
>> >> > > Not sure how run_pylint.sh is related here - we should run linter on 
>> >> > > the entire codebase.
>> >> > ah, I mistyped... I meant run_pytest.sh
>> >> >
>> >> > > I am familiar with beam_PostCommit_PythonXX suites. Is there 
>> >> > > something specific about these suites that you wanted to know?
>> >> > Test suite runtime will depend on the number of  tests in the suite,
>> >> > how many tests we run in parallel, how long they take to run. To
>> >> > understand the load on test infrastructure we can monitor Beam test
>> >> > health metrics [1]. In particular, if time in queue[2] is high, it is
>> >> > a sign that there are not enough Jenkins slots available to start the
>> >> > test suite earlier.
>> >> > Sorry for the ambiguous question. I wanted to know how to see the load on
>> >> > test infrastructure.
>> >> > The Grafana links you showed serves my purpose. Thank you.
>> >> >
>> >> > Wed, May 6, 2020, 2:35 Valentyn