Re: Proposal: Generalize S3FileSystem

2021-05-20 Thread Charles Chen
Is it feasible to keep the endpoint information in the path? It seems pretty desirable to keep URIs "universal" so that it's possible to understand what is being pointed to without explicit service configuration, so maybe you can have a scheme like "s3+endpoint=api.example.com://my/bucket/path"?
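To make the suggestion concrete, here is a minimal sketch of how such an endpoint-qualified path could be split into its parts. The "s3+endpoint=HOST://" form is only the hypothetical scheme floated in the message above, not an existing Beam or AWS convention. The names `S3Location` and `parse_s3_uri` are made up for illustration, and the sketch assumes the first path segment is the bucket.

```python
from typing import NamedTuple, Optional


class S3Location(NamedTuple):
    """Hypothetical container for the parts of an S3-style path."""
    endpoint: Optional[str]  # None means "use the default endpoint"
    bucket: str
    key: str


def parse_s3_uri(uri: str) -> S3Location:
    """Parse 's3://bucket/key' or the proposed 's3+endpoint=HOST://bucket/key'."""
    scheme, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError(f"not a URI: {uri!r}")

    # Split "s3+endpoint=api.example.com" into the base scheme and its option.
    name, _, option = scheme.partition("+")
    if name != "s3":
        raise ValueError(f"unsupported scheme: {scheme!r}")

    endpoint = None
    if option:
        opt_key, _, opt_value = option.partition("=")
        if opt_key != "endpoint" or not opt_value:
            raise ValueError(f"unsupported scheme option: {option!r}")
        endpoint = opt_value

    # Assumption: the first path segment is the bucket, the rest is the key.
    bucket, _, key = rest.partition("/")
    if not bucket:
        raise ValueError(f"missing bucket: {uri!r}")
    return S3Location(endpoint, bucket, key)


location = parse_s3_uri("s3+endpoint=api.example.com://my/bucket/path")
default_location = parse_s3_uri("s3://my/bucket/path")
```

One design point worth weighing: RFC 3986 does not allow "=" in a URI scheme, so standard URI parsers may not accept this form, which is why the sketch splits the string by hand.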

Re: [ANNOUNCE] New PMC Member: Chamikara Jayalath

2021-01-21 Thread Charles Chen
Congrats Cham! On Thu, Jan 21, 2021, 5:39 PM Chamikara Jayalath wrote: > Thanks everybody :) > > - Cham > > On Thu, Jan 21, 2021 at 5:22 PM Pablo Estrada wrote: > >> Yoohoo Cham : ) >> >> On Thu, Jan 21, 2021 at 5:20 PM Udi Meiri wrote: >> >>> Congrats Cham! >>> >>> On Thu, Jan 21, 2021 at 4:2

Re: [ANNOUNCE] New committer: Valentyn Tymofieiev

2019-08-26 Thread Charles Chen
Thank you and congratulations Valentyn! Much appreciated and deserved! On Mon, Aug 26, 2019 at 2:33 PM Reza Rokni wrote: > Thanks Valentin! > > On Tue, 27 Aug 2019, 05:32 Pablo Estrada, wrote: > >> Thanks Valentyn! >> >> On Mon, Aug 26, 2019 at 2:29 PM Robin Qiu wrote: >> >>> Thank you Valent

Re: [ANNOUNCE] Beam 2.15.0 Released!

2019-08-23 Thread Charles Chen
Thank you Yifan! On Fri, Aug 23, 2019 at 11:12 AM Hannah Jiang wrote: > Thank you Yifan! > > On Fri, Aug 23, 2019 at 11:09 AM Yichi Zhang wrote: > >> Thank you Yifan! >> >> On Fri, Aug 23, 2019 at 11:06 AM Robin Qiu wrote: >> >>> Thank you Yifan! >>> >>> On Fri, Aug 23, 2019 at 11:05 AM Rui Wa

Re: [ANNOUNCE] New PMC Member: Pablo Estrada

2019-05-15 Thread Charles Chen
Congrats Pablo and thank you for your contributions! On Wed, May 15, 2019, 10:53 AM Valentyn Tymofieiev wrote: > Congrats, Pablo! > > On Wed, May 15, 2019 at 10:41 AM Yifan Zou wrote: > >> Congratulations, Pablo! >> >> *From: *Maximilian Michels >> *Date: *Wed, May 15, 2019 at 2:06 AM >> *To:

Re: Beam's Conda package

2019-05-10 Thread Charles Chen
Looks like this is where it's living: https://github.com/conda-forge/apache-beam-feedstock/tree/c96274713fcc5970c967c20e84859e73d0efa0d0 *From: *Lukasz Cwik *Date: *Fri, May 10, 2019 at 1:02 PM *To: *dev I'm not aware of who set up conda as well. There seem to have been ~4500 > downloads of the

Re: [ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Charles Chen
Thank you Udi! On Fri, May 3, 2019, 1:51 PM Aizhamal Nurmamat kyzy wrote: > Congratulations, Udi! Thank you for all your contributions!!! > > *From: *Pablo Estrada > *Date: *Fri, May 3, 2019 at 1:45 PM > *To: *dev > > Thanks Udi and congrats! >> >> On Fri, May 3, 2019 at 1:44 PM Kenneth Knowles

Re: [VOTE] Release 2.11.0, release candidate #2

2019-02-26 Thread Charles Chen
Thank you, +1. I tested Python 3 support in batch and streaming mode (using wordcount and streaming wordcount) on both DirectRunner and DataflowRunner. On Tue, Feb 26, 2019 at 7:54 AM Konstantinos Katsiapis wrote: > +1. > (Same rational as my earlier post for RC1). > > On Tue, Feb 26, 2019 at 2

Re: [VOTE] Release 2.11.0, release candidate #1

2019-02-25 Thread Charles Chen
+1. I tested Python 3 support in batch and streaming mode (using wordcount and streaming wordcount) on both DirectRunner and DataflowRunner. On Mon, Feb 25, 2019 at 7:54 AM Łukasz Gajowy wrote: > Hi, > > https://issues.apache.org/jira/browse/BEAM-6697 Is this issue a release > blocker? I'm aski

Re: 2.7.1 (LTS) release?

2019-01-31 Thread Charles Chen
I would be in favor of keeping the old 2.7.0 release branch / tag static so that referring to it will always get the right 2.7.0 code. On Thu, Jan 31, 2019 at 10:24 AM Kenneth Knowles wrote: > I have waffled on whether to have release-2.7 and only branch > release-2.7.1 when starting that releas

Re: [PROPOSAL] Prepare Beam 2.9.0 release

2018-11-15 Thread Charles Chen
+1 Note that we need to temporarily revert https://github.com/apache/beam/pull/6683 before the release branch cut per the discussion at https://lists.apache.org/thread.html/78fe33dc41b04886f5355d66d50359265bfa2985580bb70f79c53545@%3Cdev.beam.apache.org%3E On Thu, Nov 15, 2018 at 9:18 PM Tim wrot

Re: [DISCUSS] More precision supported by DATETIME field in Schema

2018-11-06 Thread Charles Chen
s > > lossy, so we should seriously consider upgrading that as well. > > On Tue, Nov 6, 2018 at 6:42 AM Charles Chen wrote: > > > > > > One related issue that came up before is that we (perhaps > unnecessarily) restrict the precision of timestamps in the Python

Re: [DISCUSS] More precision supported by DATETIME field in Schema

2018-11-05 Thread Charles Chen
One related issue that came up before is that we (perhaps unnecessarily) restrict the precision of timestamps in the Python SDK to milliseconds because of legacy reasons related to the Java runner's use of Joda time. Perhaps Beam portability should natively use a more granular timestamp unit. On M
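The precision concern can be illustrated with plain integer arithmetic. This is a sketch only: the helper names and the sample values are made up, and it does not use Beam's own timestamp classes. It shows that two distinct microsecond timestamps collapse to the same value once they are truncated to milliseconds.

```python
MICROS_PER_MILLI = 1000


def to_millis(micros: int) -> int:
    """Truncate a microsecond timestamp to millisecond precision."""
    return micros // MICROS_PER_MILLI


def from_millis(millis: int) -> int:
    """Convert a millisecond timestamp back to microseconds."""
    return millis * MICROS_PER_MILLI


# Two distinct event times, in microseconds since the epoch (made-up values).
first = 1_541_462_400_000_123
second = 1_541_462_400_000_987

# After truncation both events carry the same timestamp.
same_after_truncation = to_millis(first) == to_millis(second)

# The round trip cannot recover the sub-millisecond part.
lost_micros = first - from_millis(to_millis(first))
```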

Re: New Edit button on beam.apache.org pages

2018-10-24 Thread Charles Chen
This is great! Thanks! On Wed, Oct 24, 2018 at 2:26 PM Ahmet Altay wrote: > Really cool! Thank you! > > On Wed, Oct 24, 2018 at 2:24 PM, Alan Myrvold wrote: > >> To make small documentation changes easier, there is now an Edit button >> at the top right of the pages on https://beam.apache.org.

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-12 Thread Charles Chen
Would you mind to restore it in master? > > Thanks > > On Fri, Oct 12, 2018 at 4:40 PM Ahmet Altay wrote: > >> >> >> On Fri, Oct 12, 2018 at 11:31 AM, Charles Chen wrote: >> >>> What I mean is that a user may find that it works for them to p

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-12 Thread Charles Chen
with the design aspects and PR reviews of changes that > affect common Python code. Anyone who specifically wants to be tagged on > relevant JIRAs and PRs? > > Thanks > > > On Fri, Oct 12, 2018 at 10:20 AM Ahmet Altay wrote: > >> >> >> On Fri, Oct 12, 2018 at

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-12 Thread Charles Chen
me collisions. So I think this would benefit from broader >> feedback. >> >> Thanks, >> Thomas >> >> >> -- Forwarded message - >> From: Charles Chen >> Date: Fri, Oct 12, 2018 at 8:36 AM >> Subject: Re: [apache/beam] [BEAM-5442]

Re: Finalizing the 2.7.0 release

2018-10-09 Thread Charles Chen
On it. On Tue, Oct 9, 2018 at 9:30 AM Jean-Baptiste Onofré wrote: > Sorry, by announcement, it was not the mailing list, I meant the tag > alias, the Jira version, the artifacts update, the dist cleanup, etc. > > Regards > JB > > On 09/10/2018 18:18, Thomas Weise wrote: > > BTW on our github rea

[ANNOUNCE] Apache Beam 2.7.0 released!

2018-10-03 Thread Charles Chen
am 2.7.0. -- Charles Chen, on behalf of The Apache Beam team

[HELP] Blog post for upcoming 2.7.0 release

2018-09-30 Thread Charles Chen
Hi all, We will be announcing the Apache Beam 2.7.0 release shortly. As part of this, we will be doing a blog post with improvement and feature highlights. Please add your release notes and comments to this doc: https://docs.google.com/document/d/1jIk0pc8CxTMmtz5b7UL0gSPxmjKnyerVFS6FcpP2Ym8/edit

Re: [VOTE] Release 2.7.0, release candidate #3

2018-09-28 Thread Charles Chen
t there are no > performance regressions. > > czw., 27 wrz 2018, 00:02 użytkownik Jean-Baptiste Onofré > napisał: > >> +1 (binding) >> >> Regards >> JB >> Le 26 sept. 2018, à 18:00, Ahmet Altay a écrit: >>> >>> +1. Thank you all! >>

Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-09-28 Thread Charles Chen
hink Thomas was mainly concerned about "fixup" commits to >>>>>> > land in >>>>>> > master (as part of a merge). These indeed make reverting >>>>>> commits >>>>>> > more >>>>>>

Re: [VOTE] Release 2.7.0, release candidate #3

2018-09-26 Thread Charles Chen
+1. Performed additional validations as listed in the spreadsheet. On Wed, Sep 26, 2018, 3:24 AM Robert Bradshaw wrote: > +1 (binding), same verification as before. > > On Wed, Sep 26, 2018 at 7:36 AM Charles Chen wrote: > >> To clarify, the only difference between RC2 and

Re: [VOTE] Release 2.7.0, release candidate #3

2018-09-25 Thread Charles Chen
Chen wrote: > As with before, please add any validation performed to the spreadsheet > here: > https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1675964688 > > On Wed, Sep 26, 2018 at 12:30 AM Charles Chen wrote: > >> Hi everyone,

Re: [VOTE] Release 2.7.0, release candidate #3

2018-09-25 Thread Charles Chen
As with before, please add any validation performed to the spreadsheet here: https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1675964688 On Wed, Sep 26, 2018 at 12:30 AM Charles Chen wrote: > Hi everyone, > > Please review and vote on th

[VOTE] Release 2.7.0, release candidate #3

2018-09-25 Thread Charles Chen
Hi everyone, Please review and vote on the release candidate #3 for the version 2.7.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes [1], *

Re: [VOTE] Release 2.7.0, release candidate #2

2018-09-25 Thread Charles Chen
I have merged https://github.com/apache/beam/pull/6494 and https://github.com/apache/beam/pull/6495 which revert the offending commit in master and release-2.7.0, respectively. I am building 2.7.0 RC3 which will be out shortly. On Tue, Sep 25, 2018 at 9:52 PM Charles Chen wrote

Re: [VOTE] Release 2.7.0, release candidate #2

2018-09-25 Thread Charles Chen
success On Tue, Sep 25, 2018 at 4:53 PM Charles Chen wrote: > Hi all, please add any validation performed to the spreadsheet here: > https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1675964688 > > On Tue, Sep 25, 2018 at 12:11 PM Romain Manni

Re: [VOTE] Release 2.7.0, release candidate #2

2018-09-25 Thread Charles Chen
not shipping gradle[w] but otherwise the content >>> matches the git repo (except a SNAPSHOT vs version change to the source). >>> >>> The changes [1] look minimal compared to RC1, so most of the >>> verification there should apply as well. >>> >>>

[VOTE] Release 2.7.0, release candidate #2

2018-09-24 Thread Charles Chen
Hi everyone, Please review and vote on the release candidate #2 for the version 2.7.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes [1], *

Re: Python PreCommit broken

2018-09-21 Thread Charles Chen
> branch] and merge, right? > > >> >> On Fri, Sep 21, 2018, 1:52 PM Ahmet Altay wrote: >> >>> I will suggest a rollback in this case, and in general as a good >>> practice to unblock people. >>> >>> On Fri, Sep 21, 2018 at 1:02 PM, Ch

Re: Python PreCommit broken

2018-09-21 Thread Charles Chen
ub.com/apache/beam/pull/6424 in the mean time, while > we are iterating on the fix. > > On Fri, Sep 21, 2018 at 11:41 AM Charles Chen wrote: > >> Do we happen to know the root cause for why this wasn't caught during >> review / precommit? >> >> In the future, can we

Re: Python PreCommit broken

2018-09-21 Thread Charles Chen
Do we happen to know the root cause for why this wasn't caught during review / precommit? In the future, can we manually run postcommits for risky changes like these? That is, trigger it by commenting "Run Python PostCommit"? On Fri, Sep 21, 2018 at 10:10 AM Pablo Estrada wrote: > Robbe ha

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-20 Thread Charles Chen
My mistake, it looks like the correct beam staging repository (https://repository.apache.org/content/repositories/orgapachebeam-1046/) is specified in your pom file. On Thu, Sep 20, 2018 at 2:10 PM Charles Chen wrote: > Hey Romain and JB, do you have any progress on this? One thing I wo

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-20 Thread Charles Chen
> Hi, >>> > >>> > I don't have the issue ;) >>> > >>> > As said in my vote, I tested 2.7.0 RC1 on beam-samples with Spark >>> > without problem. >>> > >>> > I don't reproduce Romain

Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-09-19 Thread Charles Chen
easier. > > Example PR: Support X in the pipeline > > Commit 1: Restructuring a bunch of code without any logical change. > > Commit 2: Changing validation logic for pipeline. > > Commit 3: Supporting new field "X" for pipeline. > > > >

Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-09-19 Thread Charles Chen
PRs instead of large jumbo PRs whenever possible. On Wed, Sep 19, 2018 at 11:20 AM Charles Chen wrote: > I don't think it's actually harder to roll back a set of commits that are > merged together. Git has the notion of first-parent commits (you can see, > for example, &

Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-09-19 Thread Charles Chen
I don't think it's actually harder to roll back a set of commits that are merged together. Git has the notion of first-parent commits (you can see, for example, "git log --first-parent", which filters out the intermediate commits). In this sense, PRs still get merged as one unit and this is prese

Re: Proposal for Beam Python User State and Timer APIs

2018-09-18 Thread Charles Chen
An update: the reference DirectRunner implementation of (and common execution code for) the Python user state and timers API has been merged: https://github.com/apache/beam/pull/6304 On Thu, Aug 30, 2018 at 1:48 AM Charles Chen wrote: > Another update: the reference DirectRunner implementat

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-17 Thread Charles Chen
ww.linkedin.com/in/rmannibucau> | Book > <https://www.packtpub.com/application-development/java-ee-8-high-performance> > > > Le lun. 17 sept. 2018 à 19:18, Charles Chen a écrit : > >> Luke, Maximillian, Raghu, can you please propose cherry-pick PRs to the >> release-

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-17 Thread Charles Chen
<https://twitter.com/rmannibucau> | Blog >> <https://rmannibucau.metawerx.net/> | Old Blog >> <http://rmannibucau.wordpress.com/> | Github >> <https://github.com/rmannibucau> | LinkedIn >> <https://www.linkedin.com/in/rmannibucau> | Book >

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-13 Thread Charles Chen
put pcollection size" and "execution time" > around release cut date on dataflow, spark, flink and direct runner in > batch and streaming modes. There seems to be no regression. > > Etienne > > Le mardi 11 septembre 2018 à 12:25 -0700, Charles Chen a écrit : > >

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-11 Thread Charles Chen
ithub.com/Talend/component-runtime/blob/master/component-runtime-beam/src/it/serialization-over-cluster/src/test/java/org/talend/sdk/component/beam/it/SerializationOverClusterIT.java >> >> Le mar. 11 sept. 2018 20:54, Charles Chen a écrit : >> >>> Romain: can you give

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-11 Thread Charles Chen
cker but it was an easy fix > > (https://github.com/apache/beam/pull/6358) and users may rely on the > > pom.xml. > > > > Should we recut the release candidate to include this? > > > > On Mon, Sep 10, 2018 at 4:58 AM Jean-Baptiste Onofré > > mailt

[VOTE] Release 2.7.0, release candidate #1

2018-09-07 Thread Charles Chen
Hi everyone, Please review and vote on the release candidate #1 for the version 2.7.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes [1], *

Re: [PROPOSAL] Prepare Beam 2.7.0 release

2018-09-06 Thread Charles Chen
Making steady progress, but there is an issue compiling with Protobuf for Python which I am investigating right now. On Wed, Sep 5, 2018 at 11:13 PM Charles Chen wrote: > Thanks! I talked to Boyuan who indicated that this may be an issue > specific to the JVM version I am running, sin

Re: [PROPOSAL] Prepare Beam 2.7.0 release

2018-09-05 Thread Charles Chen
segfault is not related to exactly this task. > > On 5 Sep 2018, at 06:46, Charles Chen wrote: > > I attempted to cut the 2.7.0 RC1 today, but encountered an issue where the > JVM consistently segfaulted at > the :beam-sdks-java-io-hadoop-input-format:test task. I will investigate >

Re: Python 3: final step

2018-09-05 Thread Charles Chen
This is great! Feel free to add me as a reviewer. On Wed, Sep 5, 2018 at 9:38 AM Andrew Pilloud wrote: > Cool! I know very little about Python 3, but happy to help review. > > Andrew > > On Wed, Sep 5, 2018 at 9:21 AM Ahmet Altay wrote: > >> Thank you Robbe, this is great news! >> >> On Wed, S

Re: [PROPOSAL] Prepare Beam 2.7.0 release

2018-09-04 Thread Charles Chen
PM Jean-Baptiste Onofré wrote: > It sounds good to me. I'm late on my PRs, but not release blocker > anyway. I will wait the next release cycle ;) > > Regards > JB > > On 29/08/2018 21:18, Charles Chen wrote: > > I have cut the release branch for 2.7.0, since ther

Re: delayed emit (timer) in py-beam?

2018-08-30 Thread Charles Chen
FYI: the reference DirectRunner implementation of the Python user state and timers API is out for review: https://github.com/apache/beam/pull/6304 On Mon, Jul 30, 2018 at 3:57 PM Austin Bennett wrote: > Fantastic; thanks, Charles! > > > > On Mon, Jul 30, 2018 at 3:49 PM, Char

Re: Proposal for Beam Python User State and Timer APIs

2018-08-30 Thread Charles Chen
Another update: the reference DirectRunner implementation of the Python user state and timers API is out for review: https://github.com/apache/beam/pull/6304 On Mon, Jul 9, 2018 at 2:18 PM Charles Chen wrote: > An update: https://github.com/apache/beam/pull/5691 has been merged. I > h

Re: [PROPOSAL] Prepare Beam 2.7.0 release

2018-08-29 Thread Charles Chen
I have cut the release branch for 2.7.0, since there continue to be no blockers listed in JIRA. I will build the first release candidate soon. On Mon, Aug 27, 2018 at 10:38 PM Charles Chen wrote: > Hey everyone, I want to highlight again to those who missed it that if you > are aware

Re: [PROPOSAL] Prepare Beam 2.7.0 release

2018-08-27 Thread Charles Chen
JIRA, and I will be cutting the release branch on Wednesday 8/29. Best, Charles On Fri, Aug 24, 2018 at 2:38 PM Charles Chen wrote: > Thanks everyone. Again, we will proceed with the initial release cut on > August 29. > > A reminder to please tag any blocking issues as "Prior

Re: [PROPOSAL] Prepare Beam 2.7.0 release

2018-08-24 Thread Charles Chen
hang wrote: >> >>> +1 >>> Thanks for volunteering, Charles! >>> >>> On Mon, Aug 20, 2018 at 3:22 PM Rafael Fernandez >>> wrote: >>> >>>> +1, thanks for volunteering, Charles! >>>> >>>> On Mon, Aug 20,

Re: BEAM-5180 for 2.7.0 ?

2018-08-24 Thread Charles Chen
Thank you for getting the partial rollback in. I will close https://issues.apache.org/jira/browse/BEAM-5180 as fixed. Ankur: if you have a more nuanced fix in mind, please open a new JIRA ticket to track and update us on this thread. On Fri, Aug 24, 2018 at 10:42 AM Ankur Goenka wrote: > Repli

Re: [PROPOSAL] Prepare Beam 2.7.0 release

2018-08-20 Thread Charles Chen
nell O'Callaghan > wrote: > >> +1 Charles thank you for taking this up and helping us maintain this >> schedule. >> >> On Mon, Aug 20, 2018 at 11:29 AM Charles Chen wrote: >> >>> Hey everyone, >>> >>> Our release calendar indicat

[PROPOSAL] Prepare Beam 2.7.0 release

2018-08-20 Thread Charles Chen
Hey everyone, Our release calendar indicates that the process for the 2.7.0 Beam release should start on September 7. I volunteer to perform this release and propose the following schedule: - We start triaging issues in JIRA this week. - I will cut the initial 2.7.0 release branch on Septe

Re: [Discuss] Add EXTERNAL keyword to CREATE TABLE statement

2018-08-15 Thread Charles Chen
+1 for CREATE EXTERNAL TABLE. It is a good balance between the general SQL expectation of having tables as an abstraction and reinforcing that Beam does not store your data. On Wed, Aug 15, 2018 at 1:58 PM Rui Wang wrote: > > I think users will be more confused to find that 'CREATE TABLE' does

Re: [Discussion] Clarify the support story for released Beam versions

2018-08-13 Thread Charles Chen
(sending to the dev@ list thread as this is more relevant here than users@) Will we be using a different / potentially more rigorous process for releasing LTS releases? Or do we feel that any validations that could possibly be done should already be incorporated into each release? On Mon, Aug 13

Re: [VOTE] Community Examples Repository

2018-08-08 Thread Charles Chen
>> >> On Wed, Aug 8, 2018 at 11:01 AM Rui Wang wrote: >> >>> 2 - examples that rely on experimental API can still stay in where they >>> are because such examples could be changed. >>> >>> -Rui >>> >>> On Wed, Aug 8, 2018 at 10:52 A

Re: [VOTE] Community Examples Repository

2018-08-08 Thread Charles Chen
3 - We benefit from increased test coverage by having examples together with the rest of the code. As Robert mentions in the doc, hosting the Beam examples in the main repository is the best way to keep the examples visible, tested and maintained. Given that we recently moved to a single reposito

Re: Community Examples Repository

2018-08-03 Thread Charles Chen
Robert Bradshaw < >>> rober...@google.com> >>> > > wrote: >>> > > >>> > >> I have to admit I'm generally -1 on moving examples to a separate >>> > >> repository. In particular, I think it would actually inhibit the

Re: Community Examples Repository

2018-08-01 Thread Charles Chen
I would also prefer that examples be linked to releases so that we can build and test them during development; i.e. if your commit breaks wordcount, we want to know right away so we can revert. Perhaps we can keep these in the repo but more clearly modularize the artifacts we release? For the Pyt

Re: Community Examples Repository

2018-08-01 Thread Charles Chen
The examples we have right now serve both as examples to users and along with their unit tests, as tests of functionality. If we move the examples out, what is a good way to make sure that we continue to have visibility and test coverage? Can we address this in a section of the doc? On Wed, Aug

Re: delayed emit (timer) in py-beam?

2018-07-30 Thread Charles Chen
Hey Austin, This API is not yet implemented in the Python SDK. I am working on this feature: the next step from my end is to finish a reference implementation in the local DirectRunner. As you note, the doc at https://s.apache.org/beam-python-user-state-and-timers describes the design. You can

Re: Proposal for Beam Python User State and Timer APIs

2018-07-09 Thread Charles Chen
20, 2018 at 10:00 AM Charles Chen wrote: > An update on the implementation: I recently sent the user-facing > pipeline construction part of the API implementation out for review: > https://github.com/apache/beam/pull/5691. > > On Tue, Jun 5, 2018 at 5:26 PM Charles Chen wro

Re: Python 3 support in the Python SDK

2018-07-02 Thread Charles Chen
Hi Sergei, Matthias and Robbe are actively working on this support. Their plan is to futurize all relevant modules and then work on Beam Python 3 tests; this is being tracked in https://issues.apache.org/jira/browse/BEAM-2784 and I added https://issues.apache.org/jira/browse/BEAM-4715 as well. W

Re: Proposal for Beam Python User State and Timer APIs

2018-06-20 Thread Charles Chen
An update on the implementation: I recently sent the user-facing pipeline construction part of the API implementation out for review: https://github.com/apache/beam/pull/5691. On Tue, Jun 5, 2018 at 5:26 PM Charles Chen wrote: > Thanks everyone for contributing here. We've reach

Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-15 Thread Charles Chen
> Is it ok from your side ? > > Regards > JB > > On 15/06/2018 01:54, Charles Chen wrote: > > Looks like there is something wrong with PR 5636 > > <https://github.com/apache/beam/pull/5636> which we cherry-picked > > above. It breaks leaderboard examples

Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-14 Thread Charles Chen
iste Onofré wrote: > Sure, just in time ;) > > Regards > JB > > On 14/06/2018 20:58, Charles Chen wrote: > > Can you also merge the CP https://github.com/apache/beam/pull/5636 for > > https://issues.apache.org/jira/browse/BEAM-4549? > > > > On Thu, Jun

Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-14 Thread Charles Chen
Can you also merge the CP https://github.com/apache/beam/pull/5636 for https://issues.apache.org/jira/browse/BEAM-4549? On Thu, Jun 14, 2018 at 6:52 AM Jean-Baptiste Onofré wrote: > FYI, I'm starting RC2 right now. > > Stay tuned ! > > Regards > JB > > On 06/06/2018 10:44, Jean-Baptiste Onofré w

Re: [DISCUSS] Use Confluence wiki for non-user-facing stuff

2018-06-07 Thread Charles Chen
+1. It would be very helpful to have dev-facing walkthroughs / technical documentation for relevant aspects of the codebase that aren't user-facing. On Thu, Jun 7, 2018, 1:23 PM Kenneth Knowles wrote: > Hi all, > > I've been in half a dozen conversations recently about whether to have a > wiki a

Re: Proposal for Beam Python User State and Timer APIs

2018-06-05 Thread Charles Chen
d as user > documentation. > > Thanks, > Thomas > > > On Wed, May 23, 2018 at 11:49 AM, Charles Chen wrote: > >> Thanks everyone for the detailed comments and discussions. It looks like >> by now, we mostly agree with the requirements and overall direction needed

Re: Existing transactionality inconsistency in the Beam Java State API

2018-06-05 Thread Charles Chen
>>>>>> intuition being -- if we need to make an RPC to load one state value, we >>>>> are better off making an RPC to load all the values we need. >>>>> >>>>> Overall, I too lean towards maintaining the second interpretation >>>>>

Re: [VOTE] Code Review Process

2018-06-01 Thread Charles Chen
+1 On Fri, Jun 1, 2018 at 11:20 AM Valentyn Tymofieiev wrote: > +1 > > On Fri, Jun 1, 2018 at 10:40 AM, Ahmet Altay wrote: > >> +1 >> >> On Fri, Jun 1, 2018 at 10:37 AM, Kenneth Knowles wrote: >> >>> +1 >>> >>> On Fri, Jun 1, 2018 at 10:25 AM Thomas Groh wrote: >>> As we seem to largely

Re: [ANNOUNCEMENT] New committers, May 2018 edition!

2018-05-31 Thread Charles Chen
Congratulations everyone! On Thu, May 31, 2018, 10:14 PM Pablo Estrada wrote: > Thanks to the PMC! Very humbled and excited to keep taking part in this > great community. > :) > -P. > > > On Thu, May 31, 2018, 10:10 PM Tim wrote: > >> Congratulations! >> >> >> Tim >> >> On 1 Jun 2018, at 07:05,

Re: Existing transactionality inconsistency in the Beam Java State API

2018-05-23 Thread Charles Chen
I have been promoting. There are some >> weaknesses: readLater() is pretty tightly coupled to a particular >> implementation style, and futures are decades old so you can get good APIs >> and performance without inventing anything. But I still like the non-future >> version a li

Existing transactionality inconsistency in the Beam Java State API

2018-05-23 Thread Charles Chen
During the design of the Beam Python State API, we noticed some transactionality inconsistencies in the existing Beam Java State API (these are the unresolved bugs BEAM-2980 and BEAM-2975). We are t

Re: Proposal for Beam Python User State and Timer APIs

2018-05-23 Thread Charles Chen
e a big > draw for Python, too. > > (commenting on the doc) > > Kenn > > On Mon, May 21, 2018 at 5:15 PM Charles Chen wrote: > >> I want to share a proposal for adding user state and timer support to the >> Beam Python SDK and get the community's thoughts

Proposal for Beam Python User State and Timer APIs

2018-05-21 Thread Charles Chen
I want to share a proposal for adding user state and timer support to the Beam Python SDK and get the community's thoughts on how such an API should look: https://s.apache.org/beam-python-user-state-and-timers Let me know what you think and please add any comments and suggestions you may have. Be

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-04 Thread Charles Chen
I have added https://issues.apache.org/jira/browse/BEAM-4236 as a blocker. On Fri, May 4, 2018 at 1:19 PM Ahmet Altay wrote: > Hi JB, > > We found an issue related to using side inputs in streaming mode using > python SDK. Charles is currently trying to find the root cause. Would you > be able t

Re: Pubsub on directrunner: direct_runner.py and transform_evaluator.py

2018-04-29 Thread Charles Chen
The write can be done as a normal ParDo / DoFn. The read needs to expose some watermark logic, which at the time of writing wasn't available, since no unbounded source API was available. We may be able to write the read / source as a SplittableDoFn since that API was introduced as an unbounded sour

Re: [VOTE] Release 2.4.0, release candidate #3

2018-03-19 Thread Charles Chen
+1. Verified the Python Quickstart on local and Dataflow (Mac / Linux). Also verified that the Mac / Linux wheels were built correctly with fast / compiled Cython coder support. On Mon, Mar 19, 2018 at 1:49 PM Robert Bradshaw wrote: > Thanks! > > BTW, in case anyone's wondering where the md5 fi

Re: [VOTE] Release 2.4.0, release candidate #2

2018-03-09 Thread Charles Chen
Thank you Valentyn for reporting this. I have traced the issue back to https://github.com/apache/beam/pull/4666, so I have sent out a PR to fix: https://github.com/apache/beam/pull/4846. On Fri, Mar 9, 2018 at 2:17 PM, Valentyn Tymofieiev wrote: > -1. > > Checked Python Quickstarts (Passed) and

Re: github reviews weirdness

2018-02-27 Thread Charles Chen
I noticed that GitHub sometimes has two "copies" of a comment thread--the first copy, the one that appears first on the page with the original commenter, is the only one that allows comments; a second "copy" is created when people do reviews. So maybe you can scroll up to find the right "copy" of

Re: Beam 2.4.0

2018-02-20 Thread Charles Chen
I would like to +1 the faster release cycle process JB and Robert have been advocating and implementing, and thank JB for releasing 2.3.0 smoothly. When we block for specific features and increase the time between releases, we increase the urgency for PR authors to push for their change to go into

Re: A 15x speed-up in local Python DirectRunner execution

2018-02-16 Thread Charles Chen
I hope those interested have had time to test this out. I have sent out https://github.com/apache/beam/pull/4696 to switch to using this fast runner as the default DirectRunner for local execution. Let me know if there are any concerns. On Tue, Feb 13, 2018 at 12:17 PM Charles Chen wrote

Re: A 15x speed-up in local Python DirectRunner execution

2018-02-13 Thread Charles Chen
7 AM Raghu Angadi wrote: >>> >>>> This is terrific news! Thanks Charles. >>>> >>>> On Wed, Feb 7, 2018 at 5:55 PM, Charles Chen wrote: >>>> >>>>> Local execution of Beam pipelines on the Python DirectRunner currently >>>

Proposal: build Python wheel distributions for Apache Beam releases

2018-02-12 Thread Charles Chen
Currently, Apache Beam distributes Python packages through pip and PyPI. On PyPI, developers can release source tarballs and/or precompiled "wheel" distributions for each platform; a wheel is used when one is available for the installing platform. Currently, we only distribute the source tarbal

Re: A 15x speed-up in local Python DirectRunner execution

2018-02-07 Thread Charles Chen
runner code afterwards? > Best > -P. > > On Wed, Feb 7, 2018, 6:25 PM Lukasz Cwik wrote: > >> That is pretty awesome. >> >> On Wed, Feb 7, 2018 at 5:55 PM, Charles Chen wrote: >> >>> Local execution of Beam pipelines on the Python DirectRunner current

A 15x speed-up in local Python DirectRunner execution

2018-02-07 Thread Charles Chen
Local execution of Beam pipelines on the Python DirectRunner currently suffers from performance issues, which makes it hard for pipeline authors to iterate, especially on medium to large size datasets. We would like to optimize and make this a better experience for Beam users. The FnApiRunner was

Re: Replacing Python DirectRunner apply_* hooks with PTransformOverrides

2018-02-02 Thread Charles Chen
eally part of the pipeline. I would this crisp > abstraction enforcement would add even more value to Python. > > Kenn > > On Thu, Feb 1, 2018 at 5:21 PM, Charles Chen wrote: > >> In the Python DirectRunner, we currently use apply_* overrides to >> override the opera

Replacing Python DirectRunner apply_* hooks with PTransformOverrides

2018-02-01 Thread Charles Chen
In the Python DirectRunner, we currently use apply_* overrides to override the operation of the default .expand() operation for certain transforms. For example, GroupByKey has a special implementation in the DirectRunner, so we use an apply_* override hook to replace the implementation of GroupByKe

Re: Removing the PValueCache from the Beam Python DirectRunner

2018-01-25 Thread Charles Chen
g them in?) > > On Thu, Jan 25, 2018 at 3:25 PM, Charles Chen wrote: > > Currently, the Python SDK supports an eager execution mode. For > example, a > > list can be directly passed into a PTransform to obtain its result: > > > > result = [1, 2, 3] | MyPTransform

Removing the PValueCache from the Beam Python DirectRunner

2018-01-25 Thread Charles Chen
Currently, the Python SDK supports an eager execution mode. For example, a list can be directly passed into a PTransform to obtain its result: result = [1, 2, 3] | MyPTransform() To support this use, the Python DirectRunner has an option to cache its intermediate results into a PValueCache. The
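The eager form shown above, with a plain list on the left of `|`, depends on Python's reflected-operator hook: a list has no `|` operator of its own, so Python hands the expression to the right-hand object's `__ror__` method. The following toy model illustrates only that language mechanism. `ToyTransform` and `Double` are hypothetical stand-ins, not Beam's PTransform, and nothing here reflects how the DirectRunner or the PValueCache actually work.

```python
class ToyTransform:
    """Toy stand-in for a transform; NOT Beam's PTransform."""

    def expand(self, values):
        raise NotImplementedError

    def __ror__(self, left):
        # Python calls this for `left | self` when `left` (for example a
        # list) does not itself support `|` with this object.
        return self.expand(list(left))


class Double(ToyTransform):
    """Hypothetical transform that doubles every element."""

    def expand(self, values):
        return [2 * value for value in values]


# Same shape as `result = [1, 2, 3] | MyPTransform()` in the message above.
result = [1, 2, 3] | Double()
```

In this toy the result is computed immediately and returned as a list. A real runner would have to build and execute a pipeline behind that call, which is where a cache of intermediate results comes into the picture.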

Re: Some interesting use case

2018-01-16 Thread Charles Chen
This sounds similar to the use case for tf.Transform, a library that depends on Beam: https://github.com/tensorflow/transform On Tue, Jan 16, 2018 at 5:51 PM Ron Gonzalez wrote: > Hi, > I was wondering if anyone has encountered or used Beam in the following > manner: > > 1. During machine le

Re: [VOTE] Release 2.2.0, release candidate #3

2017-11-15 Thread Charles Chen
Could you send the command you used that produced this error? I can't reproduce it at the tip of the release-2.2.0 branch. On Wed, Nov 15, 2017 at 5:34 AM Reuven Lax wrote: > I'm trying to do the last CP and cut RC4, but I'm getting a compilation > failure in Python - "ImportError: No module na

Re: [DISCUSS] Move away from Apache Maven as build tool

2017-10-31 Thread Charles Chen
As a contributor to the Beam Python SDK, I noticed that many of the points above regarding Maven and Gradle pertain mostly to Java SDK development. For Python development, Maven is much less natural, and we end up just shelling out to perform builds and tests. For Python SDK (and upcoming Go SDK d

Re: Problem while upgrading lib

2017-10-03 Thread Charles Chen
Please also use the requirement "pip install apache_beam[gcp]" to pull in appropriate Google Cloud dependencies, if needed. On Tue, Oct 3, 2017 at 11:47 AM Ahmet Altay wrote: > google-apitools dependency (which is required for GCS) does not work > with oauth2client >= 4.0.0 [1]. Because of this

Re: Proposal: Unbreak Beam Python 2.1.0 with 2.1.1 bugfix release

2017-09-19 Thread Charles Chen
bert Bradshaw > > > > wrote: > > > > > +1. Right now anyone who follows our quickstart instructions or > > > otherwise installs the latest release of apache_beam is broken. > > > > > > On Tue, Sep 19, 2017 at 2:05 PM, Charles Chen > > >
