Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-08 Thread Robert Bradshaw
Here's a script that we could run on the old and new sites that should quickly catch any major issues but not get caught up in formatting minutiae. On Fri, May 8, 2020 at 10:23 AM Robert Bradshaw wrote: > On Fri, May 8, 2020 at 9:58 AM Aizhamal Nurmamat kyzy > wrote: > >

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-08 Thread Robert Bradshaw
d >>> developer workflow. I see that Brian has at least looked at this some. >> >> >> My involvement so far was just recognizing a problem (creating root-owned >> files on jenkins workers) and helping to fix it. If there's anyone >> available who

Re: Python2.7 Beam End-of-Life Date

2020-05-08 Thread Robert Bradshaw
map/ >> >> >> I made a minor change to update that page >> (https://github.com/apache/beam/pull/10848). A more comprehensive update to >> that page and linked >> (https://beam.apache.org/roadmap/python-sdk/#python-3-support) would still >> be welcome.

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-07 Thread Robert Bradshaw
ble. Fixing after the initial merge might >>>> be a simpler task, especially if we can archive the old site. >>>> >>>>> >>>>> >>>>>> >>>>>> > Michal showed Nam how to handle the 1st test which was about Ap

Re: Python Static Typing: Next Steps

2020-05-06 Thread Robert Bradshaw
Just an update on this: we just merged https://github.com/apache/beam/pull/11620 which enforces typechecking for all files that currently pass. On Tue, Mar 3, 2020 at 1:12 PM Chad Dombrova wrote: >> >> This probably does not apply yet, does optional mean that opting-in for all >> or none of the

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-06 Thread Robert Bradshaw
Commit") — FAILURE > > 2) Are there any other blockers for merging? @Ahmet/Robert/others please > share if there are any other blockers. > > > [1] https://github.com/gohugoio/hugo/pull/4494 > > > On Wed, May 6, 2020 at 10:19 AM Robert Bradshaw wrote: >>

Re: Builtin IOs - Link to Java/Pydoc instead of code?

2020-05-06 Thread Robert Bradshaw
ability to *easily* toggle to a specific version.) > On Wed, May 6, 2020 at 11:06 AM Robert Bradshaw wrote: >> >> E.g. if I google "apache beam javadoc", the first link is for version >> 2.1.0 (no, that's not 2.21.0), the second for a (raw-looking) >> directory

Re: Builtin IOs - Link to Java/Pydoc instead of code?

2020-05-06 Thread Robert Bradshaw
E.g. if I google "apache beam javadoc", the first link is for version 2.1.0 (no, that's not 2.21.0), the second for a (raw-looking) directory index of javadocs by version, the third for Beam 2.5.0, and the fourth for a class in 2.2.0. On Wed, May 6, 2020 at 11:02 AM Robert

Re: Builtin IOs - Link to Java/Pydoc instead of code?

2020-05-06 Thread Robert Bradshaw
One problem is that there's no way to link to the "latest" version of the docs, or toggle between them. What links to docs we have are pointing to specific (often very old or out of date) docs. If we can solve this that'd make me more in favor of this proposal. On Wed, May 6, 2020 at 9:23 AM Kenne

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-06 Thread Robert Bradshaw
On Mon, May 4, 2020 at 7:07 PM Ahmet Altay wrote: > >> On Mon, May 4, 2020 at 6:30 PM Robert Bradshaw wrote: >>> >>> I took the massive commit and split it up into: >>> >>> (1) Infrastructure changes (basically everything outside of >>> (web

Re: [DISCUSS] finishBundle once per window

2020-05-05 Thread Robert Bradshaw
On Tue, May 5, 2020 at 3:08 PM Reuven Lax wrote: > > On Tue, May 5, 2020 at 2:58 PM Robert Bradshaw wrote: >> >> On Mon, May 4, 2020 at 11:08 AM Reuven Lax wrote: >> > >> > This should not affect the ability of the user to specify the output >> >

Re: [DISCUSS] finishBundle once per window

2020-05-05 Thread Robert Bradshaw
Lax wrote: >>> >>> This should not affect the ability of the user to specify the output >>> timestamp. Today FinishBundleContext.output forces you to specify the >>> window as well as the timestamp, which is a bit awkward. (I believe that it >>> also let

Re: Google Summer of Code 2020 [Accepted Proposal]

2020-05-05 Thread Robert Bradshaw
Congratulations and welcome! On Tue, May 5, 2020 at 12:46 PM Ismaël Mejía wrote: > > Hello Aldair, > You were added in JIRA as a contributor. > Welcome to the project! > > On Tue, May 5, 2020 at 8:59 PM Aldair Coronel Ruíz > wrote: > > > > Hi everyone! > > > > I'm Aldair. My GSoC 2020 project pr

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-04 Thread Robert Bradshaw
mmits/f9d8bc13a0fda0a60a436aa56186139d0f71de4e 228 files changed, 1859 insertions(+), 2370 deletions(-) I also separated out the compatibility matrix move, which was ~1700 lines. https://github.com/apache/beam/pull/11608/commits/16516d036af047493445654d61940dea8d04eaaa On Mon, May 4, 2020 at 6:15 PM Robert Bradshaw wrote: > > On

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-04 Thread Robert Bradshaw
on-sdk-now-public.html >>>>>>>>> become >>>>>>>>> https://beam.apache.org/blog/dataflow-python-sdk-is-now-public/. >>>>>>> >>>>>>> >>>>>>> I am not a content marketer. IMO, this is a go

Re: [DISCUSS] finishBundle once per window

2020-05-04 Thread Robert Bradshaw
This is a really nice idea. Would the user still need to specify the timestamp of the output? I'm a bit ambivalent about calling it multiple times if OutputReceiver alone is in the parameter list; this might not be obvious and could be surprising behavior. On Mon, May 4, 2020 at 10:13 AM Reuven La

Re: Exploding windows and FnApiDoFnRunner

2020-05-04 Thread Robert Bradshaw
In Python we only explode windows if the Window is being inspected. (There is no separate "DoFnRunner" for FnApi vs. Legacy execution.) On Mon, May 4, 2020 at 9:21 AM Luke Cwik wrote: > > Reuven you are correct that the optimization has yet to be implemented. > Robert the FnApiDoFnRunner is the n

Re: Jenkins jobs not running for my PR 10438

2020-05-04 Thread Robert Bradshaw
Done. On Mon, May 4, 2020 at 7:35 AM Rehman Murad Ali wrote: > > Hi Beam committers, > > Would you please trigger the basic checks as well as validatesRunner check > for this PR? > https://github.com/apache/beam/pull/11350 > > > Thanks & Regards > > Rehman Murad Ali > Software Engineer > Mobile:

Re: JIRA priorities explaination

2020-05-01 Thread Robert Bradshaw
nstantsHelp.jspa?decorator=popup#PriorityLevels > [2] https://jira.atlassian.com/browse/JRASERVER-3821 > > On Fri, Oct 25, 2019 at 4:25 PM Pablo Estrada wrote: >> >> That SGTM >> >> On Fri, Oct 25, 2019 at 4:18 PM Robert Bradshaw wrote: >>> >>>

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-01 Thread Robert Bradshaw
>> I am not a content marketer. IMO, this is a good change. In the past, >>>>>> a few times, we edited dates on posts (e.g. a release date was entered >>>>>> incorrectly) and we had to either have a mismatch between dates in the >>>>>> url

Re: Rethinking Python's PortableRunner default job server

2020-04-30 Thread Robert Bradshaw
30, 2020 at 10:11 AM Kyle Weaver > wrote: > >> > >> I'll bite :) Thanks for the feedback everyone! > >> > >> On Thu, Apr 30, 2020 at 1:01 PM Robert Bradshaw > wrote: > >>> > >>> I filed https://issues.apache.org/jira/browse/BEAM-9

Re: Rethinking Python's PortableRunner default job server

2020-04-30 Thread Robert Bradshaw
the user reported. > > On Wed, Apr 29, 2020 at 10:05 PM Robert Bradshaw > wrote: > > > > +1, I was actually thinking about this just the other day. > PortableRunner should require job_endpoint to be set, and we can have a > nice error message directing the explicit use o

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-04-29 Thread Robert Bradshaw
I'm so sorry if I answer >>> a little bit due to the timezone. :) >>> >>> Best regards, >>> Nam >>> >>> >>> >>> On Tue, Apr 28, 2020 at 8:49 PM Aizhamal Nurmamat kyzy < >>> aizha...@apache.org> wrote: >>> >

Re: Rethinking Python's PortableRunner default job server

2020-04-29 Thread Robert Bradshaw
+1, I was actually thinking about this just the other day. PortableRunner should require job_endpoint to be set, and we can have a nice error message directing the explicit use of FlinkRunner for the old behavior. On Wed, Apr 29, 2020 at 11:50 AM Kyle Weaver wrote: > > Could the error message su
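The check proposed in this message could look something like the sketch below. It is only an illustration: `resolve_job_endpoint` is a hypothetical helper name, not a function in the Beam SDK, and the error wording is an assumption based on the message above.

```python
def resolve_job_endpoint(job_endpoint=None):
    """Return job_endpoint, or fail with a message that points at FlinkRunner.

    Hypothetical helper; not part of the Beam SDK.
    """
    if not job_endpoint:
        raise ValueError(
            "PortableRunner requires job_endpoint to be set. "
            "For the old default behavior, use FlinkRunner explicitly."
        )
    return job_endpoint
```

Failing early with a pointer to the alternative keeps the default explicit instead of silently starting a job server on the user's behalf.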

Re: Automation for Jira

2020-04-29 Thread Robert Bradshaw
+1 to more automation. I'm in favor of all but 4, I think it's quite common for issues to be noticed but not worked on for 60+ days. Most of the time when a developer files an issue they either (1) are working on it right now or (2) are filing it away because it's something they're not working on,

Re: sdks:java:container:generateThirdPartyLicenses effect on build time / stability

2020-04-28 Thread Robert Bradshaw
ad for both.) >> >> I guess I assumed there was some reason we needed "lightweight images" in >> our tests (because licenses take up a lot of space IIRC), but maybe not. >> Can you elaborate on the purpose of this option Hannah? >> >> On Tue, Apr 28, 2020

Re: Companies using Beam?

2020-04-28 Thread Robert Bradshaw
I think this is a great idea, as long as we can get critical mass. One danger I've seen is that such pages can grow stale/feel dated if not regularly updated/added to, so we should have a plan there. On Tue, Apr 28, 2020 at 4:21 PM Aizhamal Nurmamat kyzy wrote: > +1 on adding testimonials/snippe

Re: sdks:java:container:generateThirdPartyLicenses effect on build time / stability

2020-04-28 Thread Robert Bradshaw
>>>> I had marked the jira as a blocker for 2.21.0 because I was afraid >>>>>>> something was broken, but now it looks like the failures were just >>>>>>> flakes. >>>>>>> So BEAM-9764 <https://issues.apache.org/jira/brow

Re: How to submit PRs for dependant changes?

2020-04-28 Thread Robert Bradshaw
I prefer (c) as well, rebasing as things get merged. I would do (a) if they're really prerequisites for one another. On Tue, Apr 28, 2020 at 10:40 AM Udi Meiri wrote: > (a) or (c) should work. (c) is preferred if you want faster reviews. > > For multiple JIRAs, I've seen both [BEAM-123,BEAM-456]

Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-04-28 Thread Robert Bradshaw
Thanks. It'll be great to better support more languages. I looked at the PR and there seems to be no provenance/history. E.g. all the content seems to be entirely new files rather than diffs from the old. (There also seems to be a huge amount of auto-generated js code as well.) On Tue, Apr 28, 20

Re: [QUESTION] Reading Snappy Compressed Text Files

2020-04-22 Thread Robert Bradshaw
On Wed, Apr 22, 2020 at 11:06 AM Jeff Klukas wrote: > Beam is able to infer compression from file extensions for a variety of > formats, but snappy is not among them currently: > > > https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java >

Re: Reference to Beam in upcoming Kubeflow Book

2020-04-17 Thread Robert Bradshaw
On Fri, Apr 17, 2020 at 4:58 PM Holden Karau wrote: > > On Fri, Apr 17, 2020 at 3:52 PM Robert Bradshaw > wrote: > >> On Fri, Apr 17, 2020 at 2:56 PM Holden Karau >> wrote: >> >>> >>> On Fri, Apr 17, 2020 at 2:45 PM Robert Bradshaw >>

Re: Reference to Beam in upcoming Kubeflow Book

2020-04-17 Thread Robert Bradshaw
On Fri, Apr 17, 2020 at 2:56 PM Holden Karau wrote: > > On Fri, Apr 17, 2020 at 2:45 PM Robert Bradshaw > wrote: > >> Hi Holden! >> >> I agree with Kyle that it makes sense to have some caveat about Flink and >> Spark, though at this point they're not /

Re: Reference to Beam in upcoming Kubeflow Book

2020-04-17 Thread Robert Bradshaw
Hi Holden! I agree with Kyle that it makes sense to have some caveat about Flink and Spark, though at this point they're not /that/ new (at least not Flink). I am curious what extra support Kubeflow is "missing" (or, conversely, what extra support it has for Dataflow that goes beyond just specify

Re: sdks:java:container:generateThirdPartyLicenses effect on build time / stability

2020-04-16 Thread Robert Bradshaw
create a Java docker image. >> >> The caching approach mentioned by Robert brings many benefits, not only >> to this use case. >> However, we would like to include this work as part of 2.21.0, so I will >> move with the multi processing approach this time. >

Re: sdks:java:container:generateThirdPartyLicenses effect on build time / stability

2020-04-15 Thread Robert Bradshaw
Is the cost primarily in pulling these remote licenses/sources? I'd guess that 99.9% of the URLs remain the same from run to run. Would a simple cache, or caching proxy, be sufficient? Otherwise, a tag to check that licenses can be pulled, but not really pull them, might be sufficient. (Making sur
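A minimal sketch of the "simple cache" idea, assuming license files are fetched by URL and that a URL's content does not change between runs. The function name `cached_fetch`, the `fetch` callable, and the on-disk layout are all hypothetical; this is not how the Beam build does it.

```python
import hashlib
import os


def cached_fetch(url, fetch, cache_dir):
    """Return the body for url, calling fetch(url) only on a cache miss.

    fetch is any callable that takes a URL and returns bytes.
    """
    # Hash the URL so it can be used safely as a file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    body = fetch(url)
    with open(path, "wb") as f:
        f.write(body)
    return body
```

If nearly all URLs repeat from run to run, as guessed above, almost every call becomes a local file read.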

Re: sdks:java:container:generateThirdPartyLicenses effect on build time / stability

2020-04-15 Thread Robert Bradshaw
In terms of pre-commit, 7-8 minutes seems worth not having to debug dependencies that broke this in post-commit. We should look at caching (IIRC we've long wanted to do this for pip and maven packages anyway). We could also consider whether, for development purposes, we could build "lite" container

Re: [DISCUSS] Let's establish a guideline for using Python type annotations in Beam codebase

2020-04-13 Thread Robert Bradshaw
On Mon, Apr 13, 2020 at 11:48 AM Valentyn Tymofieiev wrote: > > On Mon, Apr 13, 2020 at 10:53 AM Robert Bradshaw > wrote: > >> On Mon, Apr 13, 2020 at 10:38 AM Valentyn Tymofieiev >> wrote: >> >>> To clarify, I don't suggest that every variable

Re: [DISCUSS] Let's establish a guideline for using Python type annotations in Beam codebase

2020-04-13 Thread Robert Bradshaw
open to this. Let's get the type checkers enabled in presubmit and see what it takes to keep those happy before establishing more strict criteria. (It does sound like we have consensus on using type comments until 2.7 is dropped.) > On Fri, Apr 10, 2020 at 4:56 PM Robert Bradshaw > wrote:

Re: [DISCUSS] Let's establish a guideline for using Python type annotations in Beam codebase

2020-04-10 Thread Robert Bradshaw
> On Fri, Apr 10, 2020 at 1:46 PM Robert Bradshaw > wrote: > >> I prefer type-comments, as they can be validated by type checkers. Once >> we drop 2.7, we can go with actual type annotations (and the comments can >> be automatically converted over). >> >> On Fri

Re: [DISCUSS] Let's establish a guideline for using Python type annotations in Beam codebase

2020-04-10 Thread Robert Bradshaw
I prefer type-comments, as they can be validated by type checkers. Once we drop 2.7, we can go with actual type annotations (and the comments can be automatically converted over). On Fri, Apr 10, 2020 at 11:17 AM Valentyn Tymofieiev wrote: > I am seeing several styles we use to annotate non-pipe
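For readers unfamiliar with the two styles being compared, here is a small illustration. The function names are made up for the example and do not come from the Beam codebase.

```python
from typing import List, Optional


def split_words(line, sep=None):
    # type: (str, Optional[str]) -> List[str]
    """Type-comment style: valid syntax on Python 2.7 and Python 3."""
    return line.split(sep)


def split_words_annotated(line: str, sep: Optional[str] = None) -> List[str]:
    """Annotation style: Python 3 only, declaring the same types."""
    return line.split(sep)
```

A type checker such as mypy reads both forms; only the second records the types in `__annotations__` at runtime.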

Re: Usage metrics for Beam

2020-04-09 Thread Robert Bradshaw
the raw absolute number is tricky. You can probably > manage to see certain kinds of trends if you just look at relative numbers. > > Kenn > > On Thu, Apr 9, 2020 at 6:42 PM Austin Bennett > wrote: > >> @Robert Bradshaw , you sent that pypi link [1] >> the other da

Re: [VOTE] Release 2.20.0, release candidate #2

2020-04-09 Thread Robert Bradshaw
+1, the artifacts and signatures all look good, and I also checked that the Python wheels work with a simple pipeline in a fresh virtual environment. On Thu, Apr 9, 2020 at 5:11 PM Ahmet Altay wrote: > +1 - validated python quickstarts batch/streaming with python 2.7. > > Thank you Rui! > > On T

Re: Usage metrics for Beam

2020-04-09 Thread Robert Bradshaw
For Python, there's https://pypistats.org/packages/apache-beam . It's unclear how accurate these are, and how many of these downloads represent users vs. tools (e.g. setting up environments for continuous testing). On Thu, Apr 9, 2020 at 3:29 PM Griselda Cuevas wrote: > Hi folks - I'm interested

Re: [VOTE] Release 2.20.0, release candidate #1

2020-04-06 Thread Robert Bradshaw
with the binary. What undecided is if >>>>>>> missing that commit is -1, or that can be marked as a known issue in >>>>>>> release note. >>>>>>> >>>>>>> >>>>>>> -Rui >>>>>>>

Re: [VOTE] Release 2.20.0, release candidate #1

2020-04-06 Thread Robert Bradshaw
given that likely not in the binary artifacts either. On Mon, Apr 6, 2020 at 1:22 PM Rui Wang wrote: > I think PR#11252 is in the release branch? See > https://github.com/apache/beam/commits/release-2.20.0 (the top commit) > > > > -Rui > > On Mon, Apr 6, 2020 at 1:2

Re: [VOTE] Release 2.20.0, release candidate #1

2020-04-06 Thread Robert Bradshaw
Valentyn, do the container issues affect our external containers as well? I verified the signatures and sources, they all look good, except that we're missing https://github.com/apache/beam/pull/11252 if we were hoping to get that in. The wheel looks fine as well. On Mon, Apr 6, 2020 at 12:16 PM

Re: [DISCUSS] How many Python 3.x minor versions should Beam Python SDK aim to support concurrently?

2020-04-03 Thread Robert Bradshaw
f that is still too long. The cost of >>> supporting a version may include: >>> - Developing against older Python version >>> - Release overhead (building & storing containers, wheels, doing >>> release validation) >>> - Complexity / development cost to

Re: Unportable Dataflow Pipeline Questions

2020-04-02 Thread Robert Bradshaw
ultiple steps (we'll have to keep updating features >>> such as a cross-language to be in lockstep which will be hard and result in >>> a lot of throwaway work). >>> >>> Thanks, >>> Cham >>> >>> >>>> On Tue, Mar 31, 2020 a

Re: Java PortabilityApi PreCommit failures

2020-04-01 Thread Robert Bradshaw
Alternatively we could roll back, but looks like disabling the tests has been merged. I'll take on fixing the container images (which looks like it will require getting the imports up to date). Thanks for tracking this down. On Wed, Apr 1, 2020 at 7:55 PM Luke Cwik wrote: > Been seeing these fai

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-04-01 Thread Robert Bradshaw
E.g. something like https://github.com/apache/beam/pull/11283 On Wed, Apr 1, 2020 at 2:57 PM Robert Bradshaw wrote: > On Wed, Apr 1, 2020 at 1:48 PM Sam Rohde wrote: > >> To restate the original issue it is that the current method of setting >> the output tags on PCollectio

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-04-01 Thread Robert Bradshaw
string -> PCollection, we use the keys as tags. We can extend this naturally to tuples, named tuples, nesting, etc. (though I don't know if there are any hidden assumptions left about having an output labeled None if we want to push this through to completion). > > > > On Wed
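One way to picture "use the keys as tags", extended to tuples, named tuples and nesting, is the sketch below. It works on plain Python values rather than real PCollections. The function name `flatten_tags` and the dotted naming for nested tags are assumptions made for illustration, not what the SDK does.

```python
def flatten_tags(outputs, prefix=""):
    """Map a nested dict/tuple/namedtuple structure to a flat {tag: leaf} dict."""
    if isinstance(outputs, dict):
        items = outputs.items()
    elif isinstance(outputs, tuple) and hasattr(outputs, "_fields"):
        # Named tuple: use field names as tags.
        items = zip(outputs._fields, outputs)
    elif isinstance(outputs, (tuple, list)):
        # Plain sequence: use positions as tags.
        items = ((str(i), v) for i, v in enumerate(outputs))
    else:
        # A bare leaf at the top level gets the tag None.
        return {prefix or None: outputs}
    result = {}
    for key, value in items:
        tag = "{}.{}".format(prefix, key) if prefix else str(key)
        result.update(flatten_tags(value, tag))
    return result
```

For example, `flatten_tags({"a": 1, "b": (2, 3)})` yields the tags `a`, `b.0` and `b.1`.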

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-04-01 Thread Robert Bradshaw
pand: (...) -> Union[PValue, NamedTuple[str, PCollection], > Tuple[str, PCollection], Dict[str, PCollection], DoOutputsTuple] > > i.e. no arbitrary nesting when outputting from an expand > > On Tue, Mar 31, 2020 at 5:15 PM Robert Bradshaw > wrote: > >> On Tue, Mar 31, 20

Re: [VOTE + INPUT] Beam Mascot Designs, 3rd iteration - Deadline Wednesday, April 1

2020-04-01 Thread Robert Bradshaw
I prefer the no stripes version. On Wed, Apr 1, 2020 at 11:51 AM Pablo Estrada wrote: > Thanks for the update Julian! > I also like the evolution of the tail, and I like the new colors for the > wings. > I think stripes or no stripes should work fine. > > Thanks again! Excited to finalize the ma

Re: Default WindowFn for Unbounded source

2020-04-01 Thread Robert Bradshaw
On Wed, Apr 1, 2020 at 12:53 AM Jan Lukavský wrote: > Hi Amit, > > answers inline. > On 4/1/20 12:23 AM, amit kumar wrote: > > Thanks Ankur for your reply. > > By default the allowed lateness for a global window is zero but we can > also set it to be non-zero which will be used in the downstream

Re: Unportable Dataflow Pipeline Questions

2020-03-31 Thread Robert Bradshaw
On Tue, Mar 31, 2020 at 12:06 PM Sam Rohde wrote: > Hi All, > > I am currently investigating making the Python DataflowRunner to use a > portable pipeline representation so that we can eventually get rid of the > Pipeline(runner) weirdness. > > In that case, I have a lot of questions about the Pytho

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-03-31 Thread Robert Bradshaw
d outputs of PCollections as maps rather than lists generally across the Python representations (which also relates to some of the ugliness that Cham has been running into with cross-language). > On Tue, Mar 31, 2020 at 2:51 PM Robert Bradshaw wrote: >> >> On Tue, Mar 31, 2020

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-03-31 Thread Robert Bradshaw
On Tue, Mar 31, 2020 at 1:13 PM Sam Rohde wrote: >>> >>> * Don't allow arbitrary nestings returned during expansion, force composite >>> transforms to always provide an unambiguous name (either a tuple with >>> PCollections with unique tags or a dictionary with untagged PCollections or >>> a si

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-03-31 Thread Robert Bradshaw
On Tue, Mar 24, 2020 at 1:07 PM Sam Rohde wrote: > > Hi All, > > Problem > I would like to discuss BEAM-9322 and the correct way to set the output tags > of a transform with nested PCollections, e.g. a dict of PCollections, a tuple > of dicts of PCollections. Before the fixing of BEAM-1833, the

Re: [PROPOSAL] Leveraging SQL TableProviders for Cross-Language IOs

2020-03-30 Thread Robert Bradshaw
A belated but very enthusiastic +1 to this proposal. Added some comments to the doc. On Thu, Jan 16, 2020 at 9:05 AM Kenneth Knowles wrote: > > Nice! This is quite clever. > > Kenn > > On Mon, Jan 13, 2020 at 5:08 PM Chamikara Jayalath > wrote: >> >> Thanks Brian. Added some comments. >> >> On

Re: Next LTS?

2020-03-24 Thread Robert Bradshaw
e demand >> for LTS releases. >> >> There was a suggestion to mark the last release with python 2 support to be >> an LTS release, was there a conclusion on that? ( +Valentyn Tymofieiev ) >> >> Ahmet >> >> On Tue, Mar 24, 2020 at 2:34 PM Robert Brad

Re: [PROPOSAL] Add licenses and notices to SDK docker images

2020-03-24 Thread Robert Bradshaw
Thank you for updating the doc. As I mentioned on the PR, I do not think we should check all 100K lines of auto-generated/pulled licence files into the repository and run separate asynchronous processes to try to keep things in sync and fix things up as dependencies evolve. Instead, we should popul

Re: Next LTS?

2020-03-24 Thread Robert Bradshaw
currently) for LTS to exist. >>> Though, worth ensuring we live up to what we keep on the website. And, >>> without an active LTS, probably something we should take off the site? >>> >>> On Thu, Sep 19, 2019 at 1:33 PM Pablo Estrada wrote: >>>>

Re: Special characters in Beam Schema field names

2020-03-19 Thread Robert Bradshaw
icks will make it totally clear that the dot is not > a field separator. If we're generating *new* field names, I'd just as soon a convention that generates non-special ones just for ease of use. > On Wed, Mar 18, 2020 at 5:09 PM Robert Bradshaw wrote: >> >> Give the f

Re: Special characters in Beam Schema field names

2020-03-18 Thread Robert Bradshaw
Given the flexibility of SQL, and the diversity of upstream systems, I'd lean on the side of being maximally flexible and saying a field name is a UTF-8 string (including whitespace?), but special characters may require quoting and/or not allow some convenience (e.g. POJO creation). On Wed, Mar 18,
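A rough sketch of what "special characters may require quoting" could mean in practice. The backtick convention, the doubling of embedded backticks, and both function names are assumptions for illustration; the thread does not settle on a syntax.

```python
import re

# Names matching this pattern are treated as "plain" and need no quoting.
_PLAIN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_field_name(name):
    """Return name unchanged if plain, otherwise wrap it in backticks."""
    if _PLAIN.fullmatch(name):
        return name
    return "`" + name.replace("`", "``") + "`"


def field_path(*names):
    """Join field names with dots, quoting any that contain special characters."""
    return ".".join(quote_field_name(n) for n in names)
```

With this convention, `field_path("row", "a.b")` gives ``row.`a.b` ``, so a dot inside a name cannot be mistaken for a field separator.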

Re: Contributing Twister2 runner to Apache Beam

2020-03-05 Thread Robert Bradshaw
I think we will get to a point where it makes sense for runners to live in their own repositories, with their own release cadence, but we're not at that point yet. One prerequisite is a stable API--we're closing in on that with the portability protos, but many (java) runners actually share the comm

Re: Run Python PreCommit break?

2020-03-05 Thread Robert Bradshaw
https://github.com/apache/beam/pull/11021 for getting rid of these vestigial error logs. On Thu, Mar 5, 2020 at 1:21 PM Rui Wang wrote: > > Hi Community, > > Is python precommit breaking? I have observed a consistent test case failure > from > apache_beam.runners.portability.portable_runner_test

Re: Python Static Typing: Next Steps

2020-03-03 Thread Robert Bradshaw
er could be a good occasion to rework the current PythonLint >> > job. Since yapf has been introduced, some of the checks made by >> > pylint/flake are now unnecessary and could be dismantled. This would >> > speed-up PythonLint quite a lot. >> > I volunteer

Re: Java SplittableDoFn Watermark API

2020-03-03 Thread Robert Bradshaw
where *all* runners become portable runners. That doesn't mean they all need to use Docker images, or even gRPC, but I don't think having classical-only or classical-excluded features is where we want to be long-term. > On Tue, Mar 3, 2020 at 1:41 AM Robert Bradshaw wrote: >

Re: Error logging from fn_api_runners

2020-03-02 Thread Robert Bradshaw
Yeah, this was an oversight on my part. I don't think we need to log this at all. https://github.com/apache/beam/pull/11021 for anyone to look at. On Mon, Mar 2, 2020 at 2:44 PM Heejong Lee wrote: > > I think it should be either info or debug but not error. > > On Mon, Mar 2, 2020 at 2:35 PM Ning

Re: Python Static Typing: Next Steps

2020-03-02 Thread Robert Bradshaw
It seems people are conflating git pre-commit hooks (which IMHO should ideally be in the sub-second range, and run when an author does "git commit") with jenkins pre-commit tests (for which minutes is nothing compared to what we already do). I am +1 to adding mypy to the latter for sure, and think

Re: Java SplittableDoFn Watermark API

2020-03-02 Thread Robert Bradshaw
I don't have a strong preference for using a provider/having a set of tightly coupled methods in Java, other than that we be consistent (and we already use the methods style for restrictions). On Mon, Mar 2, 2020 at 3:32 PM Luke Cwik wrote: > > Jan, there are some parts of Apache Beam the waterma

Re: Python Static Typing: Next Steps

2020-03-02 Thread Robert Bradshaw
+1 We should enable this on jenkins, plus trivial instructions (ideally a one-liner tox command) to run it locally. Hopefully the errors will be easy enough for contributors to figure out (in particular local to and commensurate in complexity with the code that they're editing), and I agree it's t

Re: [DISCUSS] How many Python 3.x minor versions should Beam Python SDK aim to support concurrently?

2020-02-26 Thread Robert Bradshaw
2020 at 5:00 PM Ruoyun Huang wrote: >>>> >>>> I feel 4+ versions take too long to run anything. >>>> >>>> would vote for lowest + highest, 2 versions. >>>> >>>> On Wed, Feb 26, 2020 at 4:52 PM Udi Meiri wrote: >>

Re: [DISCUSS] How many Python 3.x minor versions should Beam Python SDK aim to support concurrently?

2020-02-26 Thread Robert Bradshaw
versions take too long to run anything. >> >> would vote for lowest + highest, 2 versions. >> >> On Wed, Feb 26, 2020 at 4:52 PM Udi Meiri wrote: >>> >>> I agree with having low-frequency tests for low-priority versions. >>> Low-priority versio

Re: [DISCUSS] How many Python 3.x minor versions should Beam Python SDK aim to support concurrently?

2020-02-26 Thread Robert Bradshaw
d highest version, and can get by with smoke tests + infrequent post-commits for the ones between. > Kenn > > On Wed, Feb 26, 2020 at 3:25 PM Robert Bradshaw wrote: >> >> +1 to consulting users. Currently 3.5 downloads sit at 3.7%, or about >> 20% of all Python 3 downloa

Re: [DISCUSS] How many Python 3.x minor versions should Beam Python SDK aim to support concurrently?

2020-02-26 Thread Robert Bradshaw
+1 to consulting users. Currently 3.5 downloads sit at 3.7%, or about 20% of all Python 3 downloads. I would propose getting in warnings about 3.5 EoL well ahead of time, at the very least as part of the 2.7 warning. Fortunately, supporting multiple 3.x versions is significantly easier than spann
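The two percentages quoted here imply a figure for Python 3 as a whole, worked out below. The inputs are the rounded numbers from the message ("3.7%" and "about 20%"), so the result is approximate.

```python
# Share of all downloads that are Python 3.5 (from the message).
py35_share_of_all = 0.037
# Share of Python 3 downloads that are Python 3.5 ("about 20%").
py35_share_of_py3 = 0.20

# If 3.5 is 20% of Python 3, then Python 3 overall is 3.7% / 20% of all downloads.
py3_share_of_all = py35_share_of_all / py35_share_of_py3
print("Python 3 share of all downloads: {:.1%}".format(py3_share_of_all))
```

So at the time of the message, Python 3 accounted for under a fifth of downloads.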

Re: [DISCUSS] How many Python 3.x minor versions should Beam Python SDK aim to support concurrently?

2020-02-26 Thread Robert Bradshaw
Thanks for bringing this up. I've actually been thinking about the same thing (specifically with regards to 3.5 and 3.8). I think it would make sense to add support for 3.8 right away (or at least get a good sense of what work needs to be done and what our dependency situation is like), and to dr

Re: [VOTE] Vendored Dependencies Release Byte Buddy 1.10.8 RC2

2020-02-26 Thread Robert Bradshaw
+1 (binding) On Wed, Feb 26, 2020 at 1:11 PM Pablo Estrada wrote: > > +1 (binding) > Verified hashes. > Thank you Ismael! > > On Wed, Feb 26, 2020 at 11:30 AM Luke Cwik wrote: >> >> +1 (binding) >> >> Verified signatures and contents of jar to not contain module-info.class >> >> On Wed, Feb 26,

Re: python Multiprocessing started with in a do function

2020-02-26 Thread Robert Bradshaw
I suspect this may be due to long-standing bugs regarding forking a process that has grpc channels. See, e.g. https://github.com/grpc/grpc/issues/18321 On Wed, Feb 26, 2020 at 9:02 AM laxman reddy wrote: > > Hello Team, > i am using beam for experimenting for my project usecase >

Re: [DISCUSSION] Use github actions for python wheels ?

2020-02-25 Thread Robert Bradshaw
I'd be in favor of this, assuming it actually simplifies things. (Note that the wheels are for several variants of linux, presumably we could do cross-compiles. Also, manylinux is a "minimal" linux specifically built as to produce shared object libraries compatible with a wide variety of distributi

Re: [ANNOUNCE] New committer: Chad Dombrova

2020-02-24 Thread Robert Bradshaw
Well deserved, Chad. Congratulations! On Mon, Feb 24, 2020 at 2:43 PM Reza Rokni wrote: > > Congratulations! :-) > > On Tue, Feb 25, 2020 at 6:41 AM Chad Dombrova wrote: >> >> Thanks, folks! I'm very excited to "retest this" :) >> >> Especially big thanks to Robert and Udi for all their hard wo

Re: [VOTE] Vendored Dependencies Release gRPC 1.26.0 v0.2 for BEAM-9252

2020-02-21 Thread Robert Bradshaw
+1 (binding) On Fri, Feb 21, 2020 at 4:48 PM Ahmet Altay wrote: > > +1 > > On Fri, Feb 21, 2020 at 4:39 PM Luke Cwik wrote: >> >> +1 (binding) >> I diffed the binary contents of the 0.1 jar and 0.2 jar with no changes to >> the contents of the files and can confirm that module-info.class the

Re: FnAPI proto backwards compatibility

2020-02-20 Thread Robert Bradshaw
> Jan > > On 2/13/20 8:42 PM, Robert Burke wrote: > > +1 to deferring for now. Since they should not be modified after adoption, it > makes sense not to get ahead of ourselves. > > On Thu, Feb 13, 2020, 10:59 AM Robert Bradshaw wrote: >> >> On Thu, Feb 13, 20

Re: Cross-language pipelines status

2020-02-19 Thread Robert Bradshaw
> -chad > > > On Wed, Feb 19, 2020 at 6:00 PM Robert Bradshaw wrote: >> >> Hopefully this should be resolved by >> https://issues.apache.org/jira/browse/BEAM-9229 >> >> On Wed, Feb 19, 2020 at 5:52 PM Chad Dombrova wrote: >> > >> > We are

Re: Cross-language pipelines status

2020-02-19 Thread Robert Bradshaw
Hopefully this should be resolved by https://issues.apache.org/jira/browse/BEAM-9229 On Wed, Feb 19, 2020 at 5:52 PM Chad Dombrova wrote: > > We are using external transforms to get access to PubSubIO within python. It > works well, but there is one major issue remaining to fix: we have to bui

Re: FnAPI proto backwards compatibility

2020-02-14 Thread Robert Bradshaw
Oh, sorry. Try it again https://docs.google.com/document/d/1CyVElQDYHBRfXu6k1VSXv3Yok_4r8c4V0bkh2nFAWYc/edit?usp=sharing On Fri, Feb 14, 2020 at 2:04 PM Jan Lukavský wrote: > > Hi Robert, > > the doc seems to be locked. > > Jan > > On 2/14/20 10:56 PM, Robert Bradshaw wr

Re: FnAPI proto backwards compatibility

2020-02-14 Thread Robert Bradshaw
> of the pipeline. >>> >>> Kenn >>> >>>> >>>> c) we can take advantage of these pipeline features to get rid of the >>>> categories of @ValidatesRunner tests, because we could have just simply >>>> @ValidatesRunner and ea

Re: FnAPI proto backwards compatibility

2020-02-13 Thread Robert Bradshaw
a pipeline that the > SDK can understand" (eg. Combiner lifting, and state backed iterables), as > well as "what the pipeline requires from the runner" and "what the runner is > able to do" (eg. Requires sorted input) > > > On Thu, Feb 13, 2020, 9:06 AM

Re: [PROPOSAL] Transition released containers to the official ASF dockerhub organization

2020-02-13 Thread Robert Bradshaw
>> >>> +1 very nice explanation >>> >>> On Wed, Jan 15, 2020 at 1:57 PM Ahmet Altay wrote: >>>> >>>> +1 - Thank you for driving this! >>>> >>>> On Wed, Jan 15, 2020 at 1:55 PM Thomas Weise wrote: >>

Re: FnAPI proto backwards compatibility

2020-02-12 Thread Robert Bradshaw
On Wed, Feb 12, 2020 at 11:08 AM Luke Cwik wrote: > > We can always detect on the runner/SDK side whether there is an unknown > field[1] within a payload and fail to process it but this is painful in two > situations: > 1) It doesn't provide for a good error message since you can't say what the

Re: FnAPI proto backwards compatibility

2020-02-12 Thread Robert Bradshaw
On Tue, Feb 11, 2020 at 7:25 PM Kenneth Knowles wrote: > > On Tue, Feb 11, 2020 at 8:38 AM Robert Bradshaw wrote: >> >> On Mon, Feb 10, 2020 at 7:35 PM Kenneth Knowles wrote: >> > >> > On the runner requirements side: if you have such a list at the pipeline

Re: FnAPI proto backwards compatibility

2020-02-11 Thread Robert Bradshaw
"X" Pablo described here [1]. >> >> Brian >> >> [1] >> https://lists.apache.org/thread.html/e93ac64d484551d61e559e1ba0cf4a15b760e69d74c5b1d0549ff74f%40%3Cdev.beam.apache.org%3E >> >> On Mon, Feb 10, 2020 at 3:55 PM Robert Bradshaw wrote: >>>

Re: Labels on PR

2020-02-11 Thread Robert Bradshaw
+1 to finding the right balance. I do think per-runner makes sense, rather than a general "runners." IOs might make sense as well. Not sure about all the extensions-*; I'd leave those out for now. On Tue, Feb 11, 2020 at 5:56 AM Ismaël Mejía wrote: > > > So I propose going simple with a limited s

FnAPI proto backwards compatibility

2020-02-10 Thread Robert Bradshaw
With an eye towards cross-language (which includes cross-version) pipelines and services (specifically looking at Dataflow) supporting portable pipelines, there's been a desire to stabilize the portability protos. There are currently many cleanups we'd like to do [1] (some essential, others nice to

Re: Retest this please access?

2020-02-10 Thread Robert Bradshaw
We're working on that; follow https://issues.apache.org/jira/browse/INFRA-19670 On Mon, Feb 10, 2020 at 9:52 AM Daniel Collins wrote: > > Hello all, > > I'm feeling a bit bad about asking my reviewers to re-run presubmits. How > would I go about getting access to "Retest this please" being inter

Re: [BEAM-8550] @RequiresTimeSortedInput ready for merge to master

2020-02-07 Thread Robert Bradshaw
There are two separable concerns here. (1) The @RequiresTimeSortedInput feature itself. This is a subtle feature needed for certain pipelines, and if anything Jan has gone the extra mile discussing, documenting, and designing this and trying to reach consensus. I feel like there has been a failure

Re: [DISCUSS] Autoformat python code with Black

2020-02-07 Thread Robert Bradshaw
rk_to( > +11).advance_processing_time(1).advance_watermark_to( > +12).advance_processing_time(1).advance_watermark_to( > +13).advance_processing_time(1).advance_watermark_to( > + > 14).advance_processing_time(

Re: Time precision in Python

2020-02-07 Thread Robert Bradshaw
as just surprised by the precision loss. Thanks! >> >> On Thu, Feb 6, 2020 at 1:50 PM Robert Bradshaw wrote: >>> >>> Yes, the inconsistency of timestamp granularity is something that >>> hasn't yet been resolved (see previous messages on this list). A

Re: [DISCUSS] Autoformat python code with Black

2020-02-06 Thread Robert Bradshaw
n. In the Java >>>> case >>>> what we did was to just notice every PR that was affected by the change. >>>> And clearly document how to validate and autoformat the code. >>>> >>>> So the earlier the better, go go autoformat! >>>>
