Re: portableWordCountBatch and portableWordCountStreaming failing in Python PreCommit

2019-09-16 Thread Chad Dombrova
Ning, if you're having trouble making sense of the preCommit errors, you may be interested in this Jira: https://issues.apache.org/jira/browse/BEAM-8213 On Mon, Sep 16, 2019 at 12:02 PM Kyle Weaver wrote: > Python 2 isn't the reason the test is failing, that's just a warning. The > actual

The state of external transforms in Beam

2019-09-16 Thread Chad Dombrova
Hi all, There was some interest in this topic at the Beam Summit this week (btw, great job to everyone involved!), so I thought I’d try to summarize the current state of things. First, let me explain the idea behind external transforms for the uninitiated. Problem: - there’s a transform

Re: Hackathon @BeamSummit @ApacheCon

2019-09-05 Thread Chad Dombrova
Has a date and time been picked for this? I'll be there for part of the week and would love to join. On Tue, Sep 3, 2019 at 11:31 AM Brian Hulette wrote: > I will be around all week as well and would love to help with a Beam > hackathon in any way :) > > On Thu, Aug 29, 2019 at 9:46 AM

Re: Need advice: PubsubIO external transform PR

2019-08-20 Thread Chad Dombrova
> The issue is also tracked here: > https://jira.apache.org/jira/browse/BEAM-7870 There are some suggestions > in the issue. I think the best solution is to allow execution of the > source API parts of KafkaIO/PubSubIO (on the Runner) and the following > UDFs (in the environment). Since those do

Re: Need advice: PubsubIO external transform PR

2019-08-19 Thread Chad Dombrova
> I don't understand why this replacement is necessary, since the next >>> transform in the chain is a java ParDo that seems like it should be fully >>> capable of using PubsubMessageWithAttributesCoder. >>> >> > Not too familiar with Flink, but have you tried using PubSub source from a > pure

Need advice: PubsubIO external transform PR

2019-08-17 Thread Chad Dombrova
Hi all, I've got a PR[1] for adding external transform support to PubsubIO so that it will work with python and go pipelines on Flink, and I am *so* close, but I've run into questions that the code cannot answer: I need a human now. The brief summary is

Re: [PROPOSAL] An initial Schema API in Python

2019-08-16 Thread Chad Dombrova
> > >> Agreed on float since it seems to trivially map to a double, but I’m > torn on int still. While I do want int type hints to work, it doesn’t seem > appropriate to map it to AtomicType.INT64, since it has a completely > different range of values. > >> > >> Let’s say we used native int for
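For context on the range concern in this thread: Python's built-in `int` is arbitrary-precision, while an INT64 field can only hold signed 64-bit values. The sketch is illustrative only and not part of the proposed Schema API; `fits_int64` and `encode_int64` are hypothetical helpers, and `struct` merely stands in for whatever coder Beam would actually use.

```python
import struct

# Bounds of a signed 64-bit integer, the range an INT64 field can represent.
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def fits_int64(value: int) -> bool:
    """Return True if a Python int can be stored as a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


def encode_int64(value: int) -> bytes:
    # ">q" is a big-endian signed 64-bit integer; struct.pack raises
    # struct.error for Python ints outside that range.
    return struct.pack(">q", value)
```

A value such as `2 ** 63` is a perfectly valid Python `int` but overflows a signed 64-bit field, which is why mapping native `int` straight to INT64 is lossy for large values.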

Re: Python Beam pipelines on Flink on Kubernetes

2019-08-13 Thread Chad Dombrova
Hi Thomas, Nice work! It's really clearly presented. What's the current favored approach for pipeline submission? I'm also interested to know how this plan overlaps (if at all) with the work on Fine-Grained Resource Scheduling [1][2] that's being done for Flink 1.9+, which has implications for

Re: [PROPOSAL] An initial Schema API in Python

2019-08-03 Thread Chad Dombrova
Hi, This looks like a great feature. Is there a plan to eventually support custom field types? I assume adding support for dataclasses in python 3.7+ should be trivial to do in a follow up PR. Do you see any complications with that? The main advantage that dataclasses have over NamedTuple in
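One reason dataclass support might be a small follow-up: both `typing.NamedTuple` subclasses and dataclasses record their field types as class annotations, so the same standard-library call can read them. This is a hypothetical sketch (`PointTuple`, `PointDataclass`, and `field_types` are illustrative names), not how the Beam proposal actually infers schemas.

```python
import dataclasses
import typing


class PointTuple(typing.NamedTuple):
    x: int
    y: float


@dataclasses.dataclass
class PointDataclass:
    x: int
    y: float


def field_types(cls) -> dict:
    # NamedTuple subclasses and dataclasses both store field annotations in
    # __annotations__, so typing.get_type_hints returns the same mapping
    # (in declaration order) for either kind of class.
    return typing.get_type_hints(cls)
```

Both `field_types(PointTuple)` and `field_types(PointDataclass)` yield `{'x': int, 'y': float}`, so a schema inference step written against annotations would not need to special-case either type.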

Re: Neat Beam integration - Ananas Analytics Desktop

2019-08-03 Thread Chad Dombrova
Very cool. Anyone know if it supports the portability framework, and thus python? I see a few indicators that the answer is likely “no”. On Fri, Jul 26, 2019 at 12:22 PM Kenneth Knowles wrote: > One colleague pointed me to this project and another tweeted about it [1]. > I see it was

Re: Docker Run Options in SDK Container

2019-08-02 Thread Chad Dombrova
Hi all, I’m a bit confused about the desire to use json for the environment_config. It’s harder to use json on the command line, such that now we’re talking about the value being *either* a docker image name *or* a path to a json file (OR maybe yaml too!), which is not only less convenient than

Re: [ANNOUNCE] Beam 2.14.0 Released!

2019-08-01 Thread Chad Dombrova
Nice work all round! I love the release blog format with the highlights and links to issues. -chad On Thu, Aug 1, 2019 at 4:23 PM Anton Kedin wrote: > The Apache Beam team is pleased to announce the release of version 2.14.0. > > Apache Beam is an open source unified programming model to

Re: [VOTE] Release 2.14.0, release candidate #1

2019-07-26 Thread Chad Dombrova
> > I think the release guide needs to be updated to remove the optionality of > blog creation and avoid confusion. Thanks for pointing that out. > +1

Re: [ANNOUNCE] Apache Beam 2.13.0 released!

2019-06-10 Thread Chad Dombrova
> > > @Chad Thanks for the feedback. I agree that we can improve our release > notes. The particular issue you were looking for was part of the detailed > list [1] linked in the blog post: > https://jira.apache.org/jira/browse/BEAM-7029 Just to be clear, I had no idea about the feature ahead of

Re: [ANNOUNCE] Apache Beam 2.13.0 released!

2019-06-07 Thread Chad Dombrova
I saw this and was particularly excited about the new support for "external" transforms in portable runners like python (i.e. the ability to use the Java KafkaIO transforms, with presumably more to come in the future). While the release notes are useful, I will say that it takes a lot of time and
