Re: Artifact staging in cross-language pipelines

2019-04-23 Thread Robert Bradshaw
I've been out, so coming a bit late to the discussion, but here are my thoughts. The expansion service absolutely needs to be able to provide the dependencies for the transform(s) it expands. It seems the default, foolproof way of doing this is via the environment, which can be a Docker image with

Re: [docs] Python State & Timers

2019-04-11 Thread Robert Bradshaw
That's a great idea! It would probably be pretty easy to add the corresponding code snippets to the docs as well. On Thu, Apr 11, 2019 at 2:00 PM Maximilian Michels wrote: > > Hi everyone, > > The Python SDK still lacks documentation on state and timers. > > As a first step, what do you think

Re: [ANNOUNCE] New committer announcement: Boyuan Zhang

2019-04-11 Thread Robert Bradshaw
Congratulations! On Thu, Apr 11, 2019 at 12:29 PM Michael Luckey wrote: > > Congrats and welcome, Boyuan > > On Thu, Apr 11, 2019 at 12:27 PM Tim Robertson > wrote: >> >> Many congratulations Boyuan! >> >> On Thu, Apr 11, 2019 at 10:50 AM Łukasz Gajowy wrote: >>> >>> Congrats Boyuan! :) >>>

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-08 Thread Robert Bradshaw
On Mon, Apr 8, 2019 at 8:04 PM Kenneth Knowles wrote: > > On Mon, Apr 8, 2019 at 1:57 AM Robert Bradshaw wrote: >> >> On Sat, Apr 6, 2019 at 12:08 AM Kenneth Knowles wrote: >> > >> > On Fri, Apr 5, 2019 at 2:24 PM Robert Bradshaw wrote: >> >

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-08 Thread Robert Bradshaw
On Sat, Apr 6, 2019 at 12:08 AM Kenneth Knowles wrote: > > > > On Fri, Apr 5, 2019 at 2:24 PM Robert Bradshaw wrote: >> >> On Fri, Apr 5, 2019 at 6:24 PM Kenneth Knowles wrote: >> > >> > Nested and unnested contexts are two different encodings. Can we j

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-05 Thread Robert Bradshaw
StrUtf8Coder, ...], LengthPrefixCoder[StrUtf8Coder], and using StringUtf8Coder for IO. > > On Fri, Apr 5, 2019 at 12:38 AM Robert Bradshaw wrote: >> >> On Fri, Apr 5, 2019 at 12:50 AM Heejong Lee wrote: >> > >> > Robert, does nested/unnested context work prop

Re: [DISCUSS] Backwards compatibility of @Experimental features

2019-04-05 Thread Robert Bradshaw
if it's technically feasible, I am also in favor of requiring experimental features to be (per-tag, Python should be updated) opt-in only. We should probably regularly audit the set of experimental features we ship (I'd say as part of the release, but that process is laborious enough, perhaps we

Re: [PROPOSAL] commit granularity in master

2019-04-05 Thread Robert Bradshaw
> I like the extra delimitation the brackets give, worth the two extra > > characters to me. More importantly, it's nice to have consistency, and > > the only way to be consistent with the past is leave them there. > > My point with the brackets is that we are 'getting close

Re: Hazelcast Jet Runner - validation tests

2019-04-05 Thread Robert Bradshaw
On Thu, Apr 4, 2019 at 6:38 PM Lukasz Cwik wrote: > > The issue with unbounded tests that rely on triggers/late data/early > firings/processing time is that these are several sources of non-determinism. > The sources make non-deterministic decisions around when to produce data, > checkpoint,

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-05 Thread Robert Bradshaw
ob/release-2.12.0/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/LengthPrefixCoder.java#L64 (and I'm sure there's others). > On Thu, Apr 4, 2019 at 3:25 PM Robert Bradshaw wrote: >> >> I don't know why there are two separate copies of >> standard_coders.yaml--orig

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Robert Bradshaw
>>>>>> the default type->coder mapping, the well known coder will gain little >>>>>> usage. I think we should fix the Python coder to use the same encoding >>>>>> as Java for UTF-8 strings before there are too many Python SDK users. >>

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Robert Bradshaw
A URN defines the encoding. There are (unfortunately) *two* encodings defined for a Coder (defined by a URN), the nested and the unnested one. IIRC, in both Java and Python, the nested one prefixes with a var-int length, and the unnested one does not. We should define the spec clearly and have
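The difference between the two contexts can be sketched in a few lines. This is an illustration of the idea described in the message above (nested prefixes a var-int length, unnested does not), not Beam's coder implementation; the function names `encode_varint`, `encode_unnested`, `encode_nested`, and `decode_nested` are made up for this sketch.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 var-int, low 7 bits first."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        bits = n & 0x7F
        n >>= 7
        if n:
            out.append(bits | 0x80)  # more bytes follow
        else:
            out.append(bits)         # final byte, high bit clear
            return bytes(out)


def encode_unnested(s: str) -> bytes:
    """Unnested context: the value owns the rest of the stream, so no prefix."""
    return s.encode("utf-8")


def encode_nested(s: str) -> bytes:
    """Nested context: prefix the UTF-8 payload with its byte length."""
    payload = s.encode("utf-8")
    return encode_varint(len(payload)) + payload


def decode_nested(data: bytes, pos: int = 0):
    """Decode one length-prefixed string at pos; return (value, next_pos)."""
    length = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    end = pos + length
    return data[pos:end].decode("utf-8"), end
```

The prefix is what lets several values share one byte stream: without it a reader cannot tell where one string stops and the next begins, which is why the two contexts produce different bytes for the same value.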

Re: Deprecating Avro for fastavro on Python 3

2019-04-02 Thread Robert Bradshaw
I agree with Ahmet. Fastavro seems to be well maintained and has good, tested compatibility. Unless we expect significant performance improvements in the standard Avro Python package (a significant undertaking, likely not one we have the bandwidth to take on, and my impression is that it's

Re: [PROPOSAL] Standardize Gradle structure in Python SDK

2019-03-29 Thread Robert Bradshaw
On Fri, Mar 29, 2019 at 12:54 PM Michael Luckey wrote: > > Really like the idea of improving here. > > Unfortunately, I haven't worked with python on that scale yet, so bear with > my naive understandings in this regard. If I understand correctly, the > suggestion will result in a couple of

Re: Python SDK Arrow Integrations

2019-03-29 Thread Robert Bradshaw
First off, huge +1 to a good integration with Arrow and Beam. I think to fully realize the benefits we need to have deeper integration than arrow-frame-batches as elements, i.e. SDKs should be augmented to understand arrow frames as batches of individual elements, each with (possibly) their own

Re: [spark runner dataset POC] workCount works !

2019-03-22 Thread Robert Bradshaw
Nice! Between this and the portability work (https://github.com/apache/beam/pull/8115), hopefully we'll have a modern Spark runner soon. Any idea on how hard (or easy?) it will be to merge those two? On Fri, Mar 22, 2019 at 9:29 AM Łukasz Gajowy wrote: > > Cool. :) Congrats and thank you for

Re: [PROPOSAL] commit granularity in master

2019-03-22 Thread Robert Bradshaw
On Fri, Mar 22, 2019 at 3:02 PM Ismaël Mejía wrote: > > It is good to remind committers of their responsibility for the > 'cleanliness' of the merged code. GitHub sadly does not have an easy > interface to do this and this should be done manually in many cases, > sadly I have seen many committers

Re: What quick command to catch common issues before pushing a python PR?

2019-03-20 Thread Robert Bradshaw
I use tox as well. Actually, I use detox and retox (parallel versions of tox, easily installable with pip) which can speed things up quite a bit. On Wed, Mar 20, 2019 at 1:33 AM Pablo Estrada wrote: > > Correction - the command is now: tox -e py35-gcp,py35-lint > > And it ran on my machine in

Re: [PROPOSAL] Preparing for Beam 2.12.0 release

2019-03-18 Thread Robert Bradshaw
I agree with Kenn on both accounts. We can (and should) keep 2.7.x alive with an imminent 2.7.1 release, and choose the next one at a future date based on actual experience with an existing release. On Fri, Mar 15, 2019 at 5:36 PM Ahmet Altay wrote: > > +1 to extending 2.7.x LTS lifetime for a

Re: Cross-language transform API

2019-03-11 Thread Robert Bradshaw
On Mon, Mar 11, 2019 at 4:37 PM Maximilian Michels wrote: > > > Just to clarify. What's the reason for including a PROPERTIES enum here > > instead of directly making beam_urn a field of ExternalTransformPayload ? > > The URN is supposed to be static. We always use the same URN for this > type

Re: Python precommit duration is above 1hr

2019-03-09 Thread Robert Bradshaw
Perhaps this is the duplication of all (or at least most) previously existing tests for running under Python 3. I agree that this is excessive; we should probably split out Py2, Py3, and the linters into separate targets. We could look into using detox or retox to parallelize the testing as

Re: Signing artefacts during release

2019-03-08 Thread Robert Bradshaw
On Fri, Mar 8, 2019 at 2:42 AM Ahmet Altay wrote: > > This sounds good to me. > > On Thu, Mar 7, 2019 at 3:32 PM Michael Luckey wrote: >> >> Thanks for your comments. >> >> So to continue here, I'll prepare a PR implementing C: >> >> Pass the sign key to the relevant scripts and use that for

Re: Signing artefacts during release

2019-03-06 Thread Robert Bradshaw
I would not be opposed to make the choice of signing key a required argument for the relevant release script(s). On Wed, Mar 6, 2019 at 3:44 PM Michael Luckey wrote: > > Hi, > > After upgrade to gradle 5 @altay (volunteering/selected as release manager) > was hit by an issue [1] which prevented

Re: [VOTE] Release 2.11.0, release candidate #2

2019-03-04 Thread Robert Bradshaw
I see the vote has passed, but +1 (binding) from me as well. On Mon, Mar 4, 2019 at 11:51 AM Jean-Baptiste Onofré wrote: > > +1 (binding) > > Tested with beam-samples. > > Regards > JB > > On 26/02/2019 10:40, Ahmet Altay wrote: > > Hi everyone, > > > > Please review and vote on the release

Re: [VOTE] Release 2.11.0, release candidate #1

2019-02-22 Thread Robert Bradshaw
>> >> Done, thank you for the pointer. >> >>> >>> >>> "Once all python wheels have been staged dist.apache.org, please run >>> ./sign_hash_python_wheels.sh to sign and hash python wheels." >>> >>> On Fri, Feb 22,

Re: [VOTE] Release 2.11.0, release candidate #1

2019-02-22 Thread Robert Bradshaw
It looks like https://github.com/apache/beam/blob/release-2.11.0/build.gradle differs from the copy in the release source tarball (line 22, and some whitespace below). Other than that, the artifacts and signatures look good. On Fri, Feb 22, 2019 at 9:50 AM Ahmet Altay wrote: > > Hi everyone, > >

Re: Hazelcast Jet Runner

2019-02-15 Thread Robert Bradshaw
On Fri, Feb 15, 2019 at 7:36 AM Can Gencer wrote: > > We at Hazelcast are looking into writing a Beam runner for Hazelcast Jet > (https://github.com/hazelcast/hazelcast-jet). I wanted to introduce myself as > we'll likely have questions as we start development. Welcome! Hazelcast looks

Re: Thoughts on a reference runner to invest in?

2019-02-14 Thread Robert Bradshaw
nner per language would be nice but if we must > choose only one language I prefer it to be Java just because we have a > bigger community that can contribute and improve it. We may work on > making the distribution of such runner easier or friendlier for > users of different language

Re: Is there a reason why these are error logs? Missing required coder_id on grpc_port

2019-02-13 Thread Robert Bradshaw
We should fix the offending runner(s?). I think this is BEAM-4150. On Wed, Feb 13, 2019 at 2:47 AM Alex Amato wrote: > These errors are very spammy in certain jobs, I was wondering if we could > reduce the log level. Or put some conditions around this? > > >

Bintray account

2019-02-13 Thread Robert Bradshaw
I've been looking at updating our release scripts to resolve https://issues.apache.org/jira/browse/BEAM-6544 and have a setup that pushes to bintray (and then the release script downloads and signs them before pushing to svn). Does anyone know if we already have an Apache Beam organization

Re: Thoughts on a reference runner to invest in?

2019-02-13 Thread Robert Bradshaw
uthors than a ReferenceRunner which was designed for single >>>> node testing. >>>> >>>> I think there are three parts which help to push forward portability: >>>> >>>> 1) Good library support for new portable Runners (Java) >>>>

Re: Thoughts on a reference runner to invest in?

2019-02-12 Thread Robert Bradshaw
This is certainly an interesting question, and I definitely have my opinions, but am curious as to what others think as well. One thing that I think wasn't as clear from the outset is distinguishing between the development of runners/core-java and development of a Java reference runner itself.

Re: pipeline steps

2019-02-11 Thread Robert Bradshaw
In terms of performance, it would likely be minimal overhead if (as is likely) the step consuming the filename gets fused with the read. There's still overhead constructing this composite, object, etc. but that's (again likely) smaller than the cost of doing the read itself. On Sun, Feb 10, 2019

Re: [VOTE] Release 2.10.0, release candidate #3

2019-02-08 Thread Robert Bradshaw
+1 (binding) I have verified that the artifacts and their checksums/signatures look good, and also checked the Python wheels against simple pipelines. On Fri, Feb 8, 2019 at 4:29 PM Etienne Chauchot wrote: > Hi, > I did the same visual checks of Nexmark that I did on RC2 for both > functional

Re: 2.7.1 (LTS) release?

2019-02-08 Thread Robert Bradshaw
+1, I've always found it odd that our build process creates and then reverts commits in the branch (and had the same issue when I was doing the release that restarting if something went wrong was painful). If I understand correctly, a, b, and c would be tags in the github repository, but not live

Re: [VOTE] Release 2.10.0, release candidate #2

2019-02-06 Thread Robert Bradshaw
+1. I verified the source artifacts look good, and tried the Python wheels. On Tue, Feb 5, 2019 at 11:57 PM Kenneth Knowles wrote: > > Hi everyone, > > Please review and vote on the release candidate #2 for the version 2.10.0, as > follows: > > [ ] +1, Approve the release > [ ] -1, Do not

Re: Beam Python streaming pipeline on Flink Runner

2019-02-05 Thread Robert Bradshaw
t of parameter definitions > > so that the target SDK (for example, Java) can call back the original > > SDK where the pipeline was authored in (for example, Python) to resolve > > UDFs at runtime. > > > > Thanks, > > Cham > > > > That'

Re: [Proposal] Get Metrics API: Metric Extraction via proto RPC API.

2019-02-04 Thread Robert Bradshaw
To summarize for the list, the plan of record is: The MonitoringInfo proto will be used again in this querying API, so the metric format SDKs report will also be used when extracting metrics for a job. // Job Service for running RunnerAPI pipelines service JobService { ... rpc GetJobMetrics

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Robert Bradshaw
> non-splittable DoFn): >> >- ClickhouseIO >> >- File-based ones: TextIO, AvroIO, ParquetIO >> >- JdbcIO >> >- SolrIO >> > >> > Max thanks for your summary. I would like to add that we agree that >> > the runner specific

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Robert Bradshaw
ut would like to > hear your thoughts. > > -Max > > On 01.02.19 13:08, Robert Bradshaw wrote: > > On Thu, Jan 31, 2019 at 6:25 PM Maximilian Michels wrote: > > > > Ah, I thought you meant native Flink transforms. > >

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Robert Bradshaw
On Thu, Jan 31, 2019 at 6:25 PM Maximilian Michels wrote: > Ah, I thought you meant native Flink transforms. > > Exactly! The translation code is already there. The main challenge is how > to > programmatically configure the BeamIO from Python. I suppose that is also > an > unsolved problem for

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-01-30 Thread Robert Bradshaw
Yes, this is precisely the goal of SDF. On Wed, Jan 30, 2019 at 8:41 PM Kenneth Knowles wrote: > > So is the latter intended for splittable DoFn but not yet using it? The > promise of SDF is precisely this composability, isn't it? > > Kenn > > On Wed, Jan 30, 2019 at 10:16 AM Jeff Klukas

Re: Portable metrics work and open questions

2019-01-30 Thread Robert Bradshaw
ture. But there should be a way to pass > through metrics so that they can be queried out. I think that is missing from > the doc right now. I'll iterate on that a bit. For sum_int64 and > distribution_int_64 this will be possible, but we should document the > translation formal

Re: Portable metrics work and open questions

2019-01-30 Thread Robert Bradshaw
Thanks for writing this up. I left some comments in the doc, but at a high level I am in favor of the "more deeply overhaul SDKs' metrics/querying structures to use MonitoringInfos / URNs" option, at least over the Jobs API, for consistency and completeness. The SDK can provide whatever

Re: [VOTE] Release 2.10.0, release candidate #1

2019-01-29 Thread Robert Bradshaw
The artifacts and signatures look good. But we're missing Python wheels. On Tue, Jan 29, 2019 at 6:08 AM Kenneth Knowles wrote: > > Ah, I did not close the staging repository. Thanks for letting me know. Try > now. > > Kenn > > On Mon, Jan 28, 2019 at 2:31 PM Ismaël Mejía wrote: >> >> I think

Re: [DISCUSSION] UTests and embedded backends

2019-01-28 Thread Robert Bradshaw
assandra-unit) to the UTests. > Ticket was opened while ago: https://issues.apache.org/jira/browse/BEAM-4164 > > Etienne > > Le mardi 22 janvier 2019 à 09:26 +0100, Robert Bradshaw a écrit : > > On Mon, Jan 21, 2019 at 10:42 PM Kenneth Knowles wrote: > > > Robert - you mea

Re: [ANNOUNCE] New PMC member: Etienne Chauchot

2019-01-28 Thread Robert Bradshaw
Thanks for all your great work. Congratulations and welcome! On Mon, Jan 28, 2019 at 10:21 AM Alexey Romanenko wrote: > > Great job! Congrats, Etienne! > > On 28 Jan 2019, at 07:18, Ahmet Altay wrote: > > Congratulations Etienne! > > On Sun, Jan 27, 2019 at 7:15 PM Reza Ardeshir Rokni wrote:

Re: BEAM-6324 / #7340: "I've pretty much given up on the PR being merged. I use my own fork for my projects"

2019-01-28 Thread Robert Bradshaw
On Mon, Jan 28, 2019 at 10:37 AM Etienne Chauchot wrote: > > Sure it's a pity that this PR went unnoticed and I think it is a combination > of factors (PR date around Christmas, the fact that the author forgot - AFAIK > - to ping a reviewer in either the PR or the ML). > > I agree with Rui's

Re: Cross-language pipelines

2019-01-24 Thread Robert Bradshaw
On Fri, Jan 25, 2019 at 12:18 AM Reuven Lax wrote: > > On Thu, Jan 24, 2019 at 2:38 PM Robert Bradshaw wrote: >> >> On Thu, Jan 24, 2019 at 6:43 PM Reuven Lax wrote: >> > >> > Keep in mind that these user-supplied lambdas are commonly used in our >

Re: [DISCUSSION] ParDo Async Java API

2019-01-24 Thread Robert Bradshaw
ally return a future could be an option, even >>>> if the language itself doesn't support something like `await`, you could >>>> still implement it yourself in the DoFn, however, it seems like it'd be a >>>> strange contrast to the non-async version, which

Re: Cross-language pipelines

2019-01-24 Thread Robert Bradshaw
a commonly-understood URN + payload, it'll >> work. A transform could provide a wide range of "useful" URNs for its >> internal callbacks, more than that would require significant design if >> it can't be pre- or post-fixed. >> >> > On Wed, Jan 23

Re: Cross-language pipelines

2019-01-24 Thread Robert Bradshaw
understood URN + payload, it'll work. A transform could provide a wide range of "useful" URNs for its internal callbacks, more than that would require significant design if it can't be pre- or post-fixed. > On Wed, Jan 23, 2019 at 7:04 PM Chamikara Jayalath > wrote: >> >

Re: [DISCUSSION] ParDo Async Java API

2019-01-24 Thread Robert Bradshaw
If I understand correctly, the end goal is to process input elements of a DoFn asynchronously. Were I to do this naively, I would implement DoFns that simply take and receive [Serializable?]CompletionStages as element types, followed by a DoFn that adds a callback to emit on completion (possibly

Re: Cross-language pipelines

2019-01-23 Thread Robert Bradshaw
happen as part of construction because the set of outputs (and their properties) can be dynamic based on the expansion. > Thanks, > Max > > On 23.01.19 04:12, Robert Bradshaw wrote: > > No, this PR simply takes an endpoint address as a parameter, expecting > > it to alread

Re: Dealing with expensive jenkins + Dataflow jobs

2019-01-23 Thread Robert Bradshaw
I like the idea of creating separate project(s) for load tests so as to not compete with other tests and the standard development cycle. As for how many workers is too many, I would take the tack "what is it we're trying to test?" Unless you're stress-testing the shuffle itself, much of what Beam

Re: How to use "PortableRunner" in Python SDK?

2019-01-23 Thread Robert Bradshaw
We should probably make the job endpoint mandatory for PortableRunner, and offer a separate FlinkRunner (and others) that provides a default endpoint and otherwise delegates everything down. On Thu, Nov 15, 2018 at 12:07 PM Maximilian Michels wrote: > > > 1) The default behavior, where

Re: Cross-language pipelines

2019-01-23 Thread Robert Bradshaw
> >>> >>> On Tue, Jan 22, 2019 at 10:53 AM Chamikara Jayalath >>> wrote: >>>> >>>> Thanks Robert. >>>> >>>> On Tue, Jan 22, 2019 at 4:39 AM Robert Bradshaw >>>> wrote: >>>>> >>>>> Now

Cross-language pipelines

2019-01-22 Thread Robert Bradshaw
Now that we have the FnAPI, I started playing around with support for cross-language pipelines. This will allow things like IOs to be shared across all languages, SQL to be invoked from non-Java, TFX tensorflow transforms to be invoked from non-Python, etc. and I think is the next step in

Re: [DISCUSSION] UTests and embedded backends

2019-01-22 Thread Robert Bradshaw
On Mon, Jan 21, 2019 at 10:42 PM Kenneth Knowles wrote: > > Robert - you meant this as a mostly-automatic thing that we would engineer, > yes? Yes, something like TestPipeline that buffers up the pipelines and then executes on class teardown (details TBD). > A lighter-weight fake, like using

Re: gradle clean causes long-running python installs

2019-01-21 Thread Robert Bradshaw
Just some background, grpcio-tools is what's used to do the proto generation. Unfortunately it's expensive to compile and doesn't provide very many wheels, so we want to install it once, not every time. (It's also used in more than just tests; one needs it every time the .proto files change.)

Re: [DISCUSSION] UTests and embedded backends

2019-01-21 Thread Robert Bradshaw
I am of the same opinion, this is the approach we're taking for Flink as well. Various mitigations (e.g. capping the parallelism at 2 rather than the default of num cores) have helped. Several times the idea has been proposed to group unit tests together for "expensive" backends. E.g. for

Re: Python Flink tests failing on Jenkins

2019-01-16 Thread Robert Bradshaw
ppreciated. On Wed, Jan 16, 2019 at 3:32 AM Ahmet Altay wrote: > > +Robert Bradshaw > > Is it this test suite: > https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PreCommit_Python_ValidatesRunner_Flink_Commit/ > > There is a recent change related to that > https:

Re: Beam JobService Problem

2019-01-15 Thread Robert Bradshaw
On Tue, Jan 15, 2019 at 1:19 AM Ankur Goenka wrote: > > Thanks Sam for bringing this to the list. > > As preparation_ids are not reusable, having preparation_id and job_id same > makes sense to me for Flink. I think we change the protocol and only have one kind of ID. As well as solving the

Re: Shall we use "tenacity" library to help deflake some of Python tests using retry logic?

2019-01-11 Thread Robert Bradshaw
I think that makes a lot of sense, and tenacity looks like a decent, maintained library. We should use this sparingly, but it is helpful for algorithms that have an intrinsic amount of randomness/noise (e.g. the sampling code) to reduce a 1% chance of failure to a 1 in a million. On Fri, Jan 11,
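The arithmetic behind "1% to 1 in a million" is that independent attempts multiply: with a 1% failure rate per run, three attempts all fail with probability 0.01 × 0.01 × 0.01 = 10⁻⁶. A minimal sketch of that idea follows. It uses a hand-written `retry` decorator as a stand-in, not tenacity's API, and `failure_probability` is a helper invented here; both assume the attempts are independent, which only holds for genuinely random noise and not for a deterministic bug.

```python
import functools


def retry(attempts):
    """Re-run a test function up to `attempts` times on AssertionError.

    Illustrative stand-in for a retry library; the last failure is re-raised
    so a consistently failing test still fails.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except AssertionError:
                    if attempt == attempts:
                        raise
        return wrapper
    return decorator


def failure_probability(p_single, attempts):
    """Probability that every one of `attempts` independent runs fails."""
    return p_single ** attempts
```

As far as I know, tenacity expresses the same policy declaratively, for example `@retry(stop=stop_after_attempt(3))`. Limiting the retried exception type to assertion failures keeps real errors (import failures, crashes) visible on the first run.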

Re: error with DirectRunner

2019-01-10 Thread Robert Bradshaw
https://github.com/apache/beam/pull/7456 On Thu, Jan 10, 2019 at 10:59 AM Robert Bradshaw wrote: > > Sorry this got lost. I filed > https://issues.apache.org/jira/browse/BEAM-6404; hopefully it'll be an > easy fix. > > On Wed, Jan 9, 2019 at 8:33 PM Allie Chen wrote:

Re: error with DirectRunner

2019-01-10 Thread Robert Bradshaw
Runner` instead of `DirectRunner`, will there be any > performance issues/caveats I should worry about? > > Thanks! > Allie > > On Tue, Oct 30, 2018 at 8:13 PM Udi Meiri wrote: >> >> +Robert Bradshaw I would be happy to debug and fix this, but I'd need more >> gu

Re: [DISCUSS] (Forked thread) Beam issue triage & assignees

2019-01-10 Thread Robert Bradshaw
On Thu, Jan 10, 2019 at 3:20 AM Ahmet Altay wrote: > > I agree with the proposals here. Initial state of "Needs Review" and blocking > releases on untriaged issues will ensure that we will at least look at every > new issue once. +1. I'm more ambivalent about closing stale issues. Unlike PRs,

Re: [Go SDK] User Defined Coders

2019-01-09 Thread Robert Bradshaw
oth in the "computer resource" sense and "easy for people to get stuff done" sense). > Reuven > > On Tue, Jan 8, 2019 at 7:44 AM Robert Bradshaw wrote: >> >> On Tue, Jan 8, 2019 at 4:32 PM Reuven Lax wrote: >> > >> > I agree with this,

Re: Enforce javadoc comments in public methods?

2019-01-09 Thread Robert Bradshaw
> >>> > >>> > Another perspective is that someone is getting away with missing >>> > documentation at N-1. Seems OK. But maybe just >>> > allowMissingPropertyJavadoc (from >>> > http://checkstyle.sourceforge.net/conf

Re: [Go SDK] User Defined Coders

2019-01-08 Thread Robert Bradshaw
emas (though that's where it becomes even more useful). > Also while columnar can be a large perf win, I suspect that we currently have > lower-hanging fruit to optimize when it comes to performance. It's probably a bigger win for Python than for Java. > > Reuven > > On Tue,

Re: [Go SDK] User Defined Coders

2019-01-08 Thread Robert Bradshaw
On Fri, Jan 4, 2019 at 12:54 AM Reuven Lax wrote: > > I looked at Apache Arrow as a potential serialization format for Row coders. > At the time it didn't seem a perfect fit - Beam's programming model is > record-at-a-time, and Arrow is optimized for large batches of records (while > Beam has

Re: [Go SDK] User Defined Coders

2019-01-08 Thread Robert Bradshaw
On Fri, Jan 4, 2019 at 7:05 PM Kenneth Knowles wrote: > > On Thu, Jan 3, 2019 at 4:33 PM Reuven Lax wrote: >> >> If a user wants custom encoding for a primitive type, they can create a >> byte-array field and wrap that field with a Coder I don't think the primary use of coders is a custom

Re: Enforce javadoc comments in public methods?

2019-01-08 Thread Robert Bradshaw
What I meant is to enforce only for a method that is BOTH 1) public method >>> 2) has longer than N lines. >>> >>> sorry for not making the proposal clear enough in the original message, it >>> should've been better titled "enforce ... on non-trivial public

Re: Query expressions for schema fields

2019-01-07 Thread Robert Bradshaw
On Sun, Jan 6, 2019 at 12:46 PM Reuven Lax wrote: > > Some time ago, @Jean-Baptiste Onofré made the excellent suggestion that we > look into using JsonPath as a selector format for schema fields. This > provides a simple and natural way for users to select nested schema fields, > as well as

Re: Enforce javadoc comments in public methods?

2019-01-07 Thread Robert Bradshaw
IMHO, requiring comments on trivial methods like setters and getters is often a net negative, but setting some standard could be useful. On Mon, Jan 7, 2019 at 7:35 AM Jean-Baptiste Onofré wrote: > > Hi, > > for the presence of a comment on public method, it's a good idea. Now, > about the

Re: Adding ":beam-runners-direct-java:needsRunnerTests" to "Run Java PreCommit"

2018-12-28 Thread Robert Bradshaw
I don't know much about these specific tests, but could we simplify things and get rid of the whole "Needs Runner" designation by just making the direct runner a dependency for the tests of each module (including core)? On Fri, Dec 28, 2018 at 6:20 PM Gleb Kanterov wrote: > > I looked into

Re: [VOTE] Release Vendored gRPC 1.13.1 v0.2, release candidate #1

2018-12-21 Thread Robert Bradshaw
+1, let's get this out. We can decide about 2.9.1 later. On Fri, Dec 21, 2018 at 10:43 AM Maximilian Michels wrote: > > +1 > > On 20.12.18 23:11, Tyler Akidau wrote: > > +1, Approve the release. > > > > -Tyler > > > > On Thu, Dec 20, 2018 at 9:49 AM Ahmet Altay > >

Re: excessive java precommit logging

2018-12-20 Thread Robert Bradshaw
Interestingly, I was thinking exactly the same thing the other day. If we could drop the info logs for passing tests, that would be ideal. Regardless, tests should fail (when possible) with actionable messages. I think the rare case of not being able to reproduce the error locally if info logs

Re: Report to the Board, December 2018

2018-12-12 Thread Robert Bradshaw
Mostly looks good to me. I would probably omit the {issues,builds}@beam.apache.org stats as "nothing significant in the figures" but note that the dev list excludes the previous automated emails (or at least a reference to where you say this above). Is it worth noting StackOverflow activity here

Re: beam9 failing most of the python tests

2018-12-10 Thread Robert Bradshaw
The same error is impacting our postcommit tests. Who has permissions to reboot these machines? On Sat, Dec 8, 2018 at 3:13 AM Ankur Goenka wrote: > Virtual env setup is failing because of the following error. Can we reboot > the machine to see if it fixes the issue? > >

Re: Can we allow SimpleFunction and SerializableFunction to throw Exception?

2018-12-07 Thread Robert Bradshaw
How should we move forward on this? The idea looks good, and even comes with a PR to review. Any objections to the names? On Wed, Dec 5, 2018 at 12:48 PM Jeff Klukas wrote: > > Reminder that I'm looking for review on > https://github.com/apache/beam/pull/7160 > > On Thu, Nov 29, 2018, 11:48 AM

Re: [DISCUSS] Structuring Java based DSLs

2018-12-03 Thread Robert Bradshaw
dependency should of course remain sdks-java-sql in all cases. > >Jan > > On 12/1/18 12:54 AM, Robert Bradshaw wrote: > > I suppose what I'm trying to say is that I see this module structure > > as a tool for discoverability and enumerating end-user endpoints. In >

Re: [DISCUSS] SplittableDoFn Java SDK User Facing API

2018-11-30 Thread Robert Bradshaw
On Fri, Nov 30, 2018 at 10:14 PM Lukasz Cwik wrote: > > On Fri, Nov 30, 2018 at 1:02 PM Robert Bradshaw wrote: >> >> On Fri, Nov 30, 2018 at 6:38 PM Lukasz Cwik wrote: >> > >> > Sorry, for some reason I thought I had answered these. >> >> No pro

Re: [SDF] Why do we need markDone (or an equivalent claim)?

2018-11-30 Thread Robert Bradshaw
On Fri, Nov 30, 2018 at 10:28 PM Lukasz Cwik wrote: > > On Fri, Nov 30, 2018 at 12:47 PM Robert Bradshaw wrote: >> >> On Fri, Nov 30, 2018 at 7:10 PM Lukasz Cwik wrote: >> > >> > Uh, I'm not sure what you're asking. >> >> I'm asking why we wanted

Re: [DISCUSS] Structuring Java based DSLs

2018-11-30 Thread Robert Bradshaw
those optimizations are happening or > will happen, we might start to have a sense of it. > > -Rui > > On Fri, Nov 30, 2018 at 12:38 PM Robert Bradshaw wrote: > > I don't really see Euphoria as a subset of SQL or the other way > around, and I think it makes sense to use either w

Re: [DISCUSS] SplittableDoFn Java SDK User Facing API

2018-11-30 Thread Robert Bradshaw
On Fri, Nov 30, 2018 at 6:38 PM Lukasz Cwik wrote: > > Sorry, for some reason I thought I had answered these. No problem, thanks for your patience :). > On Fri, Nov 30, 2018 at 2:20 AM Robert Bradshaw wrote: >> >> I still have outstanding questions (above) about >>

Re: [SDF] Why do we need markDone (or an equivalent claim)?

2018-11-30 Thread Robert Bradshaw
and would be similar to passing in Long.MAX_VALUE for the file > offset range. Having to choose a value pass depending on the restriction tracker type is something that could simply be eliminated. > On Fri, Nov 30, 2018 at 2:45 AM Robert Bradshaw wrote: >> >> In

Re: [DISCUSS] Structuring Java based DSLs

2018-11-30 Thread Robert Bradshaw
I don't really see Euphoria as a subset of SQL or the other way around, and I think it makes sense to use either without the other, so by this criterion keeping them as siblings makes more sense than nesting. That said, I think it's really good to have a bunch of shared code, e.g. a join library that could be

[SDF] Why do we need markDone (or an equivalent claim)?

2018-11-30 Thread Robert Bradshaw
In looking at the SDF examples, it seems error-prone to have to remember to write tryClaim([fake-end-position]) to indicate that a restriction is finished. IIRC, this was done to decide whether the entire restriction had been processed on return in the case that tryClaim never returned
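The pattern being questioned can be sketched with a toy tracker. Everything here is hypothetical (`ToyOffsetTracker`, `try_claim`, `is_done`, `process`) and only mimics the shape of the idea, not the Beam SDK's restriction tracker API: the tracker learns that a restriction is finished only when a claim at or past the end is attempted, so a reader that simply runs out of records has to make one extra claim.

```python
class ToyOffsetTracker:
    """Hypothetical tracker over the half-open offset range [start, stop)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.last_attempted = None

    def try_claim(self, position):
        """Record the attempt; succeed only for positions inside the range."""
        self.last_attempted = position
        return position < self.stop

    def is_done(self):
        """Done only once a claim at or beyond the end has been attempted."""
        return self.last_attempted is not None and self.last_attempted >= self.stop


def process(record_offsets, tracker, claim_end):
    """Emit every record whose offset can be claimed.

    If the input runs out before the range does, the tracker cannot tell
    that the work is complete unless the caller claims a fake end position.
    """
    emitted = []
    for offset in record_offsets:
        if not tracker.try_claim(offset):
            return emitted  # a failed claim already marks the range as done
        emitted.append(offset)
    if claim_end:
        tracker.try_claim(tracker.stop)  # the easy-to-forget extra claim
    return emitted
```

With records at offsets 0, 40, and 90 in the range [0, 100), the loop ends after claiming 90 and `is_done()` stays false unless `claim_end` is set, which is the error-prone step the message describes.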

Re: [DISCUSS] SplittableDoFn Java SDK User Facing API

2018-11-30 Thread Robert Bradshaw
essages and then continue to read anything >>> > else that has been enqueued. >>> > >>> > Bundle finalization is unrelated to backlogs but is needed since there is >>> > a class of data stores which need acknowledgement that says I have >>> > successfully received

Re: Handling large values

2018-11-29 Thread Robert Bradshaw
On Thu, Nov 29, 2018 at 7:08 PM Lukasz Cwik wrote: > > On Thu, Nov 29, 2018 at 7:13 AM Robert Bradshaw wrote: >> >> On Thu, Nov 29, 2018 at 2:18 AM Lukasz Cwik wrote: >> > >> > I don't believe we would need to change any other coders since >> > Seek

Re: Handling large values

2018-11-29 Thread Robert Bradshaw
gth prefix coder is never used with IOs > and hence those IOs could be given a type like Iterable which is lazy, > but the encoding for that wouldn't be lazy Yes, that's how it is now. > and would output all the data from the SeekableInputStream. > > > On Wed, Nov 28, 2018 at 3

Re: Handling large values

2018-11-28 Thread Robert Bradshaw
On Wed, Nov 28, 2018 at 11:57 PM Lukasz Cwik wrote: > > Re-adding +datapls-portability-t...@google.com > +datapls-unified-wor...@google.com > > On Wed, Nov 28, 2018 at 2:23 PM Robert Bradshaw wrote: >> >> Thanks for bringing this to the list. More below. >> &g

Re: Handling large values

2018-11-28 Thread Robert Bradshaw
Thanks for bringing this to the list. More below. On Wed, Nov 28, 2018 at 11:10 PM Kenneth Knowles wrote: > FWIW I deliberately limited the thread to not mix public and private > lists, so people intending private replies do not accidentally send to > dev@beam. > > I've left them on this time,

Re: Evolving a Coder for an added field

2018-11-26 Thread Robert Bradshaw
nding of the Coder machinery to be able >>> to design a solution, so I'd need to hand this off or simply leave it in >>> the Jira backlog. >>> >>> [0] https://github.com/apache/beam/pull/6914 >>> >>> >>> On Tue, Nov 6, 2018 at 4:38 AM Rob

Re: Reading CSV from google cloud storage to Data Flow

2018-11-26 Thread Robert Bradshaw
The same holds true in Python: Read the files with TextIO and follow with a Map operation that splits the lines into records. This, of course, only works if you don't have newlines within your records. In that case, you may need to use a DoFn that takes as input each filename and reads the
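The approach described in this message (read lines with TextIO, then split each line into a record with a Map) could look roughly like the sketch below. It is only an illustration, not code from the thread: the helper name `parse_csv_line`, the `run` wrapper, and the choice to skip one header line are assumptions, and the input pattern is whatever path the caller supplies. As the message notes, this only works when no record contains an embedded newline.

```python
import csv


def parse_csv_line(line):
    """Split one line of CSV text into a list of fields.

    Uses the standard csv module so that quoted fields containing commas
    are handled. Hypothetical helper, not part of Beam.
    """
    return next(csv.reader([line]))


def run(input_pattern):
    """Read text files matching input_pattern and parse each line as CSV.

    input_pattern is supplied by the caller (for example a file glob).
    Beam is imported here so parse_csv_line is usable without it.
    """
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        _ = (
            pipeline
            # Emits one element per line; assumes a single header line to skip.
            | "ReadLines" >> beam.io.ReadFromText(input_pattern, skip_header_lines=1)
            # Turn each line into a list of field strings.
            | "ParseCsv" >> beam.Map(parse_csv_line)
            | "Print" >> beam.Map(print)
        )
```

Because ReadFromText splits on newlines before the Map runs, a quoted field spanning several lines would arrive as separate, broken elements, which is why the message suggests a per-file DoFn for that case.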

Re: [DISCUSS] Reverting commits on green post-commit status

2018-11-20 Thread Robert Bradshaw
to figure out >> whether the problem can be solved upstream or downstream, or with a >> combination of both. >> >> I think Thomas wanted to address this latter case. It seems like we're >> all more or less on the same page. The core problem is more related to >> co

Re: [DISCUSS] SplittableDoFn Java SDK User Facing API

2018-11-20 Thread Robert Bradshaw
mon and seems to be supported by Java (BigDecimal), > Python (decimal module) and Go (via shopspring/decimal). B is a close > second since many languages can convert it. > Any reason to not just use double? (Do we need arbitrary/fixed precision for anything?) > On Tue, Nov 20, 2018 at

Re: E-mail Organization

2018-11-20 Thread Robert Bradshaw
I was about to suggest tags in subject lines as well. Easier to see in email listings than anything in the body. On Mon, Nov 19, 2018 at 7:22 PM Lukasz Cwik wrote: > Putting the tags in the subject line is in line with the style of what we > currently do using [DISCUSS], [VOTE], [BEAM-YYY] so I

Re: [DISCUSS] SplittableDoFn Java SDK User Facing API

2018-11-20 Thread Robert Bradshaw
ly we could have an unbounded >>> PCollection go to a BoundedPerElement DoFn and that will produce an >>> unbounded PCollection. Restrictions.IsBounded is used during pipeline >>> execution to inform the runner whether a restriction being returned is >>> bounde
