Classes named "Bound" are used throughout the sdk to describe builders that
are specified enough to be applied. It indicates that the required
parameters have been bound. It is not related to whether the output
PCollection is bounded or unbounded.
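To make the naming concrete, here is a minimal sketch of the pattern, not code from the SDK. ReadSketch, Unbound, Bound, named, from and describe are all hypothetical names chosen for illustration. The point is only that a builder becomes "Bound" once its required parameter has been supplied.

```java
// Hypothetical sketch: ReadSketch, Unbound and Bound are illustrative names,
// not classes from the SDK.
final class ReadSketch {
    private ReadSketch() {}

    // Entry point: no required parameter has been supplied yet.
    static Unbound named(String name) {
        return new Unbound(name);
    }

    // Not yet applicable: the required "source" parameter is still missing.
    static final class Unbound {
        private final String name;

        Unbound(String name) {
            this.name = name;
        }

        // Supplying the required parameter yields a Bound builder.
        Bound from(String source) {
            return new Bound(name, source);
        }
    }

    // All required parameters are bound, so this builder could be applied.
    // The name says nothing about whether the data is bounded or unbounded.
    static final class Bound {
        private final String name;
        private final String source;

        Bound(String name, String source) {
            this.name = name;
            this.source = source;
        }

        String describe() {
            return name + " reading from " + source;
        }
    }

    public static void main(String[] args) {
        ReadSketch.Bound read = ReadSketch.named("ReadLines").from("input.txt");
        System.out.println(read.describe());
    }
}
```

In this sketch only Bound exposes describe(), so code that needs a fully specified builder can ask for the Bound type and let the compiler reject anything less.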
On Thu, Feb 18, 2016, 7:42 AM bakey pan wrote:
>
Hi!
The following document describes work that we're planning on doing to allow
every step in a pipeline to include more information about what is
actually going on. The goal is to allow UIs and diagnostic tools to display
details about what is happening inside each step by including details what
ent greatly describes the data display. Do you have any plan to
> implement a kind of "checkpoint"/alerting depending on some predicates on
> the data (it's something that I had in mind for the Beam data
> integration DSL) ? It's maybe the TRIGGER type ?
>
> Thanks again
1. Regarding "java" as a module -- are we sure that other languages will be
packaged using Maven as well? For instance, Python has its own ecosystem
which likely doesn't play well with Maven.
2. Using the literal "SNAPSHOT" as the qualifier has special meaning in Maven
-- it is newer than all other
ating for a release (it's what I did in
> the PR). Like this, the Maven standards are still valid.
>
> Regards
> JB
>
> On 03/21/2016 06:20 PM, Ben Chambers wrote:
> > 1. Regarding "java" as a module -- are we sure that other languages will be
> >
My concern with that is we aren't making clear what constitutes "whenever
possible". Could we more concretely define that (eg., "for example, when
Github is down")? Were there specific cases that you had in mind?
Otherwise, I worry about the ambiguity introduced and the possibility for
different pe
My only concern is that in the example, you first need to declare all the
inputs, then the pipeline to be tested, then all the outputs. This can lead
to tests that are hard to follow, since what you're really testing is an
interleaving more like "When these inputs arrive, I get this output. Then
wh
On Mon, Mar 28, 2016 at 4:29 PM Robert Bradshaw
wrote:
> On Fri, Mar 25, 2016 at 4:28 PM, Ben Chambers
> wrote:
> > My only concern is that in the example, you first need to declare all the
> > inputs, then the pipeline to be tested, then all the outputs. This can lead
Nice post Robert! Thanks for writing up the rationale.
On Fri, May 27, 2016 at 12:38 PM Robert Bradshaw
wrote:
> Hi all!
>
> One of the questions that often gets asked is why Beam has PTransforms
> for everything instead of having methods on PCollection. This morning
> I published a blog post ex
I think there is a difference:
- If failure occurs after finishBundle() but before the consumption is
committed, then the bundle may be reprocessed, which leads to duplicated
calls to processElement() and finishBundle().
- If failure occurs after consumption is committed but before
finishBundle(),
On Wed, Jun 8, 2016 at 10:29 AM Raghu Angadi
wrote:
> On Wed, Jun 8, 2016 at 10:13 AM, Ben Chambers
> wrote:
>
> > - If failure occurs after finishBundle() but before the consumption is
> > committed, then the bundle may be reprocessed, which leads to duplicated
>
+1 (binding)
Excited to see the first release underway!
On Thu, Jun 9, 2016 at 8:48 AM Kenneth Knowles
wrote:
> +1 (binding)
>
> Confirmed that we can run pipelines on Dataflow.
>
> Looks good. Very exciting!
>
>
> On Thu, Jun 9, 2016 at 8:16 AM, Jean-Baptiste Onofré
> wrote:
>
> > Team work !
Based on a recent PR (https://github.com/apache/incubator-beam/pull/468) I
was reminded of the confusion around the use of
.apply(transform.named(someName)) and .apply(someName, transform). This is
one of the things I’ve wanted to clean up for a while. I’d like to propose a
path towards removing this re
ore sense to name the application of a transform
> > rather than the transform itself. (Still mulling on how best to do
> > this with Python...)
> >
> > On Wed, Jun 22, 2016 at 9:27 PM, Jean-Baptiste Onofré
> > wrote:
> > > +1
> > >
> > >
(Minor Issue: I'd propose waitUntilDone and waitUntilRunning rather than
waitToRunning which reads oddly)
The only reason to separate submission from waitUntilRunning would be if
you wanted to kick off several pipelines in quick succession, then wait for
them all to be running. For instance:
Pipe
, which is the same amount of work to implement. Am I correct
> > that everyone in this thread thinks this generality is just not the right
> > thing for a user API?
> >
> > - This enum could probably use revision. I'd choose some combination of
> >
Would we use the same Jira to track the series of PRs implementing the
proposal (if accepted) or would it be discussion only (possibly linked to
the implementation tasks)?
On Sun, Aug 7, 2016, 9:48 PM Frances Perry wrote:
> I'm a huge fan of keeping all the details related to a topic in a releva
Hi Thomas!
On Tue, Aug 16, 2016 at 9:40 PM Thomas Weise wrote:
> I'm trying to rebase a PR and adjust for the DoFn changes.
>
Can you elaborate on what you're trying to do (or send a link to the PR)?
> CounterSet is gone and there is now AggregatorFactory and I'm looking to
> fix an existing
aseline and proper support for aggregators isn't part of it, it's
> something I was planning to take up in subsequent round.
>
> Thomas
>
>
> On Wed, Aug 17, 2016 at 8:14 AM, Ben Chambers
> wrote:
>
> > Hi Thomas!
> >
> > On Tue, Aug 1
When Beam was contributed it inherited an import order [1] that was pretty
arbitrary. We've added org.apache.beam [2], but continue to use this
ordering.
Both Eclipse and IntelliJ default to grouping imports into alphabetic
order. I think it would simplify development if we switched our checkstyle
sse Anderson
> > wrote:
> >
> > > Please. That's the one that always trips me up.
> > >
> > > On Tue, Aug 23, 2016, 4:10 PM Ben Chambers wrote:
> > >
> > > > When Beam was contributed it inherited an import order [1] that was
>
I think this is using http://www.mathwords.com/i/interval_notation.htm to
indicate that the interval includes the start time but not the end time.
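As a small illustration of that notation, the sketch below models a half-open interval whose start is inclusive and whose end is exclusive. HalfOpenInterval is a hypothetical class written for this example, not the SDK's IntervalWindow, and the exact toString() format is an assumption.

```java
// Hypothetical sketch of a half-open interval [start, end); this is not the
// SDK's IntervalWindow, and the printed format is an assumption.
final class HalfOpenInterval {
    private final long start; // inclusive
    private final long end;   // exclusive

    HalfOpenInterval(long start, long end) {
        if (end < start) {
            throw new IllegalArgumentException("end must not be before start");
        }
        this.start = start;
        this.end = end;
    }

    // A timestamp belongs to the interval if start <= t < end.
    boolean contains(long t) {
        return start <= t && t < end;
    }

    // Square bracket for the included start, parenthesis for the excluded end.
    @Override
    public String toString() {
        return "[" + start + ".." + end + ")";
    }

    public static void main(String[] args) {
        HalfOpenInterval window = new HalfOpenInterval(0, 10);
        System.out.println(window + " contains 0: " + window.contains(0));
        System.out.println(window + " contains 10: " + window.contains(10));
    }
}
```

One consequence of the half-open convention is that adjacent intervals such as [0, 10) and [10, 20) never both contain the same timestamp.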
On Mon, Sep 19, 2016, 8:56 AM Jesse Anderson wrote:
> The toString() of IntervalWindow starts with a square bracket and ends with
> a parenthesis. Is
I started looking at BEAM-147: “Rename Aggregator to [P]Metric”. Rather
than renaming the existing concept I’d like to introduce Metrics as a
simpler mechanism to provide information during pipeline execution (I have
updated the issue accordingly).
Here is what I'm thinking would lead to a simpler
make.
As always, let me know if there are any questions or comments!
-- Ben
On Wed, Sep 28, 2016 at 5:05 PM Ben Chambers wrote:
I started looking at BEAM-147: “Rename Aggregator to [P]Metric”. Rather
than renaming the existing concept I’d like to introduce Metrics as a
simpler mechanism
As Kenn points out, I think the nature of the Redistribute operation is to
act as a hint (or requirement) to the runner that a certain distribution
of the elements is desirable. In a perfect world this wouldn't be necessary because
every runner would be able to do exactly the right thing. Looking at the
dif
e. It will also
allow us to revisit them from a clean slate.
Thoughts?
On Thu, Oct 6, 2016 at 5:41 AM Aljoscha Krettek wrote:
Hi,
I'm currently on holiday but I'll put some thought into this and give my
comments once I get back.
Aljoscha
On Wed, Oct 5, 2016, 21:51 Ben Chambers
This is also a good reason to avoid overly general names like "from",
"create" and "of". Instead, the option should be ".fromQuery(String
query)", so we can add ".fromTable(...)".
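A minimal sketch of the naming idea follows. QuerySourceSketch and its describe() method are hypothetical and only illustrate how specific names leave room for a sibling option later; they are not taken from any existing IO.

```java
// Hypothetical sketch: QuerySourceSketch is an illustrative class, not an
// existing connector. It shows why specific factory names scale better.
final class QuerySourceSketch {
    private final String kind;
    private final String value;

    private QuerySourceSketch(String kind, String value) {
        this.kind = kind;
        this.value = value;
    }

    // The name states what the argument is, unlike a generic from(String).
    static QuerySourceSketch fromQuery(String query) {
        return new QuerySourceSketch("query", query);
    }

    // A second String-based option can be added later without ambiguity.
    static QuerySourceSketch fromTable(String table) {
        return new QuerySourceSketch("table", table);
    }

    String describe() {
        return kind + ": " + value;
    }

    public static void main(String[] args) {
        System.out.println(QuerySourceSketch.fromQuery("SELECT 1").describe());
        System.out.println(QuerySourceSketch.fromTable("events").describe());
    }
}
```

Had the first method been called from(String), adding a table option with the same parameter type would have required either an overload that cannot exist or a breaking rename.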
On Thu, Oct 13, 2016 at 4:55 PM Dan Halperin
wrote:
> For #3 -- I think we should be VERY careful there. You need to
> >
> > On Wed, Oct 12, 2016 at 10:48 AM Kenneth Knowles wrote:
> >
> >> Huzzah! This is IMO a really great change. I agree that we can get
> >> something in to allow work to continue, and improve the API as we learn.
> >>
> >> On Wed, O
Congrats. +3!
On Fri, Oct 21, 2016 at 3:34 PM Kenneth Knowles
wrote:
> Huzzah!
>
> I've personally enjoyed working together, and I am glad to extend this
> acknowledgement and welcome this addition to the Beam community.
>
> Kenn
>
> On Fri, Oct 21, 2016 at 3:18 PM Davor Bonaci wrote:
>
> > Hi
I also like Distinct since it doesn't make it sound like it modifies any
underlying collection. RemoveDuplicates makes it sound like the duplicates
are removed, rather than a new PCollection without duplicates being
returned.
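The same distinction can be seen with ordinary Java collections. The sketch below uses java.util.stream rather than a PCollection, and DistinctSketch is a hypothetical helper: the input list is left untouched and a new list without duplicates is returned.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch using plain Java lists, not PCollection: "distinct"
// produces a new collection and leaves its input unchanged.
final class DistinctSketch {
    private DistinctSketch() {}

    // Returns a new list holding the first occurrence of each element.
    static List<String> distinct(List<String> input) {
        return input.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "a");
        List<String> output = distinct(input);
        System.out.println("input:  " + input);
        System.out.println("output: " + output);
    }
}
```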
On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré wrote:
> Agree. It
One option would be to use the reflective DoFn approach to this. Imagine
something like:
public class MyExternalFn extends DoFn {
  @ProcessElement
  // Existence of ShellExecutor indicates the code shells out.
  public void processElement(ProcessContext c, ShellExecutor shell) {
    ...
Futur
y functions (or perhaps regular language-specific facilities like
> java's ProcessBuilder).
>
> On Mon, Dec 5, 2016 at 1:21 PM Ben Chambers wrote:
>
> One option would be to use the reflective DoFn approach to this. Imagine
> something like:
>
> public class MyExternalFn exte
+1 -- This seems like the best option. It's a mechanical change, and the
compiler will let users know it needs to be made. It will make the mistake
much less common, and when it occurs it will be much clearer what is wrong.
It would be great if we could make the mis-use a compiler problem or a
pip
> through in the same release.
>
> Dan
>
> On Thu, Dec 8, 2016 at 7:48 AM, Aljoscha Krettek
> wrote:
>
> > +1
> >
> > I've seen this mistake myself in some PRs.
> >
> > On Thu, 8 Dec 2016 at 06:10 Ben Chambers
> > wrote:
> >
> >
I think I agree with Robert (unless I'm misunderstanding his point).
I think that the shell commands are going to be the most useful if it is
possible to take the elements in an input PCollection, construct a shell
command depending on those elements, and then execute it. I think doing so
in a ful