Hi Peter,
Looking forward to your PR. Please note that source classes are relatively
tricky to develop, so would you mind briefly explaining what your source
will do here over email, so that we hash out some possible issues early
rather than in PR comments?
Also note that now recommend to package
uot; (as did in the
> >PubSubIO). I'm now refactoring it to use the "new style".
> >
> >Regards
> >JB
> >
> >On 03/14/2016 06:47 PM, Eugene Kirpichov wrote:
> >> Hi Peter,
> >> Looking forward to your PR. Please note that source classes
Hi Sumit,
All reusable parts of a pipeline, including connectors to storage systems,
should be packaged as PTransform's.
Sink is an advanced API that you can use under the hood to implement the
transform, if this particular connector benefits from this API - but you
don't have to, and many
/_/google.com/splittabledofn -
I confirmed that it can be joined without being logged into a Google
account)
Who'd be interested in attending, and does this time/date work for people?
On Fri, Aug 5, 2016 at 10:48 AM Eugene Kirpichov <kirpic...@google.com>
wrote:
> Hi JB, thanks fo
r my understanding.
>
> Thanks
> Regards
> JB
>
>
>
> On Aug 11, 2016, 19:46, at 19:46, Eugene Kirpichov
> <kirpic...@google.com.INVALID> wrote:
> >Hi JB,
> >
> >What are your thoughts on this?
> >
> >I'm also thinking of having a virtual m
, Aparup Banerjee (apbanerj) <
> apban...@cisco.com> wrote:
>
> > + 1, me2
> >
> >
> >
> >
> > On 8/12/16, 9:27 AM, "Amit Sela" <amitsel...@gmail.com> wrote:
> >
> > >+1 as in I'll join ;-)
> > >
> > >On
Hello Beam community,
We (myself, Daniel Mills and Robert Bradshaw) would like to propose
"Splittable DoFn" - a major generalization of DoFn, which allows processing
of a single element to be non-monolithic, i.e. checkpointable and
parallelizable, as well as doing an unbounded amount of work per
t; apban...@cisco.com
> > >
> > wrote:
> >
> > > + 1, me2
> > >
> > >
> > >
> > >
> > > On 8/12/16, 9:27 AM, "Amit Sela" <amitsel...@gmail.com <javascript:;>>
> > > wrote:
> > >
> > > >+1 as in I'll
Onofré <j...@nanthrax.net>
wrote:
> Anyway, from a runner perspective, we will have kind of API (part of the
> Runner API) to "orchestrate" the SDF as we discussed during the call,
> right ?
>
> Regards
> JB
>
> On 08/21/2016 07:24 PM, Eugene Kirpichov wrote:
/io/kafka/KafkaIO.java#L636
>
> On Wed, Sep 7, 2016 at 1:24 AM Eugene Kirpichov
> <kirpic...@google.com.invalid> wrote:
>
> > Hi Amit,
> > Could you explain more about why you're saying the order of splits
> matters?
> > AFAIK the semantics of Read.Unbounded is &q
Hi Amit,
Could you explain more about why you're saying the order of splits matters?
AFAIK the semantics of Read.Unbounded is "read from all of the splits in
parallel, checkpointing each of them independently", so their order
shouldn't matter.
On Tue, Sep 6, 2016 at 3:17 PM Amit Sela
Hi Daniel,
Thanks for raising this. I think I was a major contributor to your
frustration with the process by suggesting big changes to your IO PR.
As others have said, ideally that process should have gone differently: 1)
we should have had documentation on the best practices in developing IOs,
schema-using-reflection
. But this should be probably provided as a collection of utilities, rather
than as a core feature of the programming model.
>
> Regards
> JB
>
> On 09/20/2016 03:59 AM, Eugene Kirpichov wrote:
> > Hi!
> >
> > Thank you for raising these issues.
Two cents: While we're at it, we could consider enforcing formatting as
well (https://github.com/google/google-java-format). That's a bigger change
though, and I don't think it has checkstyle integration or anything like
that.
On Tue, Aug 23, 2016 at 4:54 PM Dan Halperin
eDoFns be asked for a watermark, does this mean that watermarks
> can then be "generated" at any operation?
>
> Cheers,
> Aljoscha
>
> On Sun, 21 Aug 2016 at 22:34 Eugene Kirpichov <kirpic...@google.com.invalid
> >
> wrote:
>
> > Hi JB,
> >
>
o letting the user specify a
> >
> > number here; the data is spread out among as many machines as are
> >
> > handling the shuffling (for N elements, there are ~N unique keys,
> >
> > which gets partitioned by the system to the M workers).
> >
> >
>
ks,
> Cham
>
> On Thu, Oct 27, 2016 at 1:45 PM Chamikara Jayalath <chamik...@apache.org>
> wrote:
>
> > On Thu, Oct 27, 2016 at 1:27 PM Eugene Kirpichov
> > <kirpic...@google.com.invalid> wrote:
> >
> > Getting back to this. I noticed that the origi
Hello,
This is a continuation of the discussion on PR
https://github.com/apache/incubator-beam/pull/1050 which turned out more
complex than expected.
Short summary:
Currently FileBasedSink, when writing to /path/to/foo (in practice,
/path/to/foo-x-of-y where y is the total number of
$0.02: Deduplicate? (lends to extensions like Deduplicate.by(some key
extractor function))
On Mon, Oct 24, 2016 at 1:22 PM Dan Halperin
wrote:
> I find "MakeDistinct" more confusing. My votes in decreasing preference:
>
> 1. Keep `RemoveDuplicates` name, ensure that
>
> > Siblings fundamentally will not work. Consider the following
> > perfectly-valid output path: s3://bucket/file-SSS-NNN.txt . A sibling
> would
> > be a new bucket, so not guaranteed to exist.
> >
> > On Thu, Oct 20, 2016 at 1:57 AM, Chamik
Hi Alexey,
In general, things like establishing connections and initializing caches
are better done in @Setup and @TearDown methods, rather than @StartBundle
and @FinishBundle, because DoFn's can be reused between bundles and this
way you get more benefit from reuse.
Bundles can be pretty small,
t; methods pointless, too :-)
>
> One of the better examples of the pattern of "ready-to-go" builders -
> though not a transform - is WindowingStrategy (props to Ben), where there
> are intelligent defaults and you can override them, and it tracks whether
> or not eac
robably wouldn't provide #1). It is
also potentially non-optimal: there can exist better implementations of
each of these 3 semantics not involving a global group-by-key.
It's not clear yet what to do about all this. Thoughts welcome.
On Tue, Oct 11, 2016 at 11:35 AM Kenneth Knowles <k...@google.com
t;
> On Mon, 29 Aug 2016 at 18:00 Eugene Kirpichov <kirpic...@google.com.invalid
> >
> wrote:
>
> > Hi Aljoscha,
> >
> > The watermark reporting is done via
> > ProcessContinuation.futureOutputWatermark, at the granularity of
> returning
> > f
Hi Amit, I'll comment in more detail later, but meanwhile please take a
look at https://github.com/apache/incubator-beam/pull/1565
There is a small amount of relevant changes to spark runner.
Take a look at implementation of SplittableParDo (already committed) in
particular ProcessFn and it's
ParDo*, *Window.Bound*, and
> > *GroupAlsoByWindow* requires a custom implementation by per runner as
> they
> > are not handled by DoFn anymore, right ?
> >
> > On Sun, Dec 11, 2016 at 3:42 PM Eugene Kirpichov
> > <kirpic...@google.com.invalid> wrote:
&g
There is one more data-loss type error, a fix for which should go into the
release.
https://github.com/apache/incubator-beam/pull/1620
On Thu, Dec 15, 2016 at 10:42 AM Davor Bonaci wrote:
> I think we should build another RC.
>
> Two issues:
> * Metrics issue that JB pointed
lection of one or more commands, yielding
> > their outputs). Especially for the case of a Read, which in this case
> > is not splittable (initially or dynamically) and always produces a
> > single element--feels much more like a Map to me.
> >
> > On Tue, Dec 6, 201
Hi JB,
Depending on the scope of what you want to ultimately accomplish with this
extension, I think it may make sense to write a proposal document and
discuss it.
If it's just a collection of utility DoFn's for various well-defined
source/target format pairs, then that's probably not needed, but
Hi JB,
Thanks for bringing this to the mailing list. I also think that this is
useful in general (and that use cases for Beam are more than just classic
bigdata), and that there are interesting questions here at different levels
about how to do it right.
I suggest to start with the highest-level
Thanks Kenn! This is very helpful.
On Wed, Nov 30, 2016 at 4:13 PM Kenneth Knowles <k...@google.com.invalid>
wrote:
> On Wed, Nov 30, 2016 at 3:52 PM, Eugene Kirpichov <
> kirpic...@google.com.invalid> wrote:
>
> > Hello,
> >
> > Do we have anywhere a se
ies matching
> those requirements.
> Then, the end user is not more responsible: the runner will try to
> define if the pipeline can be executed, and where a DoFn has to be run
> (on which worker).
>
> For me, it's two different levels where 2 is smarter but 1 can also make
> s
My question is where do we put such ShellCommands extension ? As a
> module under IO ? As a new extensions module ?
>
> Regards
> JB
>
> On 12/07/2016 06:24 PM, Eugene Kirpichov wrote:
> > Branched off into a separate thread.
> >
> > How about ShellCommands.execute().wi
33 matches
Mail list logo