+1 (binding)
- all files have incubating
- signatures check out (and KEYS there)
- disclaimer exists
- LICENSE and NOTICE good
- No unexpected binary in source
- All ASF licensed files have ASF headers
- source distribution available and content is good
Improvements for next release:
- As the
Hi everyone,
Here's the first vote for the first release of Apache Beam -- version
0.1.0-incubating!
As a reminder, we aren't looking for any specific new functionality, but
would like to release the existing code, get something to our users' hands,
and test the processes. Previous discussions
The third release candidate is now available for everyone's review [1],
which should be incorporating all feedback so far.
Please comment if there's additional feedback, as we are about to start the
voting process.
[1] https://repository.apache.org/content/repositories/orgapachebeam-1002
On
Thanks for the clarification JB. In the projects I’ve been involved with, I’ve
not seen that practice.
As long as the resulting release ends up on dist.a.o I don’t think it’s a
problem.
-Taylor
> On Jun 8, 2016, at 12:49 AM, Jean-Baptiste Onofré wrote:
>
> Hi Taylor,
>
To Davor, JB and anyone else helping with the release, Thanks! this looks
great.
On Wed, Jun 8, 2016 at 9:11 PM Amit Sela wrote:
> Regarding Dan's questions:
> 1. I'm not sure - it is built with spark-*_2.10 but I honestly don't know
> if this matters for the runner
Regarding Dan's questions:
1. I'm not sure - it is built with spark-*_2.10 but I honestly don't know
if this matters for the runner itself, it could be nice to have in order to
be more informative. In addition, this will change with Spark 2.0 to Scala
2.11 AFAIK.
2. This is to allow running
On Wed, Jun 8, 2016 at 10:29 AM Raghu Angadi
wrote:
> On Wed, Jun 8, 2016 at 10:13 AM, Ben Chambers >
> wrote:
>
> > - If failure occurs after finishBundle() but before the consumption is
> > committed, then the bundle may be
On Wed, Jun 8, 2016 at 10:15 AM, Dan Halperin
wrote:
> > I thought finishBundle()
> > exists simply as best-effort indication from the runner to user some
> chunk
> > of records have been processed.. not part of processing guarantees. Also
> > the term "bundle"
In the case of failure, a DoFn instance will not be reused; however, in the
case of failure either the inputs will be retried, or the pipeline will
fail, allowing a newly deserialized instance of the DoFn to reprocess the
inputs (which should produce the same result, meaning there is no data
A Bundle is an arbitrary collection of elements. A PCollection is divided
into bundles at the discretion of the runner. However, the bundles must
partition the input PCollection; each element is in exactly one bundle, and
each bundle is successfully committed exactly once in a successful pipeline.
On Wed, Jun 8, 2016 at 10:13 AM, Ben Chambers
wrote:
> - If failure occurs after finishBundle() but before the consumption is
> committed, then the bundle may be reprocessed, which leads to duplicated
> calls to processElement() and finishBundle().
>
> - If
The unit of commit is the bundle.
Consider a DoFn that does batching (e.g. to interact with some
external service less frequently). Items may be buffered during
process() but these buffered items must be processed and the results
emitted in finishBundle(). If inputs are committed as being
On Wed, Jun 8, 2016 at 10:05 AM, Raghu Angadi
wrote:
> Such data loss can still occur if the worker dies after finishBundle()
> returns, but before the consumption is committed.
If the runner is correctly implemented, there will not be data loss in this
case -- the
Such data loss can still occur if the worker dies after finishBundle()
returns, but before the consumption is committed. I thought finishBundle()
exists simply as best-effort indication from the runner to user some chunk
of records have been processed.. not part of processing guarantees. Also
the
finishBundle() **must** be called before any input consumption is committed
(i.e. marking inputs as completed, which incldues committing any elements
they produced). Doing otherwise can cause data loss, as the state of the
DoFn is lost if a worker dies, but the input elements will never be
Ahh, what we could do is artificially induce bundles using either count or
processing time or both. Just so that finishBundle() is called once in a
while.
On Wed, 8 Jun 2016 at 17:12 Aljoscha Krettek wrote:
> Pretty sure, yes. The Iterable in a MapPartitionFunction should
Pretty sure, yes. The Iterable in a MapPartitionFunction should give you
all the values in a given partition.
I checked again for streaming execution. We're doing the opposite, right
now: every element is a bundle in itself, startBundle()/finishBundle() are
called for every element which seems a
17 matches
Mail list logo