On Wed, Apr 17, 2019 at 7:48 AM Viliam Durina wrote:
> > Combine.perKey ... certainly is standardized / well-defined
>
> Is there any document where it's defined?
>
At the user level, here:
https://beam.apache.org/documentation/programming-guide/#combine
There are a few places that define it.
-user@ since this is pretty far afield
On Tue, Apr 30, 2019 at 4:22 PM Robert Bradshaw wrote:
> In the original version of the dataflow model, windowing was not
> annotated on each PCollection, rather it was inferred based on tracing
> up the graph to the latest WindowInto operation. This
Very cool. Took a look.
On Tue, Apr 30, 2019 at 6:23 PM Ankur Goenka wrote:
> Exciting! Thanks Etienne for sharing the design and progress.
>
> On Tue, Apr 30, 2019 at 10:11 AM Etienne Chauchot
> wrote:
>
>> Hi guys,
>> As part of the ongoing work on spark runner POC based on structured
>>
Exciting! Thanks Etienne for sharing the design and progress.
On Tue, Apr 30, 2019 at 10:11 AM Etienne Chauchot
wrote:
> Hi guys,
> As part of the ongoing work on spark runner POC based on structured
> streaming framework, I sketched up a design doc (1) to share context and
> design principles.
Anything where the state evolves serially but arbitrarily - the toy example
is the integer counter in my blog post - needs ValueState. You can't do it
with AnyCombineFn. And I think LatestCombineFn is dangerous, especially
when it comes to CombiningState. ValueState is more explicit, and I still
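A minimal Python sketch of the point above, assuming a toy counter like the one in the blog post. The names here (ValueState, process) are illustrative stand-ins, not the Beam API: the key property is that each update reads the current value before writing, so the result depends on arrival order and cannot be expressed as a commutative/associative CombineFn over the raw inputs.

```python
# Toy stand-in for per-key ValueState: read-modify-write semantics.
class ValueState:
    def __init__(self, initial=0):
        self._value = initial

    def read(self):
        return self._value

    def write(self, value):
        self._value = value

def process(state, op):
    """Apply an arbitrary serial update. Because the new value depends
    on the current one, the sequence of ops is order-sensitive."""
    current = state.read()
    if op[0] == "add":
        state.write(current + op[1])
    elif op[0] == "mul":
        state.write(current * op[1])

state = ValueState(initial=1)
for op in [("add", 2), ("mul", 10), ("add", 5)]:
    process(state, op)
print(state.read())  # (1 + 2) * 10 + 5 = 35
```

Reordering the same three ops gives a different result, which is exactly why a merge-based combiner can't model this kind of state.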
On Wed, May 1, 2019 at 1:55 AM Brian Hulette wrote:
>
> Reza - you're definitely not derailing, that's exactly what I was looking for!
>
> I've actually recently encountered an additional use case where I'd like to
> use ValueState in the Python SDK. I'm experimenting with an ArrowBatchingDoFn
Reza - you're definitely not derailing, that's exactly what I was looking
for!
I've actually recently encountered an additional use case where I'd like to
use ValueState in the Python SDK. I'm experimenting with an
ArrowBatchingDoFn that uses state and timers to batch up python
dictionaries into
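The batching pattern described above can be sketched in plain Python, without Beam. The names here (BatchBuffer, flush) are made up for illustration; in the real DoFn the buffer would be state and the flush would also be driven by a timer, not only by size.

```python
# Sketch of size-based batching, mimicking the state-and-timers pattern.
class BatchBuffer:
    def __init__(self, max_size):
        self.max_size = max_size
        self.buffer = []   # plays the role of buffered state (dicts)
        self.batches = []  # stands in for emitted output batches

    def add(self, element):
        self.buffer.append(element)
        if len(self.buffer) >= self.max_size:
            self.flush()

    def flush(self):
        # In the real DoFn a timer would also trigger this, so partial
        # batches still get emitted when the watermark advances.
        if self.buffer:
            self.batches.append(list(self.buffer))
            self.buffer.clear()

buf = BatchBuffer(max_size=3)
for i in range(7):
    buf.add({"id": i})
buf.flush()  # emit the trailing partial batch
print([len(b) for b in buf.batches])  # [3, 3, 1]
```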
It sounds like no one has any objections specifically to removing this
code. I'll get someone to review the PR and I'll start a vote to merge it
as soon as it's approved.
On Mon, Apr 29, 2019 at 3:39 AM Robert Bradshaw wrote:
> I'd imagine that most users will continue to debug their pipelines
Michael, Max, and other folks who are concerned about compatibility with
the Apache release policy: does the information in this thread sufficiently
address your concerns? Especially the part where the RC artifacts will be
protected by a flag (i.e. --pre) from general consumption.
On Tue, Apr
In the original version of the dataflow model, windowing was not
annotated on each PCollection, rather it was inferred based on tracing
up the graph to the latest WindowInto operation. This tracing logic
was put in the SDK for simplicity.
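The tracing described above can be sketched as a simple walk up the producer graph. The Node/window_fn names below are illustrative toys, not Beam internals: each PCollection points at its producer, and its windowing is whatever the nearest upstream WindowInto set.

```python
# Toy reconstruction of windowing inference by tracing up the graph.
class Node:
    def __init__(self, name, parent=None, window_fn=None):
        self.name = name
        self.parent = parent
        self.window_fn = window_fn  # set only on WindowInto outputs

def inferred_windowing(node, default="GlobalWindows"):
    # Walk producers upward until the latest WindowInto is found.
    while node is not None:
        if node.window_fn is not None:
            return node.window_fn
        node = node.parent
    return default

source = Node("Read")
windowed = Node("WindowInto", parent=source, window_fn="FixedWindows(60s)")
mapped = Node("ParDo", parent=windowed)
print(inferred_windowing(mapped))  # FixedWindows(60s)
print(inferred_windowing(source))  # GlobalWindows
```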
I agree that there is room for a variety of SDK/DSL
On Tue, Apr 30, 2019 at 6:11 PM Ahmet Altay wrote:
>
> This conversation got quite Python-centric. Is there a similar need for Java?
I think Java is already covered. Go is a different story (but even the
versioning and releasing is still being worked out).
> On Tue, Apr 30, 2019 at 4:54 AM Robert
The Java SDK uses the ASF managed Nexus repository. There is a snapshot one
(where we publish nightly builds) and also a release one (where we place
our release candidates). Once the release candidate is approved the Nexus
repository has a way to publish it making it an official release. More
Agree on adding the 5.5 and the resolution of conflicts/duplicates could be
done by either the runner or the artifact staging service.
On Tue, Apr 30, 2019 at 10:03 AM Chamikara Jayalath
wrote:
>
> On Fri, Apr 26, 2019 at 4:14 PM Lukasz Cwik wrote:
>
>> We should stick with URN + payload +
Not being serializable is not an issue, since we use a JSON representation
anyway for all PipelineOptions. We would just register a JSON mapper for
Optional if one doesn't exist already.
On Tue, Apr 30, 2019 at 12:52 PM Anton Kedin wrote:
> Java8 Optional is not serializable. I think this may be a
Java8 Optional is not serializable. I think this may be a blocker. Or not?
Regards,
Anton
On Tue, Apr 30, 2019 at 12:18 PM Lukasz Cwik wrote:
> The migration to requiring @Nullable on methods that could take/return
> null didn't update PipelineOptions contract and its validation to respect
>
The migration to requiring @Nullable on methods that could take/return null
didn't update PipelineOptions contract and its validation to respect it.
We could start using Optional but can't enforce requiring @Nullable since
it is likely backwards incompatible and would break people's current usage
BeamFnControlServiceTest is being worked on in
https://issues.apache.org/jira/browse/BEAM-5709.
On Mon, Apr 29, 2019 at 2:01 PM Reuven Lax wrote:
> yeah, that testClientConnecting test is also extremely flaky.
>
> On Mon, Apr 29, 2019 at 6:50 AM Jean-Baptiste Onofré
> wrote:
>
>> Agree, +1
>>
Thanks, updated the JIRA with a link to this thread and a note of what
could be done.
On Mon, Apr 29, 2019 at 10:29 AM Udi Meiri wrote:
> Pip has a --cache-dir which should be safe with concurrent writes.
>
> On Fri, Apr 26, 2019 at 3:59 PM Ahmet Altay wrote:
>
>> It is possible to download
Hi guys,
As part of the ongoing work on spark runner POC based on structured streaming
framework, I sketched up a design doc (1)
to share context and design principles.
Feel free to comment.
[1] https://s.apache.org/spark-structured-streaming-runner
Etienne
On Fri, Apr 26, 2019 at 4:14 PM Lukasz Cwik wrote:
> We should stick with URN + payload + artifact metadata[1] where the only
> mandatory one that all SDKs and expansion services understand is the
> "bytes" artifact type. This allows us to add optional URNs for file://,
> http://, Maven, PyPi,
Interesting to know it needs to be an object. Thanks. I will try it.
Agree with Kenneth though that Option might be more expected as an user.
On Mon, Apr 29, 2019 at 7:16 PM Kenneth Knowles wrote:
> Does it make use of the @Nullable annotation or just assume any object
> reference could be
One thing to clarify is that we do not use Docker. I don't have too much
experience with Docker; I assume Docker itself already has network
isolation, and that's why it was never necessary to enable security in
the portable runner before?
For us, because we simply use processes, we need this extra
+dev@ since this has taken a turn in that direction
SDK/DSL consistency is nice. But each SDK/DSL being the best thing it can
be is more important IMO. I'm including DSLs to be clear that this is a
construction issue having little/nothing to do with SDK in the sense of the
per-run-time
Hm, what would be the scenario? Have version A running with original random
sharding and then start version B where I change sharding to some custom
function?
So I have to enable the pipeline to digest old keys from GBK restored state
and also work with new keys produced to GBK going forward?
On
Initial thought on PR: we usually try to limit changing coders in these
types of transforms to better support runners that allow in-place updates
of pipelines. Can this be done without changing the coder?
On Tue, Apr 30, 2019 at 8:21 AM Jozef Vilcek wrote:
> I have created a PR for enhancing
I have created a PR for enhancing WriteFiles for custom sharding function.
https://github.com/apache/beam/pull/8438
If this sort of change looks good, then the next step would be to use it in
the Flink runner transform override. Let me know what you think.
On Fri, Apr 26, 2019 at 9:24 AM Jozef Vilcek
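A custom sharding function like the one the PR above proposes could look roughly like the sketch below. The interface here is hypothetical, not the PR's actual API; the point is that a deterministic element-to-shard mapping replaces the default random assignment, and determinism matters if state is ever restored across pipeline versions.

```python
# Hypothetical element -> shard mapping for file writes.
import zlib

def custom_shard(key: str, num_shards: int) -> int:
    # crc32 rather than Python's hash() so the assignment is stable
    # across processes and runs.
    return zlib.crc32(key.encode("utf-8")) % num_shards

shards = {custom_shard(k, 4) for k in ["a", "b", "c", "d", "e"]}
assert all(0 <= s < 4 for s in shards)
```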
Hi Users and Devs,
The CfP deadline approaches. Do submit your technical and/or use-case
talks, etc. Feel free to reach out if you have any questions.
Cheers,
Austin
On Tue, Apr 23, 2019 at 2:49 AM Maximilian Michels wrote:
> Hi Austin,
>
> Thanks for the heads-up! I just want to
All right, I can test it out if I can. How to deploy pipeline on Flink
portable runner? Should I follow this to be able to do it?
https://beam.apache.org/documentation/runners/flink/
On Tue, Apr 30, 2019 at 4:05 PM Reuven Lax wrote:
> In that case, Robert's point is quite valid. The old Flink
In that case, Robert's point is quite valid. The old Flink runner I believe
had no knowledge of fusion, which was known to make it extremely slow. A
lot of work went into making the portable runner fusion aware, so we don't
need to round trip through coders on every ParDo.
Reuven
On Tue, Apr 30,
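The fusion point above can be illustrated with a toy comparison: without fusion, every stage boundary pays an encode/decode through a coder, while a fused chain passes objects directly. The counts below are purely illustrative of the overhead, not real runner behavior.

```python
# Toy: coder round trips per ParDo boundary vs. a fused chain.
import json

coder_calls = {"encode": 0, "decode": 0}

def encode(x):
    coder_calls["encode"] += 1
    return json.dumps(x)

def decode(b):
    coder_calls["decode"] += 1
    return json.loads(b)

def run_unfused(elements, *fns):
    # One materialization (encode + decode) per element per stage.
    for fn in fns:
        elements = [decode(encode(fn(x))) for x in elements]
    return elements

def run_fused(elements, *fns):
    # Fused: apply the whole chain in-process, no coder round trips.
    out = []
    for x in elements:
        for fn in fns:
            x = fn(x)
        out.append(x)
    return out

fns = (lambda x: x + 1, lambda x: x * 2)
assert run_unfused([1, 2, 3], *fns) == run_fused([1, 2, 3], *fns) == [4, 6, 8]
print(coder_calls)  # the 6 encodes / 6 decodes that fusion avoids
```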
It was not a portable Flink runner.
Thanks all for the thoughts, I will create JIRAs, as suggested, with my
findings and send them out
On Tue, Apr 30, 2019 at 11:34 AM Reuven Lax wrote:
> Jozef did you use the portable Flink runner or the old one?
>
> Reuven
>
> On Tue, Apr 30, 2019 at 1:03 AM
If we can, by the Apache guidelines, post RCs to PyPI, that is
definitely the way to go. (Note that test.pypi is for developing
against the PyPI interface, not for pushing anything real.) The caveat
about naming these with rcN in the version number still applies
(that's how PyPI guards them against
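The rcN naming mentioned above is what makes a version a PEP 440-style pre-release, which pip skips unless --pre is passed. A minimal sketch of that distinction; the regex below covers only the rcN case, not the full PEP 440 grammar.

```python
# Detect the rcN pre-release suffix (subset of PEP 440, for illustration).
import re

def is_release_candidate(version: str) -> bool:
    return re.fullmatch(r"\d+(\.\d+)*rc\d+", version) is not None

assert is_release_candidate("2.13.0rc1")
assert not is_release_candidate("2.13.0")
```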
Jozef did you use the portable Flink runner or the old one?
Reuven
On Tue, Apr 30, 2019 at 1:03 AM Robert Bradshaw wrote:
> Thanks for starting this investigation. As mentioned, most of the work
> to date has been on feature parity, not performance parity, but we're
> at the point that the
Thanks for starting this investigation. As mentioned, most of the work
to date has been on feature parity, not performance parity, but we're
at the point that the latter should be tackled as well. Even if there
is a slight overhead (and there's talk about integrating more deeply
with the Flume DAG