the Generative AI world at the
>> "Generative AI Meetup" Wednesday afternoon - if the doc Ben linked to (or
>> GenAI) is interesting to you and you'll be at the conference I'd love to
>> touch base in person!
>>
>> -Ryan
>>
>> On
Hello Beam!
Kaskada has created a query language for expressing temporal queries,
making it easy to work with multiple streams and perform temporally
correct joins. We’re looking at taking our native, columnar execution
engine and making it available as a PTransform and FnHarness for use
with Apac
ity story is part of this too.
>>>
>>> For Beam 3.x we could also reason about whether there's any complexity
>>> that doesn't pull its weight (e.g. side inputs on CombineFns).
>>>
>>> On Mon, Jan 22, 2018 at 9:20 PM, Jean-Baptiste Onofré
>>>
Thanks Davor for starting the state of the project discussions [1].
In this fork of the state of the project discussion, I’d like to start the
discussion of the feature roadmap for 2018 (and beyond).
To kick off the discussion, I think the features could be divided into
several areas, as follows:
The design doc is here: s.apache.org/a-new-dofn
Basically, it was changed to enable better flexibility. Using a method in a
type required all of the accessors to be in the ProcessContext interface --
for instance, accessing the window meant there was a window() method that
gave back a BoundedWindow
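Under the new design in that doc, such accessors can instead be requested as parameters of the @ProcessElement method. A minimal sketch (the element types and class name here are assumptions, not from the doc):

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;

// Sketch: with the new DoFn, the window is injected as a parameter of
// @ProcessElement rather than fetched via a window() accessor on
// ProcessContext.
class WindowTaggingFn extends DoFn<String, String> {
  @ProcessElement
  public void processElement(ProcessContext c, BoundedWindow window) {
    // The runner supplies the element's window directly.
    c.output(c.element() + " @ " + window.maxTimestamp());
  }
}
```

Only the accessors a DoFn actually declares need to be provided, which is the flexibility referred to above.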
en its last pane has fired. I could see this be a property on the View
> transform itself. In terms of implementation - I tried to figure out how
> side input readiness is determined, in the direct runner and Dataflow
> runner, and I'm completely lost and would appreciate some help.
I don't think this is a Dataflow specific question -- other runners likely
perform fusion as it is an important optimization to reduce communication
overhead within a pipeline.
For the same reason, I also don't think making this a global option is
desirable -- in Spark this would be analogous to m
This would be absolutely great! It seems somewhat similar to the changes
that were made to the BigQuery sink to support WriteResult (
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.java
).
I find it helpfu
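For reference, a hedged sketch of how a pipeline can consume that WriteResult (the input collection `rows`, the table name, and the downstream handling are assumptions):

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

// Sketch, inside pipeline construction: the BigQuery sink returns a
// WriteResult instead of PDone, so failed inserts can be retrieved and
// handled downstream (e.g. logged or routed to a dead-letter table).
WriteResult result =
    rows.apply(BigQueryIO.writeTableRows().to("my-project:my_dataset.my_table"));
PCollection<TableRow> failedInserts = result.getFailedInserts();
```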
+1 to what Ismael said.
If Beam is just a "translation" layer it is less useful than if Beam
enables actual portability between runners. Being actually portable
requires more than just translating a pipeline for execution on the runner
-- it means making it possible to pull down metrics and intera
.of(
> new DoFn<KV<K, V2>, KV<K, V3>>() {
>
> @ProcessElement
> public void processElement(ProcessContext c)
>
>
> On Fri, Nov 3, 2017 at 9:47 PM, Ben Chambers wrote:
>
>> It looks like this is a problematic interaction between
It looks like this is a problematic interaction between the 2-layer join
and the (unfortunately complicated) continuation triggering. Specifically,
the triggering of Join(A, B) has the continuation trigger, so it isn't
possible to join that with C.
Instead of trying to do Join(Join(A, B), C), cons
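One way to avoid the nested join entirely is a single CoGroupByKey over all three inputs, so only the original trigger is involved and no intermediate result with a continuation trigger is materialized. A sketch, assuming keyed collections `a`, `b`, `c` of types `PCollection<KV<K, VA>>` etc. (all names here are assumptions):

```java
import org.apache.beam.sdk.transforms.join.CoGbkResult;
import org.apache.beam.sdk.transforms.join.CoGroupByKey;
import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TupleTag;

// Sketch: join A, B, and C with one CoGroupByKey, so the original
// trigger fires once rather than Join(A, B) triggering separately.
final TupleTag<VA> aTag = new TupleTag<>();
final TupleTag<VB> bTag = new TupleTag<>();
final TupleTag<VC> cTag = new TupleTag<>();
PCollection<KV<K, CoGbkResult>> joined =
    KeyedPCollectionTuple.of(aTag, a)
        .and(bTag, b)
        .and(cTag, c)
        .apply(CoGroupByKey.create());
```

Each CoGbkResult then exposes the co-grouped values for a key via the tags.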
These errors are often seen at the end of a pipeline -- they indicate that
due to the failure the backend has been torn down and the attempts to
report the current status have failed. If you look in the "Stack Traces"
tab in the UI [1] or earlier in the Stackdriver logs, you should
(hopefully) be a
Your understanding seems roughly correct. When the watermark is talked
about as a timestamp or "one dimensional" concept it is because we're
implicitly talking about the watermark *at the current processing time*. As
the current processing time moves forward, the value of the watermark
changes too.
+1 -- sounds like a useful addition!
On Wed, May 10, 2017 at 8:27 AM Dan Halperin wrote:
> Hey Michael,
>
> That TestRule sounds like it might be pretty useful generally; would it be
> worth contributing back to Beam?
>
> On Wed, May 10, 2017 at 4:12 AM, Michael Luckey
> wrote:
>
>> Hi Pablo,
>
There isn't currently a great way of doing this, since in general, it would
require single-threaded processing. Further, PCollections don't really have
a concept of order.
Could you explain more about your use case? Why do you need to zip elements
with their index?
On Mon, Apr 10, 2017 at 1:28 PM An
on that window (which would mean it is
> still isolated from other event elements).
>
> thx for ideas,
> a.
>
> On Sunday, 2 April 2017, 23:43, Ben Chambers wrote:
>
>
> Can you elaborate on your use case? If your goal is to just group things,
> you can assign a key to
Can you elaborate on your use case? If your goal is to just group things,
you can assign a key to each element and then apply a group by key. You
shouldn't need to use windowing for that.
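A minimal sketch of that keying approach (`Event` and its `getGroupId()` are assumed names for illustration):

```java
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

// Sketch: assign each element a key, then group by key -- no windowing
// is needed just to group related elements together.
PCollection<KV<String, Iterable<Event>>> grouped =
    events
        .apply(WithKeys.of((Event e) -> e.getGroupId())
            .withKeyType(TypeDescriptors.strings()))
        .apply(GroupByKey.create());
```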
On Sun, Apr 2, 2017, 2:34 PM Csaba Kassai
wrote:
> Hi Antony,
>
> there is a small custom windowing example
The continuation trigger is automatically used after the first group by key
with a trigger. It is an attempt to trigger "as fast as reasonable" based
on the original trigger. For example, if the trigger was 5 minutes after
the hour (so aligned to an hour and then delayed by 5m) it wouldn't be good
Could you instead key by day and then use per-key computations and/or
group by key? Often it is easier to compute per key than to partition: the
first gives a simpler pipeline structure, while Partition typically
requires more duplicated transform nodes.
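A hedged sketch of that keyed alternative, assuming an `Event` type with a `java.time.Instant` timestamp (both assumptions): each element is keyed by its UTC day and counted per key, replacing a Partition with one duplicated branch per day.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

// Sketch: key by day instead of partitioning into per-day branches,
// then compute per key with a single transform.
PCollection<KV<String, Long>> countsPerDay =
    events
        .apply(WithKeys.of((Event e) ->
                LocalDate.ofInstant(e.getTimestamp(), ZoneOffset.UTC).toString())
            .withKeyType(TypeDescriptors.strings()))
        .apply(Count.perKey());
```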
On Sun, Feb 19, 2017, 1:20 PM Tobias Feldhaus
On Tue, Dec 27, 2016 at 10:56 PM 陈竞 wrote:
> is there any process or plan around metrics? I saw the metrics package in
> Beam's SDK; however, the API is experimental, which means it may be removed
> at some point. Besides, Beam's metrics only support Counter and Distribution,
> which is not enough for s