Re: [Proposal] Kaskada DSL and FnHarness for Temporal Queries

2023-06-12 Thread Ben Chambers
the Generative AI world at the >> "Generative AI Meetup" Wednesday afternoon - if the doc Ben linked to (or >> GenAI) is interesting to you and you'll be at the conference I'd love to >> touch base in person! >> >> -Ryan >> >> On

[Proposal] Kaskada DSL and FnHarness for Temporal Queries

2023-06-12 Thread Ben Chambers
Hello Beam! Kaskada has created a query language for expressing temporal queries, making it easy to work with multiple streams and perform temporally correct joins. We’re looking at taking our native, columnar execution engine and making it available as a PTransform and FnHarness for use with Apac

Re: [DISCUSS] State of the project: Feature roadmap for 2018

2018-01-30 Thread Ben Chambers
ity story is part of this too. >>> >>> For beam 3.x we could also reason about if there's any complexity that >>> doesn't hold its weight (e.g. side inputs on CombineFns). >>> >>> On Mon, Jan 22, 2018 at 9:20 PM, Jean-Baptiste Onofré >>>

[DISCUSS] State of the project: Feature roadmap for 2018

2018-01-22 Thread Ben Chambers
Thanks Davor for starting the state of the project discussions [1]. In this fork of the state of the project discussion, I’d like to start the discussion of the feature roadmap for 2018 (and beyond). To kick off the discussion, I think the features could be divided into several areas, as follows:

Re: Design question: Why is ProcessElement an annotation?

2017-12-20 Thread Ben Chambers
The design doc is here s.apache.org/a-new-dofn Basically, it was changed to enable better flexibility. Using a method in a type required all of the accessors to be in the ProcessContext interface -- for instance, accessing the window meant there was a window() method that gave back a BoundedWindow

Re: Callbacks/other functions run after a PDone/output transform

2017-12-15 Thread Ben Chambers
en its last pane has fired. I could see this be a property on the View > transform itself. In terms of implementation - I tried to figure out how > side input readiness is determined, in the direct runner and Dataflow > runner, and I'm completely lost and would appreciate some help. &g

Re: Dataflow fusion

2017-12-07 Thread Ben Chambers
I don't think this is a Dataflow specific question -- other runners likely perform fusion as it is an important optimization to reduce communication overhead within a pipeline. For the same reason, I also don't think making this a global option is desirable -- in Spark this would be analogous to m

Re: Callbacks/other functions run after a PDone/output transform

2017-12-04 Thread Ben Chambers
This would be absolutely great! It seems somewhat similar to the changes that were made to the BigQuery sink to support WriteResult ( https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.java ). I find it helpfu

Re: BEAM counters for validation

2017-11-27 Thread Ben Chambers
+1 to what Ismael said. If Beam is just a "translation" layer it is less useful than if Beam enables actual portability between runners. Being actually portable requires more than just translating a pipeline for execution on the runner -- it means making it possible to pull down metrics and intera

Re: IllegalStateException when combining 3 streams?

2017-11-04 Thread Ben Chambers
.of( > new DoFn,KV,V3>> >() { > > @ProcessElement > public void processElement(ProcessContext c) > > > On Fri, Nov 3, 2017 at 9:47 PM, Ben Chambers wrote: > >> It looks like this is a problematic interaction between

Re: IllegalStateException when combining 3 streams?

2017-11-03 Thread Ben Chambers
It looks like this is a problematic interaction between the 2-layer join and the (unfortunately complicated) continuation triggering. Specifically, the triggering of Join(A, B) has the continuation trigger, so it isn't possible to join that with C. Instead of trying to do Join(Join(A, B), C), cons

Re: Strange errors running on DataFlow

2017-08-03 Thread Ben Chambers
These errors are often seen at the end of a pipeline -- they indicate that due to the failure the backend has been torn down and the attempts to report the current status have failed. If you look in the "Stack Traces" tab in the UI [1] or earlier in the Stackdriver logs, you should (hopefully) be a

Re: Question on watermark

2017-06-15 Thread Ben Chambers
Your understanding seems roughly correct. When the watermark is talked about as a timestamp or "one dimensional" concept it is because we're implicitly talking about the watermark *at the current processing time*. As the current processing time moves forward, the value of the watermark changes too.

Re: Testing of Metrics in context of DoFnTester

2017-05-12 Thread Ben Chambers
+1 -- sounds like a useful addition! On Wed, May 10, 2017 at 8:27 AM Dan Halperin wrote: > Hey Michael, > > That TestRule sounds like it might be pretty useful generally; would it be > worth contributing back to Beam? > > On Wed, May 10, 2017 at 4:12 AM, Michael Luckey > wrote: > >> Hi Pablo, >

Re: howto zip pcollection with index

2017-04-10 Thread Ben Chambers
There isn't currently a great of doing this, since in general, it would require single-threaded processing. Further, PCollections don't really have a concept of order. Could you explain more about your use case? Why do you need to zip elements with their index? On Mon, Apr 10, 2017 at 1:28 PM An

Re: exclusive window per event

2017-04-03 Thread Ben Chambers
on that window (which would mean it is > still isolated from other event elements). > > thx for ideas, > a. > > On Sunday, 2 April 2017, 23:43, Ben Chambers wrote: > > > Can you elaborate on your use case? If your goal is to just group things, > you can assign a key to

Re: exclusive window per event

2017-04-02 Thread Ben Chambers
Can you elaborate on your use case? If your goal is to just group things, you can assign a key to each element and then apply a group by key. You shouldn't need to use windowing for that. On Sun, Apr 2, 2017, 2:34 PM Csaba Kassai wrote: > Hi Antony, > > there is a small custom windowing example

Re: Setting synchronized processing time triggers

2017-02-19 Thread Ben Chambers
The continuation trigger is automatically used after the first group by key with a trigger. It is an attempt to trigger "as fast as reasonable" based on the original trigger. For example, if the trigger was 5 minutes after the hour (so aligned to an hour and then delayed by 5m) it wouldn't be good

Re: Documentation: Side Input and Outputs

2017-02-19 Thread Ben Chambers
Could you instead key by day and then use per key computations and/or groups by key? Often it is easier to compute per key than to partition. The first is a simpler pipeline structure while partition requires more typically duplicated transform nodes. On Sun, Feb 19, 2017, 1:20 PM Tobias Feldhaus

Re: some questions about metrics in beam

2016-12-29 Thread Ben Chambers
On Tue, Dec 27, 2016 at 10:56 PM 陈竞 wrote: > is there any process or plan about metrics? i saw metrics package in > beam's sdk, however the api is experimental, which means it may be removed > in a chance, Besides, beam's metrics only support counter and distribution, > which is not enough for s